Code Generation Scheme

discussion
generator-provider
#1

I think we need to come up with a better, standard way of generating code in the code generator provider.

In our built-in code generator Polyrific.Catapult.Plugins.AspNetCoreMvc, we generate code using these steps:

  1. use the dotnet cli to create the project and solution
  2. use StringBuilder and TextWriter to modify the generated code, or add new files based on the model

It’s pretty similar to how we do things in the task provider creation tutorial, where we use the angular cli to generate the workspace and components, then use template files to add custom files based on the model.

I found several issues with this approach:

  1. It’s difficult to update the code when the Catapult data model changes, especially when a model is renamed. In the built-in plugin, each generated file carries a comment tag containing the model id, so when a model is renamed we have logic to delete those files.
  2. When users update the code manually, their changes can easily get overwritten by the code generator.
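
To make the comment-tag idea in item 1 concrete, here is a minimal sketch in JavaScript (standing in for the C# plugin). The tag format, the function names, and the `pathFor` callback are all assumptions for illustration, not what the built-in plugin actually does.

```javascript
// Hypothetical tag format; the real plugin's comment tag may look different.
const TAG_PREFIX = "// @catapult-model-id:";

// Render a generated file whose first line records the model id it came from.
function renderModelFile(model) {
  const lines = [
    `${TAG_PREFIX} ${model.id}`,
    `public class ${model.name}`,
    "{",
    ...model.properties.map((p) => `    public ${p.type} ${p.name} { get; set; }`),
    "}",
  ];
  return lines.join("\n");
}

// Read the model id back from a generated file, or null if the file is untagged.
function readModelId(content) {
  const firstLine = content.split("\n", 1)[0];
  return firstLine.startsWith(TAG_PREFIX)
    ? firstLine.slice(TAG_PREFIX.length).trim()
    : null;
}

// Given the files on disk (path -> content) and the current models, return the
// paths of generated files that are stale: their model no longer exists, or the
// model's current name maps to a different path (i.e. it was renamed).
function findStaleFiles(files, models, pathFor) {
  const byId = new Map(models.map((m) => [String(m.id), m]));
  const stale = [];
  for (const [path, content] of Object.entries(files)) {
    const id = readModelId(content);
    if (id === null) continue; // hand-written file, never touched
    const model = byId.get(id);
    if (!model || pathFor(model) !== path) stale.push(path);
  }
  return stale;
}
```

Untagged files are skipped entirely, so hand-written files are never candidates for deletion. It does nothing for issue 2, though: a tagged file that the user edited by hand would still be replaced.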

Some ideas I found:
For dotnet-generated projects:

  1. Roslyn
    I found this article on how Roslyn can be used to generate code in a cross-platform environment: https://msdn.microsoft.com/en-us/magazine/mt808499.aspx?f=255&MSPPError=-2147217396. It can map the code semantically, so it might help us find a specific class or method and update it accordingly. However, I found it a bit too low level. We might need to find a library that provides a higher-level API.
  2. Lamar
    I also found this library that utilizes Roslyn: https://jasperfx.github.io/lamar/documentation/compilation/sourcewriter. However, I found its features still limited.

For angular/javascript generated projects:

  1. We might be able to peek into the angular cli code to see how it generates a component and then updates the module file to register the component.

Other technologies/platforms obviously need specific tools or mechanisms of their own, which we can cover in another topic.

If anyone has any experience in code generation, or any ideas to make the code generation scheme better, let’s discuss them here.

Basically, the goal of the discussion is to have a code generation scheme that’s:

  1. Able to update the code seamlessly without breaking the current code
  2. Able to handle edge cases of user adding custom code to the generated code
#2

Good post Brain. When I originally created Catapult back in 2014, I used Roslyn to do much of the work. I can give you access to that code if you don’t have it already. It is messy and slow however. I think with the fourth version of Catapult (about the time Dzaki, Frandi and Aris got involved) I had abandoned Roslyn in favor of the string writers as it was faster and easier to maintain.

It was Dzaki, Aris, or Frandi that came up with the idea of using the *.Catapult.{cs,js,etc} files in order to help reduce the opportunity for custom code to be overwritten. It works well, but I think we were never too clear on how to properly use this scheme by extending classes, etc.
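
For anyone unfamiliar with the scheme, a rough JavaScript sketch of the split looks like this. The file names `Product.Catapult.js` and `Product.js`, and the class and method names, are made up for illustration; only the `*.Catapult.*` naming pattern comes from the actual scheme.

```javascript
// --- Product.Catapult.js (hypothetical name): generated, overwritten on every run ---
class ProductBase {
  constructor(name, price) {
    this.name = name;
    this.price = price;
  }

  // Generated behaviour lives in the base class.
  displayLabel() {
    return `${this.name} (${this.price})`;
  }
}

// --- Product.js (hypothetical name): stubbed once, never overwritten ---
class Product extends ProductBase {
  // Custom code overrides or extends the generated behaviour here,
  // so regenerating ProductBase does not touch it.
  displayLabel() {
    return super.displayLabel().toUpperCase();
  }
}
```

The rest of the app would only ever reference `Product`, so the generator is free to rewrite the base file at will.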

We could provide more instruction about how to use the custom classes that inherit from the base. Another idea (at least for compiled rather than interpreted files): we could simply compile all generated classes into a DLL that goes into the generated project, so that the user doesn’t stumble on a ".Catapult" file and accidentally modify it instead of the custom counterpart. We would then need to add more commented instruction, perhaps even a link to a help doc, at the head of the custom classes that inherit from the already compiled base. As I write this, I am not even sure if you can have a class that inherits from a pre-compiled base living in a DLL, but we may be able to do something similar. Not sure what we do for interpreted (.js) files though.

Another approach is to add special delimiters in the code where users can always demarcate their custom code. For example:

//~custom~//
var customJavaScript = function(foo){
    alert('bar');
};
//~end custom~//
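
A minimal sketch of how a generator might carry such delimited regions across a regeneration. It assumes the freshly generated file contains the same delimiter pairs in the same order as the existing file; the function names are hypothetical.

```javascript
const OPEN = "//~custom~//";
const CLOSE = "//~end custom~//";

// Collect the text between each OPEN/CLOSE pair, in order of appearance.
function extractCustomBlocks(source) {
  const blocks = [];
  let current = null;
  for (const line of source.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === OPEN) {
      current = [];
    } else if (trimmed === CLOSE && current !== null) {
      blocks.push(current.join("\n"));
      current = null;
    } else if (current !== null) {
      current.push(line);
    }
  }
  return blocks;
}

// Copy the custom blocks of the existing file into the custom regions of the
// regenerated file, matching regions by position (first to first, and so on).
function mergeCustomBlocks(existing, regenerated) {
  const saved = extractCustomBlocks(existing);
  const out = [];
  let index = 0;
  let inside = false;
  for (const line of regenerated.split("\n")) {
    const trimmed = line.trim();
    if (!inside && trimmed === OPEN) {
      out.push(line);
      inside = true;
      if (index < saved.length && saved[index] !== "") out.push(saved[index]);
    } else if (inside && trimmed === CLOSE) {
      out.push(line);
      inside = false;
      index += 1;
    } else if (!inside) {
      out.push(line); // generated code outside custom regions is taken as-is
    } else if (index >= saved.length) {
      out.push(line); // no saved block for this region: keep the generated default
    }
  }
  return out.join("\n");
}
```

Matching by position is fragile if the generator adds or reorders regions. Giving each region a name in its delimiter would be more robust, at the cost of a slightly noisier marker.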

I think that we can take a similar approach with the interpreted files to minimize custom conflicts. For example, we could minify, or even potentially hash, the generated files into one big Catapult.js, then stub out custom .js files with instructions on how to extend base classes.

I have several code generation books and can read up on other design patterns.

Thanks!

MC

#3

Great insight Matt. It’s cool to hear from the originator of catapult!

Thanks for sharing your experience with Roslyn. Because the scope of the code we’re generating is very big (i.e. the whole app), I think relying fully on Roslyn wouldn’t perform well.

One edge case I once ran into when using Catapult, for example:

  1. I created a model Product with a property Name
  2. In the custom code, I used Product.Name
  3. Later, I decided to rename the property to Product.Title

After regeneration, the generated solution fails to build, because my custom code still calls Product.Name instead of Product.Title.

I guess we can bring in Roslyn for edge cases like this, and still use the string writers for most of the code generation. I’m not sure, though, whether Roslyn can find all references to a property and rename them.

I would love to see the Catapult version that still uses Roslyn. It might give some insight into how Roslyn works.

Lastly, I agree with the approach of adding special delimiters for custom code and having separate files for the generated code and the custom code. We can try to implement it in our built-in provider.

I’m not too fond of the idea of pre-compiling the base class though, since I think it’d hide the generated code’s architecture.

Again, thanks for sharing Matt.

#4

OK, @brain.konasara you have been added to the old Catapult code base. You should be very afraid of going in there though :grin:.

I think the usefulness of Roslyn would break down when we start creating Task Providers that generate Catapult projects for, say, Java or PHP, or even Tizen or Alexa.

The way I see it, if the IDE has a way to understand the text that we humans put into it (as long as we follow the correct syntax), then we should also be able to engineer Catapult to understand that same syntax.

So, for example, if your base model of Product has a property of Name which was later removed from the model, then we delete all custom references to Product.Name. Did doing this give us what we want? No, because we actually renamed the property to “Title”. So what does this mean? To me, it means the following:

We need to provide a “Rename Property” function in the CLI and in the Web UI that takes the extra step of renaming (rather than deleting) any custom references so that Product.Name becomes Product.Title everywhere. Basically, if we have told Catapult, “Hey, we are doing a rename operation here”, then it knows to not only rename the property, but to also go through all of the custom code and rename any references to it.
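
As a rough illustration of the difference between "removed" and "renamed", here is a sketch that classifies changes between two snapshots of a model's properties. It assumes every property carries a stable `id` that survives a rename; that field and the function name are hypothetical.

```javascript
// Compare two snapshots of a model's properties. Each property is assumed to
// carry a stable `id` (hypothetical) that stays the same across renames.
function diffProperties(before, after) {
  const afterById = new Map(after.map((p) => [p.id, p]));
  const beforeIds = new Set(before.map((p) => p.id));
  const changes = [];

  for (const old of before) {
    const current = afterById.get(old.id);
    if (!current) {
      changes.push({ kind: "removed", name: old.name });
    } else if (current.name !== old.name) {
      // Same id, different name: a rename, not a delete followed by an add.
      changes.push({ kind: "renamed", from: old.name, to: current.name });
    }
  }

  for (const p of after) {
    if (!beforeIds.has(p.id)) changes.push({ kind: "added", name: p.name });
  }
  return changes;
}
```

With something like this, a "renamed" entry could trigger the extra step of updating custom references, while a "removed" entry would not.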

Make sense?

#5

Nice discussion Guys :+1:

I have a feeling that there is a good chance this “update code” functionality can work as expected in the future. Take “VS Code” as an example: the core app is very basic and knows nothing about any particular language, yet it can work with almost any language via extensions. VS Code support for C# utilizes the combination of Roslyn and OmniSharp, so it’d be great if we could dive deeper to learn them.

One thing we need to keep in mind, though, is that this feature should keep living in the Task Provider. Once we’re tempted to include a portion of this C# support in the OpenCatapult core app, there is a high chance that we will keep adding other tech-specific logic to it in the future, and the core app will become bloated without us noticing.

So yeah, let’s keep brainstorming ideas until we find the best option for this.

#6

Nice idea on the “Rename Property”. I guess we can set this flag in the background without the user knowing about it. However, I don’t think a text search using regex or another search algorithm will be enough, because it’s not safe to replace a string without knowing its semantics/context. E.g. we cannot replace “.Name” with “.Title” in all of the custom code, because the “.Name” could be a property of the Category class, for example.
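
A tiny demonstration of the problem, with made-up custom code and a hypothetical `naiveRename` helper:

```javascript
// Two lines of custom code that both happen to read a ".Name" property.
const customCode = [
  "var label = product.Name;",
  "var group = category.Name;",
].join("\n");

// Naive text replacement: every ".<from>" becomes ".<to>", with no idea
// which class the property belongs to.
function naiveRename(source, from, to) {
  const pattern = new RegExp(`\\.${from}\\b`, "g");
  return source.replace(pattern, `.${to}`);
}

const result = naiveRename(customCode, "Name", "Title");
// `result` now also contains "category.Title", although only Product.Name
// was renamed in the model.
```

Telling the two references apart requires knowing the type of `product` and `category`, which is exactly the information a plain text search does not have.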

I think this is where tools like Roslyn become useful, since they can analyze the semantics of the code. And as Frandi mentioned, this will be scoped inside the task provider, not in the engine abstraction, so other Catapult projects, like Java, would utilize different tools for this. (Though we can add a high-level abstraction in the engine, and have each task provider implement it with its own tools.)

Anyway, thanks for the input about VS Code, Frandi. Yes, I guess it’d be useful to peek into how things work in VS Code, since it’s a cross-platform IDE that supports multiple languages.
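
To sketch what that high-level abstraction could look like: the engine only knows that a provider may offer a rename operation, and each provider plugs in its own tool. Everything here (`RefactoringRegistry`, the `renameProperty` hook, the provider name) is hypothetical, and the toy provider uses a qualified text replace purely as a placeholder for a real semantic tool such as Roslyn.

```javascript
// Engine side: knows nothing about C#, Java, or any other language.
class RefactoringRegistry {
  constructor() {
    this.handlers = new Map();
  }

  register(providerName, handler) {
    this.handlers.set(providerName, handler);
  }

  // Delegate the rename to the provider; report whether it was handled.
  renameProperty(providerName, change, files) {
    const handler = this.handlers.get(providerName);
    if (!handler || typeof handler.renameProperty !== "function") {
      return { files, handled: false }; // provider offers no rename support
    }
    return { files: handler.renameProperty(change, files), handled: true };
  }
}

// Provider side: a toy stand-in. A real C# provider would use a semantic tool
// here instead of replacing the literal text "Model.Property".
const toyCSharpProvider = {
  renameProperty(change, files) {
    const oldRef = `${change.model}.${change.from}`;
    const newRef = `${change.model}.${change.to}`;
    return Object.fromEntries(
      Object.entries(files).map(([path, text]) => [path, text.split(oldRef).join(newRef)])
    );
  },
};

const registry = new RefactoringRegistry();
registry.register("csharp", toyCSharpProvider);
```

This keeps the tech-specific logic out of the core app: a provider without a handler simply reports `handled: false`, and the engine can decide how to warn the user.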
