Building Builders

The Builder Pattern is a software design pattern in which the logic to set up an object is separated from the object itself. As discussed last time, this pattern is particularly useful when dealing with immutable types, where all the information about the object is usually required at the time of construction. The builder pattern allows that information to be collected separately by code with different concerns, until Build is called to actually construct the object.

With code generation baked into Roslyn, patterns such as the Builder Pattern, which often require a lot of boilerplate code, will be extremely simple to automate. In this post we will investigate how easy this really is.

Full disclosure: after planning to write this article I happened upon another article which is pretty much about the same concept - Code Generation with .NET 5 – Builder pattern. This to me just emphasises how lacking such an ability has been in C# - and makes me excited for the direction Microsoft are taking the language (and in this case, compiler services). In this article I hope to present an alternative perspective to the one in the aforelinked article - specifically I will focus a little more on how to actually write a code generator.

Getting Started

If you intend to develop a code generator, you will need:

  1. The .NET 5 SDK (the combined next version of .NET Core and .NET Framework)
  2. Visual Studio 2019 Preview
  3. Lots of patience for restarting VS (the .NET dev team have apologised in advance for this one)
  4. A project set up by following the instructions here.

The Goal

The desired result is to have a builder generated automatically at compile time, with as little boilerplate as possible.

Given this class:

    public class Person
    {
        public Person(string name, int age)
        {
            Name = name;
            Age = age;
        }

        public string Name { get; }
        public int Age { get; }
    }

We would like to apply a [Buildable] attribute to the class and automatically get some sort of PersonBuilder which could be used as follows:

    PersonBuilder builder = new PersonBuilder();
    builder.Name = "Andrew";
    builder.Age = 29;

    Person person = builder.Build();

    Console.WriteLine(person.Name);
    Console.WriteLine(person.Age);
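
To make the target concrete, here is a hand-written sketch of what the generated builder might look like. The shape is an assumption for illustration, not the generator's actual output: it presumes one settable property per property on Person and a Build method that passes the values to the constructor in declaration order.

```csharp
public class Person
{
    public Person(string name, int age)
    {
        Name = name;
        Age = age;
    }

    public string Name { get; }
    public int Age { get; }
}

// Hypothetical shape of the generated class, written by hand for illustration.
public class PersonBuilder
{
    // One mutable property for each read-only property on Person.
    public string Name { get; set; }
    public int Age { get; set; }

    // Assumes the constructor parameters follow the property declaration order.
    public Person Build()
    {
        return new Person(Name, Age);
    }
}
```

With a class of this shape in the compilation, the usage snippet above compiles unchanged.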

Anatomy of a Source Generator

Source Generators are regular C# classes which implement the ISourceGenerator interface.

    // Summary:
    //     The base interface required to implement a source generator
    public interface ISourceGenerator
    {
        // Summary:
        //     Called to perform source generation. A generator can use the context to add source
        //     files via the Microsoft.CodeAnalysis.SourceGeneratorContext.AddSource(System.String,Microsoft.CodeAnalysis.Text.SourceText)
        //     method.
        void Execute(SourceGeneratorContext context);

        // Summary:
        //     Called before generation occurs. A generator can use the context to register
        //     callbacks required to perform generation.
        void Initialize(InitializationContext context);
    }

The source generator must first discover which classes it should run on. The first part of this phase is to hook into syntax tree generation. In the Initialize method we can call context.RegisterForSyntaxNotifications, passing it a delegate that creates our custom implementation of ISyntaxReceiver. This custom receiver is notified of every syntax node parsed by the compiler and takes note of the "candidates": the nodes which could possibly represent targets of code generation (e.g. in our case a ClassDeclarationSyntax with at least one attribute list).

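    using System.Collections.Generic;
    using Microsoft.CodeAnalysis;
    using Microsoft.CodeAnalysis.CSharp.Syntax;

    internal class BuildableReceiver : ISyntaxReceiver
    {
        // Class declarations that carry at least one attribute list.
        public List<ClassDeclarationSyntax> CandidateClasses { get; } = new List<ClassDeclarationSyntax>();

        // Called by the compiler for every syntax node it visits.
        public void OnVisitSyntaxNode(SyntaxNode syntaxNode)
        {
            if (syntaxNode is ClassDeclarationSyntax cds && cds.AttributeLists.Count > 0)
            {
                CandidateClasses.Add(cds);
            }
        }
    }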
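
The registration step described above can be sketched as follows. This is a minimal outline against the preview API quoted earlier, not the project's actual generator: the class name BuilderGenerator is made up for illustration, and the block assumes the BuildableReceiver from the listing above lives in the same project, which references the preview Microsoft.CodeAnalysis packages. In released versions of Roslyn the two context types were renamed GeneratorInitializationContext and GeneratorExecutionContext.

```csharp
using Microsoft.CodeAnalysis;

// Hypothetical generator class; only the wiring is shown.
[Generator]
public class BuilderGenerator : ISourceGenerator
{
    public void Initialize(InitializationContext context)
    {
        // The delegate is invoked to create a fresh receiver for each generation pass.
        context.RegisterForSyntaxNotifications(() => new BuildableReceiver());
    }

    public void Execute(SourceGeneratorContext context)
    {
        // The populated receiver comes back through the context.
        if (!(context.SyntaxReceiver is BuildableReceiver receiver))
        {
            return;
        }

        // receiver.CandidateClasses now holds every class declaration
        // that has at least one attribute list.
    }
}
```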

The populated receiver is then handed to the generator's Execute method through the context, once all syntax has been parsed and we are in the semantic phase. However, at this point we are expecting the target library to have classes referencing an attribute which we have yet to create. The .NET team's example of automatically implementing INotifyPropertyChanged used the following strategy.

    private const string _attributeText = "<removed for brevity, just a regular attribute class called BuildableAttribute>";

    // Later

    var sourceText = SourceText.From(_attributeText, Encoding.UTF8);
    context.AddSource("BuildableAttribute.cs", sourceText);

    // we're going to create a new compilation that contains the attribute.
    // TODO: we should allow source generators to provide source during initialize, so that this step isn't required.
    CSharpParseOptions options = (CSharpParseOptions)((CSharpCompilation)context.Compilation).SyntaxTrees[0].Options;
    Compilation compilation = context.Compilation.AddSyntaxTrees(CSharpSyntaxTree.ParseText(sourceText, options));

The above code adds the C# code found in the constant string to the compilation as if the user had written it. This is an incredibly powerful feature, and our first taste of code generation in action. Already at this point, target libraries can reference [Buildable] as if the attribute existed as part of their code base, and this comes from the power of the very simple-looking context.AddSource method.

However, we are still not done. After all this, we still need to find which classes the user has annotated with our attribute.

    // Retrieve our original receiver
    var receiver = (BuildableReceiver)context.SyntaxReceiver!;

    // Get a reference to the attribute we just compiled
    var buildableSymbol = compilation.GetTypeByMetadataName("BuilderBuilder.BuildableAttribute");

    foreach (var @class in receiver.CandidateClasses)
    {
        var model = compilation.GetSemanticModel(@class.SyntaxTree, true);
        var typeSymbol = model.GetDeclaredSymbol(@class);

        // Is the class annotated with this attribute?
        var isBuildable = HasAttribute(typeSymbol, buildableSymbol);
        if (isBuildable)
        {
            // Here things actually happen
            Execute(context, typeSymbol);
        }
    }


    // Helper method: true if the type carries the given attribute
    private bool HasAttribute(INamedTypeSymbol typeSymbol, INamedTypeSymbol attributeSymbol)
    {
        foreach (var attribute in typeSymbol.GetAttributes())
        {
            if (attribute.AttributeClass?.Equals(attributeSymbol, SymbolEqualityComparer.Default) == true)
            {
                return true;
            }
        }
        return false;
    }

I am disappointed by how difficult it is to do this. While I accept rough tooling for now, the .NET team hasn't made their usual stellar effort at simplifying the API for dummies like myself. To get this feature to work one must first understand Roslyn concepts such as the different phases (syntax, semantic), and then learn the actual Roslyn API. It's an incredibly well-thought-out API, but most of us just want to get our hands dirty before studying the documentation.

In the released version of this feature I would like to see a helper library which dumbs down the API for the majority of use cases. For example, we can imagine that most source generators would just want to listen for when a class, method, or property is annotated with a custom attribute. It could look something like this:

    // Extend this to skip all the hard work.
    abstract class ClassLevelAttributeBasedSourceGenerator : ISourceGenerator
    {
        // Abstract members cannot be private, so they are declared protected.
        protected abstract string AttributeName { get; }
        protected abstract void Execute(SourceGeneratorContext context, INamedTypeSymbol typeSymbol);

        // Other stuff
    }

Generating the Source

From here it gets much easier: we only need to reason about the semantic model, which contains all the rich type information available to the compiler. The challenge is simple: create a string representation of the builder class and call context.AddSource. I've separated the logic that creates the string into its own class. The semantic model API for Roslyn is quite straightforward to understand and get the hang of once all the Compilation bits are done. For example, the following method retrieves all properties on the target type.

    private IPropertySymbol[] GetProperties(ITypeSymbol type)
    {
        var result = new List<IPropertySymbol>();

        foreach (var member in type.GetMembers())
        {
            if (member is IPropertySymbol propertySymbol)
            {
                result.Add(propertySymbol);
            }
        }

        return result.ToArray();
    }

There aren't any more interactions with the API after the properties have been obtained (for our extremely simplified use case). As mentioned earlier, it's just a matter of building up the string, and the code can be found here.
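
As a rough idea of what "building up the string" involves, here is a minimal sketch. It is not the code from the repository: the class name BuilderSourceWriter, its parameters and the output layout are all assumptions made for illustration. It takes plain (type, name) pairs so that it has no Roslyn dependency; in a generator, those pairs could plausibly be read from each IPropertySymbol (for instance its Name and its Type.ToDisplayString()).

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper; names and output layout are assumptions for illustration.
public static class BuilderSourceWriter
{
    // properties: (type, name) pairs, assumed to be in constructor-parameter order.
    public static string Write(
        string namespaceName,
        string typeName,
        IReadOnlyList<(string Type, string Name)> properties)
    {
        var sb = new StringBuilder();
        sb.AppendLine($"namespace {namespaceName}");
        sb.AppendLine("{");
        sb.AppendLine($"    public class {typeName}Builder");
        sb.AppendLine("    {");

        // Emit one settable property per input pair and remember the names
        // so they can be passed to the constructor in Build().
        var arguments = new List<string>();
        foreach (var (type, name) in properties)
        {
            sb.AppendLine($"        public {type} {name} {{ get; set; }}");
            arguments.Add(name);
        }

        var argumentList = string.Join(", ", arguments);
        sb.AppendLine();
        sb.AppendLine($"        public {typeName} Build()");
        sb.AppendLine("        {");
        sb.AppendLine($"            return new {typeName}({argumentList});");
        sb.AppendLine("        }");
        sb.AppendLine("    }");
        sb.AppendLine("}");
        return sb.ToString();
    }
}
```

The returned string would then be wrapped in a SourceText and handed to context.AddSource, in the same way as the attribute source earlier.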

Result

Going back to the goal, we build the project and it just compiles. There's no generated code to display: it's all been generated and consumed by the C# compiler in one go. The PersonBuilder class appears highlighted as a "regular" class would be, but the Go To Definition command is unavailable. My instinct tells me that I would like to have access to the generated code; just as I often look through the decompiled .NET source code, looking at the generated code might be useful when debugging.

The project can be found on GitHub for those interested. The state of the repository at the time of writing is this. Remember: it's absolutely not in a production-ready state, and it makes extremely naive assumptions, such as properties being declared in the same order as the constructor parameters.

Further work

Below are some features I would expect to see in a full-fledged production version of this "Builder Builder" concept. I believe that there will eventually be such a library and it will be present in almost all large .NET projects - the utility is great.

Fluent API: Different coders have different coding styles. There's a certain hype around the fluent syntax when it comes to building objects. I don't like the prevalence of the fluent syntax for object construction - it's way too overused even when good old assignment would do the job. Nevertheless I can imagine a different attribute such as [FluentBuildable] which can build fluent builder APIs if the coder thinks it's appropriate.

Thread Safety: This could range from allowing thread-safe access to properties of the builder (think Interlocked) to more complex synchronisation mechanisms, for example a consistent API for read locks and write locks on builders.

Object Trees and Complex Objects: The implementation provided above assumes a single immutable object will be produced by the builder. However, such an object can contain other immutable types, which in turn can hold other immutable types, and so on. A full-featured library could perhaps provide "Builder Trees" which would allow the building of a whole hierarchy of objects using a single Build command.

Reduced Boilerplate in Validation: The Build method is a convenient location to invoke validation logic. In some circumstances it might be useful to have a single consistent API for validation where some fields can be automatically checked (such as the builder following the semantics of the [Required] attribute) or for the builder to invoke more complex user-defined logic through partial classes.
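
Two of these ideas, the fluent API and validation through partial classes, could be sketched together as below. Everything here is hypothetical: the FluentPersonBuilder name, the With* naming convention and the Validate hook are assumptions about what such a generator might emit, not features of the project.

```csharp
using System;

public class Person
{
    public Person(string name, int age) { Name = name; Age = age; }
    public string Name { get; }
    public int Age { get; }
}

// The half a [FluentBuildable] generator might emit (hypothetical).
public partial class FluentPersonBuilder
{
    private string _name;
    private int _age;

    // Each setter returns the builder so that calls can be chained.
    public FluentPersonBuilder WithName(string name) { _name = name; return this; }
    public FluentPersonBuilder WithAge(int age) { _age = age; return this; }

    // Optional hook: if no implementation is supplied, the compiler removes the call.
    partial void Validate(string name, int age);

    public Person Build()
    {
        Validate(_name, _age);
        return new Person(_name, _age);
    }
}

// The half written by hand, holding user-defined validation.
public partial class FluentPersonBuilder
{
    partial void Validate(string name, int age)
    {
        if (string.IsNullOrWhiteSpace(name)) throw new InvalidOperationException("Name is required.");
        if (age < 0) throw new InvalidOperationException("Age cannot be negative.");
    }
}
```

A call such as new FluentPersonBuilder().WithName("Andrew").WithAge(29).Build() would then run the hand-written checks before constructing the Person.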

I am convinced that the .NET community would welcome a library with all these features with open arms.

