Matthew M Dalby

Software Engineering

DSL Based Code Generation!

Work smarter, not harder...

An introduction

Traditional approaches to software development involve a lot of effort. Regardless of the language or tech stack, you have almost certainly noticed how much boilerplate code is required to establish basic functionality. At first this is exciting: you learn a new library, such as an ORM persistence framework, and gain mastery through practice. Over time, though, writing the same kind of code again and again becomes repetitive.

In this article we will look at an alternative approach, where we automate as much of the process as possible through code generation. In this specific example, we will reverse engineer a database schema and translate it into a DSL (Domain Specific Language), from which we will generate source code.

Imagine a scenario where we are faced with migrating an application to a completely different technology stack. Both systems in this example will use the same RDBMS, with the intent of migrating from the existing stack to a Java-based implementation. The target language is less important than the fact that we are reverse engineering a model from an RDBMS into a language- and framework-neutral DSL representation, which can then be used to generate a number of optional targets.

This approach might be considered a bit heavyweight for smaller applications, say two dozen tables or fewer. However, it becomes a force multiplier when tackling larger projects involving hundreds of tables.

A working project is available on GitHub that illustrates the translation of an RDBMS schema into a DSL representation, ready for code generation. The implementation of this extraction step (the consumer) is written in Java, as JDBC provides a rich API for reverse engineering databases. There are other options; I selected this one because I have worked with it in the past.

An overview of the code generation process

For the purposes of this exercise, we will focus on just reverse engineering data from a source (an RDBMS) into a DSL format, and publishing that to the local file system for future consumption.

Implementing a Code Generation Solution

While it is possible to define a DSL from scratch, using your own structure, you might find it easier to use an existing format (I prefer JSON), as there are a number of parsers available.


Step 1: Defining a DSL

The first step in the process of defining a DSL is to identify common patterns. While the intent is to describe an abstraction of information, the language you choose to create the DSL in is up to you. I personally prefer JSON, as there are parsers and validators available. It is possible to use other formats, or to invent your own, but I encourage anyone to work with a format for which parsers already exist.
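To make this concrete, here is a minimal sketch of how one table might be modeled in Java and emitted as a JSON-based DSL document. The field names (name, columns, sqlType, nullable, primaryKey) and the customer table are assumptions for illustration, not the format used by the project on GitHub. The JSON is assembled by hand only to keep the sketch free of dependencies; in practice a JSON library would do this work.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical DSL model: one table and its columns, serialized as JSON. */
public class DslModel {

    /** A single column as described by the DSL (field names are assumptions). */
    public record Column(String name, String sqlType, boolean nullable, boolean primaryKey) {
        // No escaping is done; names are assumed to be plain identifiers.
        public String toJson() {
            return "{\"name\":\"" + name + "\",\"sqlType\":\"" + sqlType
                    + "\",\"nullable\":" + nullable + ",\"primaryKey\":" + primaryKey + "}";
        }
    }

    /** A table is a name plus an ordered list of columns. */
    public record Table(String name, List<Column> columns) {
        public String toJson() {
            String cols = columns.stream()
                    .map(Column::toJson)
                    .collect(Collectors.joining(","));
            return "{\"name\":\"" + name + "\",\"columns\":[" + cols + "]}";
        }
    }

    public static void main(String[] args) {
        Table customer = new Table("customer", List.of(
                new Column("id", "BIGINT", false, true),
                new Column("email", "VARCHAR", true, false)));
        System.out.println(customer.toJson());
    }
}
```

Keeping the model this small is deliberate: the DSL only needs to capture the patterns that the templates will later consume, and it can grow as new patterns are identified.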

Step 2: Implementing a consumer

Once a format (DSL) is established and we have an idea of how to represent the data, it is time to extract it from the source. I decided to create a command line application in Java for this task. The JDBC API provides a rich set of functionality for working with database metadata.
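As a rough illustration of what such a consumer can look like, the sketch below reads table and column metadata through the standard java.sql.DatabaseMetaData interface. The class name SchemaReader, the shape of the returned map, and the SQL-to-Java type mapping are all assumptions made for this example; the project on GitHub may be organized quite differently.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Types;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a schema reader built on java.sql.DatabaseMetaData. */
public class SchemaReader {

    /** Maps a java.sql.Types code to a Java type name (this mapping is an assumption). */
    public static String javaTypeFor(int sqlType) {
        switch (sqlType) {
            case Types.INTEGER: return "Integer";
            case Types.BIGINT: return "Long";
            case Types.BOOLEAN:
            case Types.BIT: return "Boolean";
            case Types.DECIMAL:
            case Types.NUMERIC: return "java.math.BigDecimal";
            case Types.DATE: return "java.time.LocalDate";
            case Types.TIMESTAMP: return "java.time.LocalDateTime";
            default: return "String"; // CHAR, VARCHAR and anything unmapped
        }
    }

    /** Returns table name -> (column name -> Java type) for each table in the schema. */
    public static Map<String, Map<String, String>> readTables(Connection conn, String schema)
            throws SQLException {
        DatabaseMetaData meta = conn.getMetaData();
        List<String> tableNames = new ArrayList<>();
        try (ResultSet tables = meta.getTables(null, schema, "%", new String[] {"TABLE"})) {
            while (tables.next()) {
                tableNames.add(tables.getString("TABLE_NAME"));
            }
        }
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (String table : tableNames) {
            Map<String, String> columns = new LinkedHashMap<>();
            // The table name is treated as a LIKE pattern, so names containing
            // '_' or '%' may match more than one table.
            try (ResultSet cols = meta.getColumns(null, schema, table, "%")) {
                while (cols.next()) {
                    columns.put(cols.getString("COLUMN_NAME"),
                            javaTypeFor(cols.getInt("DATA_TYPE")));
                }
            }
            result.put(table, columns);
        }
        return result;
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.out.println("usage: SchemaReader <jdbc-url> [schema]");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0])) {
            System.out.println(readTables(conn, args.length > 1 ? args[1] : null));
        }
    }
}
```

Running it requires a JDBC driver for your database on the classpath and a connection URL of your own. Primary keys and foreign keys can be read in the same style with getPrimaryKeys and getImportedKeys on the same metadata object.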

[Figure: Running the structure extraction process]

Step 3: Translating the data model into source code

At this point we have extracted the metadata from the relational store into an in-memory structure. I personally prefer to persist the resulting data to disk, where I can make additional edits prior to generating the source. As a result, the extraction and generation steps are executed as two separate processes.

For the code generation step, I decided to write the implementation in Node. Although I am comfortable in both Java and Node stacks, I find that some tasks are just faster in a Node environment.

The process for translating our object model into source code is fairly straightforward. Templates are defined (in this case we need just a single template, representing a JPA entity definition). The DSL is then fed into a templating library. For this effort, I used the Handlebars library, as I am familiar with it.
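The generator described here uses Handlebars on Node. To keep the examples in one language and free of dependencies, the sketch below swaps in plain string substitution in Java to show the same idea. The {{name}} placeholder style mirrors Handlebars, but the render helper, the template text, and the sample Customer entity are hypothetical, and the generated class is abbreviated (no imports, @Id, or accessors).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Dependency-free stand-in for a template engine: fills {{placeholders}} in an entity template. */
public class EntityTemplate {

    // Hypothetical template for a JPA entity; {{fields}} is rendered from the column list.
    public static final String TEMPLATE = String.join("\n",
            "@Entity",
            "@Table(name = \"{{tableName}}\")",
            "public class {{className}} {",
            "{{fields}}",
            "}");

    /** Replaces each {{key}} in the template with its value. */
    public static String render(String template, Map<String, String> values) {
        String out = template;
        for (Map.Entry<String, String> e : values.entrySet()) {
            out = out.replace("{{" + e.getKey() + "}}", e.getValue());
        }
        return out;
    }

    /** Renders one annotated private field per column (column name -> Java type). */
    public static String renderFields(Map<String, String> columns) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> c : columns.entrySet()) {
            if (sb.length() > 0) {
                sb.append("\n");
            }
            sb.append("    @Column(name = \"").append(c.getKey()).append("\")\n");
            sb.append("    private ").append(c.getValue()).append(' ')
              .append(c.getKey()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the columns in the order they were read.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("id", "Long");
        columns.put("email", "String");
        System.out.println(render(TEMPLATE, Map.of(
                "tableName", "customer",
                "className", "Customer",
                "fields", renderFields(columns))));
    }
}
```

A real template engine adds loops, conditionals, and helpers, so the per-column logic in renderFields would normally live in the template itself rather than in code.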

The final result

Executing the source code generator

As you may observe, all of the effort is in the initial project structure, and the act of running the actual process is pretty straightforward.

An early lesson from a mentor

Early in my career I was fortunate enough to have a mentor. As a bonus, she had over a decade of knowledge under her belt, and was arguably one of the smartest individuals I have ever met. She talked about a period of time when she was building client applications with the Microsoft Windows API, and noted that a significant amount of effort went into boilerplate tasks.

The conversation progressed to the topic of code generation, and how it allowed her to dramatically reduce the effort required to establish the foundation and focus on other areas. This was my first exposure to code generation.

Streamlining development efforts at scale

At a later point in my career, I found myself leading the effort to refactor a large enterprise application. This involved creating an ORM layer in a new tech stack for over 250 database tables. The original estimate assumed 4 hours per object: 250 tables at 4 hours each is 1,000 hours, or roughly 25 weeks for a single person to complete that particular task alone.

Using an approach similar to the one described in this article, we were able to reverse engineer the database and generate the new ORM representations, to the point where about 90% of the effort was automated. Good times!

Summary

This article provides a brief example of how to perform a few steps in a code generation process, which can open some doors towards streamlining future development efforts.

With a lot of hype around GPT and the potential for generating code, I think a lot of people would be surprised to know that generating source code is not a new concept.

For this example we focused specifically on extracting data from an RDBMS source and generating Java JPA entities, but there are many other potential applications.

Source code for the application is available on GitHub at https://github.com/west-coast-matthew/rdbms_generator.