Wednesday, 22 December 2010

Domain Specific Languages

If, like me, you started your software engineering career on classic time-sharing systems (for me it was a VAX 11/780) you probably had a small restricted set of programming languages available to you, e.g. Fortran, BASIC, Pascal etc. These languages are all classed as General Purpose Languages. What binds GPLs together is that they are designed to solve any class of software problem and are not tied to any domain. In fact, when you examine the structure of languages like C, BASIC, Fortran, Java, they all share a lot of common constructs in terms of control structures, branches, looping, variable assignment etc.

This was all straightforward enough. In fact, early in my career I moved (fairly) freely between programming in BASIC, Fortran, C and Pascal. In the early 90s I moved into the CAD (Computer Aided Design) / PLM (Product Lifecycle Management) industry, specifically the AutoCAD industry. It was at this point I was introduced to a language which, at first, completely foxed me. This language was Lisp, specifically AutoLisp.

At first sight, Lisp looked quite alien: endless parentheses and strange functions such as car, cons and cdr. Once I got to grips with the basics, though, the power of the language became apparent. The keys to Lisp are sets and lists; in fact, Lisp stands for List Processing. Here's a bit of sample Lisp code:
(setq cntr 1)
(while (< cntr 5)
 (princ cntr)
 (setq cntr (+ cntr 1))
)
A pretty simple example that sets the variable cntr to 1 and then loops round 4 times, printing the value of cntr to the console on each pass. Lisp is fairly unusual in that it is homoiconic, i.e. programs are written in a primitive data structure of the language itself (in Lisp's case, the list). XSLT and XQuery are also classed as homoiconic languages.
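
To make the "code is data" idea a little more concrete, here is a rough sketch in Java. Java itself is not homoiconic, so this only imitates the idea: a tiny Lisp-like expression is held in ordinary lists and evaluated by a recursive function. The class name CodeAsData and the eval method are made up for illustration, and only the + and * operators are supported.

```java
import java.util.Arrays;
import java.util.List;

public class CodeAsData {

    // Evaluate a tiny Lisp-like expression held in ordinary Java lists.
    // An expression is either an Integer, or a List whose first element
    // is the operator "+" or "*" and whose remaining elements are expressions.
    public static int eval(Object expr) {
        if (expr instanceof Integer) {
            return (Integer) expr;
        }
        List<?> form = (List<?>) expr;
        String op = (String) form.get(0);
        if (!op.equals("+") && !op.equals("*")) {
            throw new IllegalArgumentException("Unknown operator: " + op);
        }
        int result = op.equals("*") ? 1 : 0;   // identity value for the operator
        for (Object arg : form.subList(1, form.size())) {
            int value = eval(arg);
            result = op.equals("*") ? result * value : result + value;
        }
        return result;
    }

    public static void main(String[] args) {
        // The "program" (+ 1 (* 2 3)) is just a nested list.
        List<Object> program =
            Arrays.<Object>asList("+", 1, Arrays.<Object>asList("*", 2, 3));
        System.out.println(eval(program)); // prints 7

        // Because it is plain data, other code can rewrite it before running it.
        program.set(0, "*");               // now (* 1 (* 2 3))
        System.out.println(eval(program)); // prints 6
    }
}
```

In Lisp none of this scaffolding is needed, because the list form is the program itself.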

Lisp lends itself to programming AutoCAD quite well, as it is set and list orientated, and with a CAD system a lot of the programming is aimed at manipulating the geometry of the CAD model / drawing.

Anyway, back to homoiconic. The clever bit about Lisp is the fact that functions can be treated in the same way as variables, that is, functions can be passed as function arguments and returned as function values. It's this key property that allows powerful abstractions to be built in a language like Lisp, hence its use as a DSL. As always, an example helps:
(defun double (x)
 (* 2 x) ; define a function double that simply multiplies the argument by 2
)
Now, here's the clever bit: if we want to compute 2^4 we could apply the double transform to 1 four times.
(double (double (double (double 1))))
Now, that's a bit long winded, even better would be a function that took a function and an object and repeated the function transform n times. In Lisp, that's doable.
(defun repeat-transformation (F N X)
  ; Repeat applying function F on object X for N times
  (if (zerop N) ; zerop returns true if the argument is zero
      X
    (repeat-transformation F (1- N) (funcall F X))
   )
)
What's happening in this code is, of course, recursion. The key is funcall: given a function F and objects X1, X2 ... Xn, the form (funcall F X1 X2 ... Xn) invokes the function F with the arguments X1 X2 ... Xn. The variable N is a counter keeping track of the remaining number of times we need to apply function F to the accumulator variable X.
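
For readers more at home in Java, the same recursion can be sketched with java.util.function. This is only an approximation of the Lisp above, not a literal translation: the class name RepeatTransformation and the doubleIt variable are made up for illustration, and UnaryOperator stands in for the function passed to funcall.

```java
import java.util.function.UnaryOperator;

public class RepeatTransformation {

    // Apply function f to x, n times, recursing in the same way as the Lisp version.
    public static <T> T repeatTransformation(UnaryOperator<T> f, int n, T x) {
        if (n == 0) {
            return x; // nothing left to apply: return the accumulator
        }
        return repeatTransformation(f, n - 1, f.apply(x));
    }

    public static void main(String[] args) {
        // Plays the role of the Lisp function double.
        UnaryOperator<Integer> doubleIt = v -> 2 * v;

        // Doubling 1 four times gives 2^4.
        System.out.println(repeatTransformation(doubleIt, 4, 1)); // prints 16
    }
}
```

The function is passed around as an ordinary value, which is exactly the property the Lisp example relies on.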

I'm not planning to do in-depth programming in Lisp here. If you're interested in learning more about Lisp I recommend you take a look at Practical Common Lisp by Peter Seibel. The full text of the book is available online. Hopefully, though, you've got the idea that a programming language can, in essence, manipulate itself to create a new subset language, hence a Domain Specific Language.

So what use are DSLs? A good real-life practical example is Apache Camel. Camel is an integration framework that provides an implementation structure and runtime for the classic Hohpe and Woolf Enterprise Integration Patterns (EIP). Integration is a great application for a DSL as you have common patterns to solve and components to 'wire' together. In Camel's case, the common elements are Processors, which carry out functions such as transformation, mediation, routing etc., and Endpoints, which allow messages to be sent and received, e.g. over JMS, HTTP etc.

To demonstrate how productive and powerful a DSL (and in this case Camel) can be, imagine you've got a systems integration problem to solve which involves reading files from one directory and placing them in another. Simple right? Here's a possible solution in Java.
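import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class FileCopier {
 public static void main(String args[]) throws Exception {
  File inboxDirectory = new File("data/inbox");
  File outboxDirectory = new File("data/outbox");

  outboxDirectory.mkdir();

  File[] files = inboxDirectory.listFiles();
  if (files == null) {
   return; // inbox is missing or unreadable
  }

  for (File source : files) {
   if (source.isFile()) { // skip subdirectories
    File dest = new File(
     outboxDirectory.getPath()
     + File.separator
     + source.getName());
    copyFile(source, dest);
   }
  }
 }

 private static void copyFile(File source, File dest)
   throws IOException {
  // try-with-resources closes both streams even if the copy fails
  try (InputStream in = new FileInputStream(source);
    OutputStream out = new FileOutputStream(dest)) {
   byte[] buffer = new byte[8192];
   int read;
   // read() may return fewer bytes than requested, so loop until end of file
   while ((read = in.read(buffer)) != -1) {
    out.write(buffer, 0, read);
   }
  }
 }
}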
There's a fair bit of code there for a simple task. Also, we haven't addressed error handling, concurrency, polling for the files, keeping track of which files have been moved etc.

Now here's the same problem solved with the Camel DSL.
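import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;
import org.apache.camel.impl.DefaultCamelContext;

public class FileCopierWithCamel {
 public static void main(String args[]) throws Exception {
  CamelContext context = new DefaultCamelContext();
  context.addRoutes(new RouteBuilder() {
   public void configure() {
    from("file:data/inbox?noop=true") // noop=true leaves the source files in place
    .to("file:data/outbox");
   }
  });
  context.start();
  Thread.sleep(10000); // give Camel 10 seconds to copy the files
  context.stop();
 }
}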
A lot simpler. Most of the code above is Camel boilerplate, for example setting up a CamelContext that's started (context.start()) and subsequently stopped (context.stop()). The integration logic is all defined in the RouteBuilder's configure() method by the from(...).to(...) chain. Ignoring imports, we are down from around 30 lines of Java code to 14. Also, all the things we didn't address in the pure Java example, e.g. error handling etc., are handled by the Camel framework and runtime.
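
If you're wondering how a chain like from(...).to(...) can be built in plain Java, the usual trick is a fluent builder, where each method returns an object that the next call can be chained onto. The sketch below is a toy illustration of that technique and is not how Camel is actually implemented: MiniRouteDsl, Route, process, send and describe are all hypothetical names, and messages are just Strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class MiniRouteDsl {

    // A hypothetical route: a source name, some processing steps and a destination name.
    public static class Route {
        private final String source;
        private final List<UnaryOperator<String>> steps = new ArrayList<>();
        private String destination;

        private Route(String source) {
            this.source = source;
        }

        // Each builder method returns 'this' so that calls can be chained.
        public Route process(UnaryOperator<String> step) {
            steps.add(step);
            return this;
        }

        public Route to(String destination) {
            this.destination = destination;
            return this;
        }

        // Push one message through every step, in the order the steps were added.
        public String send(String message) {
            String result = message;
            for (UnaryOperator<String> step : steps) {
                result = step.apply(result);
            }
            return result;
        }

        public String describe() {
            return source + " -> " + steps.size() + " step(s) -> " + destination;
        }
    }

    // Entry point of the mini DSL, so that a route reads from(...).process(...).to(...).
    public static Route from(String source) {
        return new Route(source);
    }

    public static void main(String[] args) {
        Route route = from("inbox")
            .process(String::trim)
            .process(String::toUpperCase)
            .to("outbox");

        System.out.println(route.describe());        // prints inbox -> 2 step(s) -> outbox
        System.out.println(route.send("  hello  ")); // prints HELLO
    }
}
```

The route definition reads almost like a sentence, which is the point of an internal DSL: the host language stays the same, but the vocabulary is that of the problem domain.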

Another good example of the application of a DSL is FIT (Framework for Integrated Test). FIT looks to solve that eternal software engineering problem of testing and test scripts, closing the gap between developers, business analysts and end users. FIT reads HTML tables that map onto system classes, methods and properties. A Business Analyst or End User creates a FIT document that describes the tests to be carried out in the form of a table with inputs and expected outputs. The FIT document can be created in Microsoft Word and exported as HTML. Developers then create Fixtures which map the FIT documents onto the business logic code of the application being tested.

In the case of FIT, the DSL is what FIT terms the Fixture. Fixtures wrap the specific application business logic, which allows the FIT documents to execute the tests. A simple Fixture class can be seen below.
import fit.ColumnFixture;

/* Extending the ColumnFixture class allows the FIT runtime to map the input test tables to the logic code */
public class Division extends ColumnFixture {
    public float numerator;
    public float denominator;
    public float quotient() {
        return numerator / denominator;
    }
}
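
To give a feel for how a table row might be mapped onto such a class, here is a rough sketch using Java reflection. It is not FIT's actual implementation and does not use the FIT library: ColumnFixtureSketch and runRow are hypothetical names, the Division class is a stand-in that does not extend ColumnFixture, and only float columns are handled. It assumes the column-header convention that a header ending in "()" names a method whose result is checked, while any other header names a field to set.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class ColumnFixtureSketch {

    // Stand-in for the Division fixture. It does not extend fit.ColumnFixture,
    // so the sketch runs without the FIT library.
    public static class Division {
        public float numerator;
        public float denominator;
        public float quotient() {
            return numerator / denominator;
        }
    }

    // Run one table row against a fixture object.
    // Headers ending in "()" name a method whose result is compared with the cell;
    // all other headers name public float fields that are set from the cell text.
    public static boolean runRow(Object fixture, String[] headers, String[] cells) {
        boolean passed = true;
        try {
            for (int i = 0; i < headers.length; i++) {
                String header = headers[i];
                if (header.endsWith("()")) {
                    String name = header.substring(0, header.length() - 2);
                    Method method = fixture.getClass().getMethod(name);
                    float actual = (Float) method.invoke(fixture);
                    passed &= actual == Float.parseFloat(cells[i]);
                } else {
                    Field field = fixture.getClass().getField(header);
                    field.setFloat(fixture, Float.parseFloat(cells[i]));
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Table does not match fixture", e);
        }
        return passed;
    }

    public static void main(String[] args) {
        String[] headers = {"numerator", "denominator", "quotient()"};
        // 10 / 2 is 5, so this row passes.
        System.out.println(runRow(new Division(), headers, new String[] {"10", "2", "5"})); // prints true
        // 10 / 4 is 2.5, not 3, so this row fails.
        System.out.println(runRow(new Division(), headers, new String[] {"10", "4", "3"})); // prints false
    }
}
```

The person writing the table only ever sees the column names and values; the reflection plumbing is what lets the table act as a small testing language.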
The language that's currently getting most attention in the DSL world is Groovy. Groovy, if you've not yet come across it, is a lightweight Java-like language that, like Lisp, supports closures and code as data and therefore lends itself well to DSL creation. Justin Spradlin has written a great simple example of a DSL for Stock Market transactions in Groovy.

Increasingly, there are tools becoming available for DSL construction. A good example of one of these is Meta Programming System (MPS) from JetBrains.

So what's the future for DSLs? I believe they have a place. I'm not sure there are too many advantages in trying to build DSLs for business applications; they are just too varied to gain extensive reuse and may be too unfamiliar for use by end users. For me, the Apache Camel project shows the way with DSLs, accelerating the development of common development patterns. Groovy is also being used for this kind of 'plumbing' application.

If you want to find out more about DSLs I'd recommend visiting Martin Fowler's Wiki and taking a look at his book Domain Specific Languages.