Sunday, 6 December 2009

Information - The Missing Architecture in the Enterprise?

Enterprise Architecture approaches and frameworks, such as Zachman and TOGAF, talk about Business, Systems and Technical Architecture components, but rarely use the term Information Architecture. It appears to me that the majority of Enterprise Architecture frameworks and tools have their roots in 'traditional' systems and applications and have yet to catch up with the world of Mashups, REST, Situational Applications and Social Computing.

Enterprises today are still grappling with a myriad of LoB (Line of Business) applications, both COTS and bespoke, all with varying degrees of heritage and legacy. It's my view that SOA has failed to deliver, a view backed up by a number of commentators including Anne Thomas Manes of the Burton Group and Tim Bray, father of XML and now Director of Web Technologies at Sun.

So where do Enterprises go from here? I believe the development of systems has almost split into two tracks: those that entirely embrace the paradigm of the Web, and those within Enterprises that still seem to be 'stuck' in a pre-Internet age. Take a look at any Internet-facing consumer application: if its usability, experience, graphic design and content don't meet user expectations, users will simply go elsewhere and use another service. I'm sure everyone reading this has had the experience of some torturous e-commerce application where we've abandoned a transaction in sheer frustration at the idiocy of the site design.

There is a good reason for this, of course: Enterprises have a large legacy of applications, typically packaged COTS solutions with large functional footprints such as ERP. Packaged application vendors have always struggled to keep their architectures in line with technology developments. Internet service providers, on the other hand, have much greater freedom in their application architecture, essentially constrained only by the capabilities of a browser, and have very little legacy to contend with.

There are now attempts to bridge the Enterprise / Internet divide with technology platforms such as Microsoft SharePoint and IBM Mashup Center. The key difference with these solutions is that they 'fuse' traditional business data (ERP, PLM, CRM etc) with Web 2.0 content such as Wikis and Blogs. Both IBM and Microsoft have recognised the potential to bring Web 2.0 to an Enterprise environment. These platforms, though, require new thinking if benefits are to be realised. This is where Information Architecture comes in.

What is Information Architecture? Well, as you'd expect, there are numerous definitions. In the O'Reilly publication Information Architecture for the World Wide Web, Information Architecture is defined as:
  • Organisation, Labelling and Navigation
  • Structural Design of the Information Space to support Intuitive Access to Content
  • Structure and Classification to support Information Access
  • An Emerging Discipline
Not sure it helps, but it's a definition. The key challenge is that these solutions bring together traditional LoB data, such as parts information, purchase orders, customer records etc, and allow you to combine it with unstructured content and workflow. Mash-up type approaches allow end users to combine information sources and visualise them the way they wish. These capabilities, when combined, require a different approach to system design and implementation than any traditional packaged application.

Skills normally associated with pure web site design are, in some cases, of greater importance than application functionality. The disciplines required include graphic design, interaction design, usability engineering, experience design, content management and knowledge management, as well as the core software development skills.

So how do you go about creating an effective Information Architecture? Often, Information Architecture is described as being made up of three components:
  • Business Context - an organisation's culture, resources, skills, business model and constraints
  • Content - the information an organisation produces and consumes, both structured and unstructured
  • Users - reflecting the way people in the organisation actually work is critical to the success of the Information Architecture design

Web Designers and Solution Architects have, in the past, tended to 'live' in separate worlds that never meet. I guess this is possibly due to the career heritage of people in these roles: Web Designers typically have a graphic design background, while Solution Architects tend to be more technology orientated. Going forward, though, the next generation of Enterprise applications needs to converge these two historically separate disciplines if they're going to meet the needs of Internet-savvy users.

Friday, 26 June 2009

The Internet of Things

The Internet is, of course, now universal, with over 625 million hosts currently registered according to the Jan 09 ISC Domain Survey. Even with the advent of Internet-enabled end devices such as 2.5/3G mobile phones, the majority of these connections will be computers of some kind.

But what if you could connect almost any device to the Internet: medical devices, cars, toys, weather stations, even your home? The possibilities for opening up new classes of applications are mind boggling. Since the early days of the Internet there have always been people connecting 'odd ball' devices to it; the Internet fridge is just one example.

Pretty much any electronics hobby enthusiast can rustle up a circuit with some form of sensor board and wire it to an Internet-connected PC, and numerous companies have manufactured data loggers and instrumentation devices for years, most of which can connect to PCs. The key to enabling this vision, though, is standards and interoperability. Sun's approach focuses on exactly that, bringing together Open Source standards and hardware, Java, ad-hoc networking and the Internet.

This is where Sun Microsystems is heading with its Sun SPOT vision. Sun SPOT is a SunLabs research project that kicked off in 2003. This work has led to Sun selling a Sun SPOT Developer Kit to the public, mainly to drive interest in potential applications. The video below outlines Sun's vision.


The key innovation is that not only have Sun made the SDK Open Source, but also the OS (Squawk) and, believe it or not, the hardware. This means anyone's free to download the Sun SPOT bill of materials, circuit designs, schematics and drawings, send them to an outsource electronics fabricator (of which there are plenty who will do small batch runs) and have their very own custom device.

One of the major advantages of the platform is that all development is done in Java. Anyone who's had experience of developing for embedded systems knows that specific architecture, software engineering and programming skills are normally required. Sun have put effort into ensuring that any skilled Java Developer can pick up a Sun SPOT device and get going straight away without any embedded systems background. It's important to note that Sun SPOTs are much more than your typical data logger device; they're a computing platform in their own right. The ability to create ad-hoc mesh networks of these devices, coupled with Agent-Based software architectures, is what makes them so unique.
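As a flavour of what that development experience looks like, below is a minimal sketch of a Sun SPOT application that polls the demo sensor board and lights an LED in the dark. I'm quoting the sensor board API class names from memory of the SDK, so treat them as indicative rather than definitive:

import com.sun.spot.sensorboard.EDemoBoard;
import com.sun.spot.sensorboard.peripheral.ITriColorLED;
import javax.microedition.midlet.MIDlet;
import javax.microedition.midlet.MIDletStateChangeException;

// A Sun SPOT application is just a Java ME MIDlet running on the Squawk VM
public class SensorSpot extends MIDlet {

    protected void startApp() throws MIDletStateChangeException {
        EDemoBoard demo = EDemoBoard.getInstance();
        ITriColorLED led = demo.getLEDs()[0];
        while (true) {
            try {
                // poll the on-board light sensor and turn LED 0 green when it's dark
                int light = demo.getLightSensor().getValue();
                led.setRGB(0, light < 100 ? 255 : 0, 0);
                led.setOn();
                Thread.sleep(1000);
            } catch (Exception e) {
                // sensor I/O error or interrupt - ignored in this sketch
            }
        }
    }

    protected void pauseApp() {}
    protected void destroyApp(boolean unconditional) {}
}

No cross-compilers, no device registers, no RTOS configuration - just standard Java.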

So what are the potential applications? I work in the Defence industry and can see applications in military and security. For example, imagine parachuting dozens of these devices across a theatre of operations, each fitted with an array of sensors. They would have the ability to network with each other and with other military systems when in range, for example warning a squad of troops of suspicious activity in an area.

Applications that require remote data acquisition and logging are also obvious candidates. SunLabs have an experimental environmental monitoring solution called Canopee deployed at the Kalakad Mundanthurai Tiger Reserve (KMTR) in India.

Sun are hoping to repeat their successful strategy of getting Java onto just about any device you can think of, from mobile phones to digital TV set-top boxes. Their goal is to open up and accelerate the market for wireless sensor-based applications by standardizing the hardware and reducing the software implementation effort. It will also be interesting to see how this technology starts to converge with RFID.

Currently, most applications are in universities and research labs, but given the momentum behind Java and the Open Source nature of the whole platform, I believe we could see Sun SPOT applications opening up in the near future.

Thursday, 11 June 2009

Scaling Software Design Patterns to the Enterprise

Much has been written on Architecture Styles and Software Design Patterns. Concepts such as coupling, cohesion, abstraction, modularity and information hiding are all well understood by Developers and Architects when designing software systems. There are volumes of best practice and guidance widely available to solve most software engineering problems.

Where there are fewer established guidelines and practices is in large-scale Enterprise Architecture design, in particular the eternal problem of partitioning business services and functionality and allocating them to systems. Most Enterprises are complex: processes vary by business unit and function, and all have a legacy systems landscape that has grown over time. The biggest issue, though, is change. Organisations change to meet new customer needs and markets, or when acquiring and disposing of operations, and Enterprise Architectures struggle to keep up. No sooner have you finished a major ERP implementation programme and re-engineered numerous business processes than the Enterprise reorganises, divests operations and places new demands on systems.

What I have noticed in my experience of large Enterprises is that although reorganisations and business change occur, there is usually a minimum cohesive business capability below which things cannot be reorganised. Take purchasing, for example: managing customer orders, demands, purchase orders and the purchase-to-pay cycle does not make sense to split apart, as the business service would lose its cohesion. The capability would become inefficient, communicating far too frequently with another separately managed function to fulfil its service requirements.

In a lot of ways, optimal organisational services or functions are designed in a similar way to a good OO design following class responsibility collaborator (CRC) principles. CRC seeks to ensure that the responsibilities of any given class are the most appropriate given the other classes it collaborates with to achieve a use case or scenario. The goal of CRC is to analyse a class and its adjacent classes, understand what each needs to know about itself (responsibilities) and how scenarios drive different collaborations between them. In a complex domain it may not be obvious which methods belong to which classes, and allocating a method to the wrong class can often mean an overall sub-optimal design. Knowing the frequency and nature of collaboration between classes also reveals the cohesion between them and ensures they are deployed into the same components. The goal is to achieve high cohesion of classes within components while maintaining low coupling between components.


Example CRC Cards for the classic Model View Controller (MVC) Pattern
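In the same spirit as the cards above, here's a minimal Java sketch of MVC with each class's CRC responsibilities and collaborators recorded as comments. The class and method names are mine, purely for illustration:

import java.util.ArrayList;
import java.util.List;

// Model - Responsibilities: hold application state, notify observers of changes.
// Collaborators: View.
class Model {
    private final List<View> views = new ArrayList<View>();
    private int value;

    void register(View view) { views.add(view); }

    void setValue(int value) {
        this.value = value;
        for (View v : views) v.refresh(this); // push change notifications
    }

    int getValue() { return value; }
}

// View - Responsibilities: render the model.
// Collaborators: Model.
class View {
    void refresh(Model model) {
        System.out.println("Current value: " + model.getValue());
    }
}

// Controller - Responsibilities: interpret user input, update the model.
// Collaborators: Model (note it never talks to the View directly).
class Controller {
    private final Model model;
    Controller(Model model) { this.model = model; }
    void userTyped(int input) { model.setValue(input); }
}

public class CrcMvcExample {
    public static void main(String[] args) {
        Model model = new Model();
        model.register(new View());
        new Controller(model).userTyped(42); // prints "Current value: 42"
    }
}

The collaboration comments make the coupling explicit: the Controller can be replaced without disturbing the View, and vice versa. The same record of responsibilities and collaborations can be kept for business services when you scale the technique up.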

If you scale this up to business and system components you essentially have the same problem. To maximise the flexibility of the Enterprise Architecture, the goal is to align business process and service cohesion and coupling with systems cohesion and coupling, aiming to create self-contained, highly cohesive, autonomous business and system services. In a sense you can draw analogies between system classes and components on the one hand, and business processes and functions on the other.

To provide a real-life example: I was involved in a project to implement a supply chain solution that was to roll out across multiple business programmes. One particular programme had an existing supply chain system, but because the COTS application they had been using included engineering parts and document management functionality, they had started to embed these capabilities into their supply chain processes. These processes had become ingrained, and stakeholders were reluctant to re-engineer and move the document management and engineering parts functionality to other Enterprise-wide systems. Essentially, if you examined the business responsibilities and collaborations between purchasing and engineering, they had unknowingly created a very low cohesion supply chain service with tight coupling to an external capability. Hence 'breaking apart' the process and systems proved extremely difficult.

When examining an overall Enterprise Architecture, the goal should be to create a set of business and system services that are as autonomous and cohesive as is feasible. This is difficult to do and requires strong IS governance and stakeholder support. It is also challenging when dealing with COTS applications. I often see COTS applications in Enterprises as a set of overlapping functional components that could be visualised in a Venn diagram; the battle is to gain agreement on which COTS component should be used for which capability. For example, most ERP solutions have a Document Management module that tends to integrate tightly with the other ERP modules, allowing associations between business objects and documents. Most organisations, though, produce more documents outside the core processes supported by the ERP, so Document Management should be regarded as a cohesive Enterprise capability in its own right that communicates with many other parts of the organisation. Attempting to mandate the ERP as the Document Management system would exclude the numerous people in the organisation who need document management yet never touch the ERP system in the course of their role. The result: probably the vast majority of documents managed outside any formal system, or the Enterprise's document management capability distributed across multiple systems.

If you can get the Enterprise Architecture 'assembled' from these highly cohesive components, I believe there's a greater chance of being able to provide flexibility, agility and responsiveness to change, and that is what all CEOs want from their IT.

For further reading, I recommend an excellent article by Alistair Cockburn on Responsibility-Based Modeling. I'd also recommend the classic OO book by Rebecca Wirfs-Brock, Designing Object-Oriented Software, which sets out the responsibility-driven approach.

Friday, 24 April 2009

Open Source Google for Everyone

There's a real 'spike' of activity going on at the Apache Software Foundation at the moment. I wrote about CouchDB in an earlier post, but there are a number of very interesting projects running currently. Probably the most significant is Hadoop, which was promoted to an Apache 'Top Level Project' a year ago and is now taking off in the Open Source community.

Hadoop is highly distributed computing middleware designed to process petabytes of data across thousands of commodity hardware nodes. It implements a computational approach called Map/Reduce across a distributed file system to deliver a highly fault-tolerant compute platform that processes very large data sets in parallel. Hadoop is 'inspired' by Google's MapReduce and GFS (Google File System) papers.

So how does it work?

There are two major components to Hadoop:
  • HDFS - a distributed file system that replicates data across many nodes
  • Map/Reduce - an execution middleware that distributes processing to nodes where the data resides
Files loaded onto HDFS are split into chunks, and these chunks are replicated across multiple nodes in the Hadoop cluster. System monitoring responds to hardware and processing failures by re-replicating data to other nodes, providing very high levels of fault tolerance.
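As a rough illustration of what this looks like in practice (the paths and file name here are hypothetical), data is loaded through the Hadoop command-line shell, which chunks and replicates the file behind the scenes:

# copy a local file into HDFS - it is split into chunks and replicated automatically
bin/hadoop dfs -put weblogs.txt /user/hadoop/input/weblogs.txt

# list it back; chunking and replication are invisible to the client
bin/hadoop dfs -ls /user/hadoop/input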

In the Hadoop programming framework, data is record orientated. Input files are broken into records - lines, or whatever sub-element is appropriate for the application's processing logic. Each Hadoop process running on a node processes a subset of these records. Wherever possible, processes act on data held on the node's local hard disk and do not transfer data across the network; Hadoop's strategy is to move the computation to the data rather than the data to the computation. This is what gives Hadoop its performance.



The splitting and recombining of data and processing is handled using a Map/Reduce algorithm. Records are processed in isolation by tasks called Mappers; the output from the Mappers is then brought together by a second set of tasks called Reducers, where the results from different Mappers can be merged.



The clever aspect of Hadoop is that it takes pretty much all of the cluster and distribution concerns away from the Developer, letting them focus on the application logic.

In my early programming career I worked on Apollo Domain workstations, and I always remember that one of the coolest programming examples shipped with the operating system (AEGIS) was a Mandelbrot generator that executed elements of the set on different nodes in the network in parallel. That was my first experience of the power of distributed parallel computing. The problem with the program, though, was that all the inter-process and node communication was coded 'low level' through TCP socket programming. If I remember rightly, most of the code was handling this IPC plumbing rather than generating the Mandelbrot sequences. This is exactly the problem Hadoop solves.

The architecture of Hadoop exhibits flat scalability. On a cluster with small data sets the performance advantage is minimal, if there's one at all; but once your program is running on two nodes with 1 GB of data, it'll scale to thousands of nodes and petabytes of data without modification.

For an example Hadoop application, imagine you wanted to write a program that counted the occurrences of each unique word across multiple text files. Example text files would look like:
text1.txt: google is the best search engine

text2.txt: a9 is the better search engine

The output would look like:
a9 1
google 1
is 2
the 2
best 1
better 1
search 2
engine 2

Pseudo code for a Map/Reduce approach to solving this looks like:
mapper (filename, file-contents):
  for each word in file-contents:
    emit (word, 1)

reducer (word, values):
  sum = 0
  for each value in values:
    sum = sum + value
  emit (word, sum)

Several instances of the mapper function get created on different machines in the cluster. Each instance receives a different input file (it is assumed that we have many such files). The mappers output (word, 1) pairs which are then forwarded to the reducers. Several instances of the reducer method are also instantiated on the different machines. Each reducer is responsible for processing the list of values associated with a different word. The list of values will be a list of 1's; the reducer sums up those ones into a final count associated with a single word. The reducer then emits the final (word, count) output which is written to an output file.

The Hadoop distribution ships with a sample Java program that, essentially, does a similar task. It's available in the Hadoop distribution download under src/examples/org/apache/hadoop/examples/WordCount.java. This is partially reproduced below:
public static class MapClass extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    String line = value.toString();
    StringTokenizer itr = new StringTokenizer(line);
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, one);
    }
  }
}

/**
 * A reducer class that just emits the sum of the input values.
 */
public static class Reduce extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output,
                     Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    output.collect(key, new IntWritable(sum));
  }
}

The final component of the Map/Reduce algorithm is the Driver. The driver initializes the job and instructs the Hadoop platform to execute your code on a set of input files, and controls where the output files are placed.
public void run(String inputPath, String outputPath) throws Exception {
  JobConf conf = new JobConf(WordCount.class);
  conf.setJobName("wordcount");

  // the keys are words (strings)
  conf.setOutputKeyClass(Text.class);
  // the values are counts (ints)
  conf.setOutputValueClass(IntWritable.class);

  conf.setMapperClass(MapClass.class);
  conf.setReducerClass(Reduce.class);

  FileInputFormat.addInputPath(conf, new Path(inputPath));
  FileOutputFormat.setOutputPath(conf, new Path(outputPath));

  JobClient.runJob(conf);
}
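Assuming you have a cluster (or even just a single-node installation) up and running, the bundled example can then be launched from the command line along these lines, where input and output are placeholder HDFS paths:

# run the bundled WordCount example over everything in 'input'
bin/hadoop jar hadoop-*-examples.jar wordcount input output

# inspect the results
bin/hadoop dfs -cat output/part-00000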

The Apache Hadoop project also has a number of sub-projects that utilise or complement the core Hadoop middleware including:
  • HBase - a distributed database
  • Pig - a high-level data flow language to ease the development of parallel programs for Hadoop
  • ZooKeeper - a distributed coordination service for Hadoop and other distributed systems
  • Hive - a data warehousing infrastructure
  • Mahout - machine learning libraries supporting a Map / Reduce processing model
So is anyone using Hadoop, and what for?

You bet. Probably the biggest names using Hadoop are Facebook, Amazon and Yahoo. Facebook is using Hadoop to perform analytics on its service; Amazon is using it to produce the product search indices for its A9 search engine; Yahoo use Hadoop to fight spam. Even Microsoft is getting in on the act via its acquisition of Powerset, an NLP search engine.

The New York Times used a Hadoop-based solution to convert 11 million TIFF images to PDF, all running on Amazon's EC2 and S3!

A company called Cloudera has started offering development, consulting and implementation services to clients wanting to implement Hadoop solutions.

I believe the future for Hadoop looks good. It opens up a whole area of large-scale parallel computing to organisations and companies that just wasn't available before without dedicated supercomputing capabilities. Couple Hadoop with on-demand Cloud computing services such as Amazon's EC2 and S3 and you have supercomputing for the masses.

Google's success was built on the foundations of Bigtable and its MapReduce technology; having such technology available as Open Source will, I believe, drive a whole new generation of Internet computing services and applications.

Thursday, 2 April 2009

Role of the Business Analyst in Agile Projects

When I started my career in software in the mid 1980s it was in the role of Analyst / Programmer; the roles of Software Engineer, Developer and Architect just didn't exist - well, not as official job descriptions.

Analyst / Programmer described the role pretty accurately: I was responsible for understanding the business process and information requirements, eliciting system specifications, and designing the system, as well as implementation and test. Come to think of it, I did a fair bit of the deployment / system admin type activities too!

You still see Analyst / Programmer role descriptions appearing on job sites, but a big proportion of organisations have separated business analysis from development / implementation. I believe this is just a reflection of the general trend in the industry towards role specialisation, hence Data Architects, Security Architects, ERP Module Consultants etc.

I believe one of the problems with the Business Analyst role is that organisations sometimes do not put a clear definition around what the role is and how it bridges the business / systems divide. In my experience, a lot of Analysts have strong business or domain backgrounds but very little systems development experience. Also, a lot of Analysts I've come across in projects do not have any formal systems analysis / method training or experience, e.g. RUP / UML / SSADM / Yourdon. I'm not saying formal systems analysis is a silver bullet, but strong skills and experience in systems analysis help to translate a business problem into a system definition.

What can happen in delivery projects is that a 'gap' grows between the 'technically orientated' development team and the business analyst community. It can end up with Developers rejecting requirements as poor and vague, and Business Analysts getting frustrated that the system is not meeting the customer need. Lack of implementation / design detail in the requirements usually ends up with Developers making design assumptions in the code, which often turn out to be incorrect. Business Analysts, in some cases, end up being no more than proxies to the stakeholders.

The IS industry is littered with myths, and one that particularly annoys me is the statement that "techies" can't / won't / don't talk to the business, customers and end users. I will admit that people go into Software Development and Programming because they are attracted by the creativity and the technical aspects of software, but I've not yet come across a Developer who can't face off to the business if given the chance. I believe this myth ends up becoming self-fulfilling, as Developers don't get the opportunity to be exposed to the business domain.

There's also the myth that end users cannot carry out any form of analysis themselves. Most people now have PCs and broadband Internet at home. I repeatedly come across end users who, when faced with an IS problem and no immediate solution, turn to customising Microsoft Office with VBA. IS professionals will, of course, "scoff" at this, but some of the solutions I've come across turn out to be quite smart given the limitations of the technology available to them.

The kinds of issues I've repeatedly seen with Business Analysis include:
  • Lack of formal training and systems analysis skills among Analysts
  • Analysts lacking an understanding of the capabilities of the technology and the limitations of the architecture
  • Non-functional requirements not defined, as these tend to need some level of architectural understanding
  • Over analysis, or Analysis Paralysis, as it's often called
The Agile approach is all about avoiding these problems. At its core is the philosophy that getting working software in front of customers frequently is the goal, with iterative, spiral development life cycles, an emphasis on prototyping to elicit requirements rather than paper specs, and an implementation team embedded in the customer domain as far as is feasible. So in an Agile project, what is the role of the Business Analyst?

I don't believe the Analyst role is dead; it just needs radical rethinking in the light of modern systems development.

I believe the key to improving the Analyst role is twofold:

Firstly, get Analysts more cross-trained in technical skills: not necessarily becoming proficient Developers, but gaining an appreciation of current technologies and software development. Also ensure they have some level of formal systems analysis background, to bring "systems thinking" to eliciting business requirements.

Secondly, re-position the Analyst role to focus more on business change, process improvement, training and acting as a "champion" for the solution being built, rather than gathering and documenting requirements. I believe this role is key to getting a solution deployed into an organisation and to realising benefits from it.

If you want to find out more on this subject then I'd recommend an article on Agile Analysis by Scott Ambler.

Sunday, 8 March 2009

Colossus - the Original Agile Project?

I have been wanting to visit Bletchley Park and the National Museum of Computing for a while now and decided to take the trip down to Milton Keynes to take a look around.

The story of Bletchley Park is not only a fascinating tale of British ingenuity and brilliance against the Nazi threat of the Second World War, it's also the story of the birth of modern computing.

For those of you who don't know, Bletchley Park was the home of the UK Government Code and Cypher School (GCCS), the forerunner of GCHQ, during World War II. GCCS was set up to crack and decipher German signals captured by numerous Station-Y listening posts around the UK. It's been the subject of numerous books and films, including the novel by Robert Harris and the film Enigma starring Dougray Scott and Kate Winslet.

One of the main reasons I went to Bletchley Park was to see the rebuilt Colossus Mk2. The story behind the design and construction of this machine is an inspiration.

GCCS had not only successfully broken the code of the German Enigma machine, but also highly automated the key and message decoding through the construction of a machine called the Bombe, designed by Alan Turing. Hitler, though, wanted a higher encryption capability for signals to his high command, and his scientific team came up with a teleprinter-based system developed by the Lorenz company.

The Lorenz machine worked with a 32-symbol Baudot code system; messages were encrypted with a sequence of obscuring characters using modulo 2 addition (exclusive OR in Boolean terms). If the obscuring characters had been truly random, the cipher would have been nigh on impossible to break at that time, but the Lorenz machine used a series of mechanical rotors to generate a pseudo-random key. The breakthrough came when a 4,000-character message sent to the German High Command was not fully received, and the receiver asked the sender to repeat it. The radio operator committed the cardinal sin and sent the message again with the same Lorenz settings.
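Modulo 2 addition has the useful property of being its own inverse: adding the key stream once enciphers, and adding it again deciphers. It also shows why re-sending on the same settings was catastrophic, because combining the two ciphertexts cancels the key out completely. A tiny sketch of the arithmetic (my own, purely illustrative):

public class Modulo2 {
    public static void main(String[] args) {
        int plain  = Integer.parseInt("10110", 2); // a 5-bit Baudot-style symbol
        int key    = Integer.parseInt("01101", 2); // pseudo-random obscuring symbol
        int cipher = plain ^ key;                  // modulo 2 addition = XOR

        System.out.println(Integer.toBinaryString(cipher));       // 11011 - enciphered
        System.out.println(Integer.toBinaryString(cipher ^ key)); // 10110 - key added again, plaintext recovered

        // two different messages enciphered with the SAME key settings...
        int cipher2 = Integer.parseInt("00111", 2) ^ key;

        // ...XORed together, the key vanishes, leaving the two plaintexts
        // combined - exactly the foothold the codebreakers needed
        System.out.println(Integer.toBinaryString(cipher ^ cipher2)); // 10001
    }
}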

Brigadier John Tiltman and the Cambridge graduate Bill Tutte exploited the mistakes made by the German radio operators, reconstructing the pseudo-random sequence and discovering how the Lorenz encoding machine worked. The Lorenz encoding was cracked; the problem was that by long hand it took weeks to decode a message, far too long, so an automated and substantially quicker method was needed.

The Post Office Research Labs at Dollis Hill produced a machine based on relays that could read punched tape, but even this took six weeks to crack the average message - still too long. One of the mathematicians working at Bletchley, Max Newman, worked out that using electronic logic circuits working in parallel, the messages could be broken more quickly. Max approached one of the Post Office engineers, Tommy Flowers, to design and build an electronic machine to process the Lorenz messages. The first attempt was called the Heath Robinson; it proved both Max's theory and the electronic circuit design correct - the problem was its reliability.

Tommy's new design was based on using around 1,000 valves. None of his management believed it was feasible and he was told to abandon the project. Luckily, Tommy ignored the doubters, and he and his team worked shifts round the clock to design and build Colossus in less than 9 months! Colossus went operational in January 1944. The machine was a success, decoding Lorenz messages at a rate of 5,000 characters per second. The Mk2 quickly followed, using around 2,500 valves and running substantially quicker than the Mk1. In total, ten machines were built and delivered to Bletchley, and through 1944 and 45 they worked around the clock decoding German messages. The success of D-Day was, in part, down to Colossus: decoded German High Command messages assured the Allies that Hitler had believed the D-Day diversion plans. Without these decoded intercepts, the Allies would not have had the confidence that the Germans had taken the D-Day diversion bait.

In computing terms, Colossus can be regarded as a programmable, special-purpose computer. AND, OR and XOR logic gates could be configured in numerous combinations with a plug board system. Colossus had a 5-bit shift register, the first computing machine to use such an electronic circuit. The diagram below is an original schematic showing the architecture of Colossus.


At the end of the Second World War, Churchill ordered that 8 of the 10 Colossus machines be completely destroyed, along with all schematics and technical documentation. Two survived and were taken on to what is now GCHQ in Cheltenham; these two machines were destroyed in the early 1960s.

The existence of Colossus remained secret until the 1970s, when small snippets of information about the machine began to emerge. Ironically, most of the information about Colossus was released by the US Government under the Freedom of Information Act. Obviously, as Allies during the war, the US had knowledge of and access to Colossus, and a number of US service personnel were seconded to Bletchley Park.

In the early 90s a computing enthusiast, Tony Sale, who was part of the group that helped save Bletchley Park from certain destruction, had the dream of rebuilding Colossus. I say again: rebuilding, not building a replica! A number of the original team were still around, including Tommy Flowers, and luckily they had kept scraps of information about the original machine. The rebuild has taken over 15 years, but the machine is now up to a standard where it can decode Lorenz transmissions to the same speed and standard as the original. The BBC News clip below records an event in 2007 when the Bletchley Park Trust held a competition to see whether anyone could beat Colossus at decoding a message encrypted by an original Lorenz machine.





Although comparisons with modern 'general purpose' computers cannot be made directly, a scaled CPU clock speed for Colossus has been calculated as equivalent to a Pentium running at around 5.8MHz - not bad for a 65 year old computer! Seeing the machine 'in the flesh', it looks impressive.

So, back to the post title: what has Colossus got to do with Agile systems development? The team that designed and built Colossus, in my view, exhibited all the traits of a well-performing systems team. They worked rapidly and iteratively, and there were no 'big up-front requirements'. More importantly, it was the technical innovation, skill and persistence of Tommy Flowers and his team that won the day. Knowing how long typical IT projects take to get off the ground, never mind deployed to production, it's absolutely amazing to think that this machine was designed and built in 9 months.

I highly recommend a visit to Bletchley Park. The pioneering work carried out at Bletchley during the Second World War by the likes of Alan Turing, Max Newman and Tommy Flowers gave rise to the industry I work in today.

If you want further information on Colossus and Bletchley Park I'd recommend:

Friday, 27 February 2009

RESTful Web Services - SOA Made Simple?

In the SOA community there has been an ongoing debate over SOAP versus REST for a number of years now, and I've decided to add my penny's worth to the argument. I feel I can offer a unique perspective on this subject: during 1998/99 I was heavily involved in the early beta-testing of Microsoft BizTalk and had access to a lot of the formative thinking that eventually went into defining SOAP and the WS-* standards.

To really put the debate in perspective you need to understand the roots and history of the SOAP protocol and the people behind it.

The story starts back in 1998. The XML 1.0 specification had just been agreed, and IT vendors like Microsoft and IBM were beginning to get their heads around the idea of distributed computing using HTTP. Let's not forget, the 90s were all about the transition to client / server computing, then n-tier architectures and middleware such as CORBA, J2EE and Microsoft Windows DNA (now .NET). At that time I was working with a start-up company that produced one of the first commercial COTS products based on Windows DNA. In my view, at that time, Sun's J2EE lagged Microsoft's DCOM and MTS in defining n-tier architecture best practice.

The starting point for SOAP was XML-RPC. XML-RPC was pretty straightforward: you POST an XML payload over HTTP to call an API on a remote host. Everyone understood then that HTTP and XML were going to become the heterogeneous holy grail needed to promote open, interoperable distributed computing.

A typical XML-RPC fragment looks something like this:
POST /myservice HTTP/1.0
User-Agent: Frontier/5.1.2 (WinNT)
Host: myhost.com
Content-Type: text/xml
Content-length: 181
 
<?xml version="1.0"?>
  <methodCall>
    <methodName>myService.getMethodName</methodName>
    <params>
      <param>
        <value><i4>41</i4></value>
      </param>
    </params>
  </methodCall>

And a response that looks something like:
HTTP/1.1 200 OK
Connection: close
Content-Length: 158
Content-Type: text/xml
Date: Fri, 17 Jul 1998 19:55:08 GMT
Server: UserLand Frontier/5.1.2-WinNT
 
<?xml version="1.0"?>
  <methodResponse>
    <params>
      <param>
        <value><string>Returned Value</string></value>
      </param>
    </params>
  </methodResponse>

Pretty simple.

In 2000 I worked on a Touch-Screen Kiosk system that brought content from different web systems together in a client portal. We hooked into services such as Multimap and UpMyStreet all using an XML-RPC approach and it worked great.

Of course, there are issues with this approach. How do you discover what services are available? How can a client understand the description of the API, the data types and so on? How do you secure an API on a public network? How do you handle transactions?

These issues kick-started the vision for SOAP and the WS-* standards. It may be a surprise to some, but the key organisation behind SOAP in the early days was Microsoft, that bastion of interoperability and open standards. The original contributors to SOAP worked on the Microsoft DCOM and MTS teams, so it is not surprising that the original SOAP proposal assumed a COM type system under the hood. DCOM supported HTTP as well as TCP/IP as a network transport. There was also an internal battle within Microsoft between the DCOM-based SOAP team and the XML data team as to how these two technologies should interact, which delayed the ratification of SOAP 1.0 until late 1999.

Knowing the background of the people who drove the early days of SOAP, it should be no surprise that SOAP is effectively DCOM for XML over HTTP. The architectural principles behind SOAP can be found in all the RPC middleware solutions of the 90s, i.e. Windows DNA, CORBA and J2EE. And with key vendors such as Microsoft and IBM driving the SOAP and WS-* standards, I suspect the motives behind some architectural decisions were partly driven by commercial opportunities to sell products such as development tools, application servers etc.

Around this time Roy Fielding, co-founder of the Apache project and one of the pioneers behind HTTP and HTML, was working on his PhD dissertation at the University of California, Irvine. Roy's dissertation, Architectural Styles and the Design of Network-based Software Architectures, probably didn't register much further than a few interested academic circles at the time, but it is now regarded as the definitive guide to the architecture of the Web. In some ways, Roy's dissertation was a way to put some solid theory behind the evolution of the technologies that became the Web. Sort of a retrospective design document, if you like.

REST, as Roy articulates in his thesis, is an architectural style, not a protocol. Systems that are 'RESTful' adhere to the key architecture principles of REST. Unlike SOAP, REST is not a standard.

The key principles of REST are:
  • Application and State are abstracted into Resources
  • Every resource is uniquely addressable
  • Client Server
  • Stateless
  • Cacheable
  • Layered
Fielding also laid out the key goals of REST:
  • Scalability of component interactions
  • Generality of interfaces
  • Independent deployment of components
  • Intermediary components to reduce interaction latency, enforce security, and encapsulate legacy systems
In simple terms, the SOAP vs. REST debate is a false one. You cannot compare SOAP and REST, they are completely different entities. SOAP is a standards based RPC middleware protocol, REST is an architectural style guide.
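To see the difference in practice: a RESTful service just exposes resources at URIs and lets plain HTTP do the work. A minimal sketch using nothing but the JDK, against a made-up resource URI:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RestGet {
    public static void main(String[] args) throws Exception {
        // every resource is uniquely addressable - no envelope, no endpoint binding
        URL resource = new URL("http://example.com/customers/12345");
        HttpURLConnection conn = (HttpURLConnection) resource.openConnection();
        conn.setRequestMethod("GET");                  // the uniform interface
        conn.setRequestProperty("Accept", "text/xml"); // negotiate the representation

        System.out.println("Status: " + conn.getResponseCode()); // standard HTTP semantics
        BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        for (String line; (line = in.readLine()) != null; ) {
            System.out.println(line); // the XML representation of the resource
        }
        in.close();
    }
}

No WSDL, no generated stubs, no SOAP envelope: the HTTP verbs, status codes and caching behaviour come for free.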

Fast forward to the present, 10 years on from the release of SOAP 1.0: where are we now with SOAP, REST and web services? For me, a good indication of where web services are heading is to look at the major Internet and eCommerce players - the Googles, Yahoos, eBays and Amazons of this world - and how they are providing access to their services.

It's interesting that most of these big players offer both SOAP and RESTful APIs to their services, yet reports have shown that the majority of developers prefer the RESTful APIs over SOAP. Why is this? Simple: like electricity finding an earth, developers will always take the path of least resistance when developing an app. In essence, RESTful web services have proven substantially easier to use in the field than SOAP. This is not surprising either; if you read through the SOAP and WS-* specifications they are, in my view, nigh on unintelligible and full of inconsistencies. If you want a light-hearted criticism of SOAP, see this article by Pete Lacy. It's also interesting that the likes of Google, who started by providing both SOAP and RESTful access to their services, have in a lot of cases deprecated their SOAP access.

So where do I stand on the SOAP vs. REST debate? The sheer success of HTTP, and therefore of the principles of REST, coupled with its inherent simplicity, places me in the RESTful camp. Building SOA and web services on RESTful principles means you inherit all the proven advantages of HTTP. Re-read Fielding's goals for REST: they are entirely compatible with the goals of SOA.

It's my view that SOAP is 'tainted' with the client / server RPC thinking of the 1990s grafted onto HTTP. Looking at SOAP's history and the people who drove its initial specification, you can understand why this could be the case.

Tim Berners-Lee's vision of the Web was simple:

"The Web is simply defined as the universe of global network-accessible information"

Berners-Lee's view was that the primary purpose of the Web is to establish a globally shared information space, in which 'legacy systems' could participate by publishing objects and services. SOAP, on the other hand, talks about the Web as a network transport, with endpoints and bindings.

Taking Berners-Lee's core vision, SOAP, for me, adds an unnecessary level of indirection between the consumer and provider of information. In my view, the HTTP specification, used in conjunction with XML, already provides a robust mechanism for SOA across TCP/IP networks; what's the value of adding further complexity on top?

SOAP advocates will point to the issues of service discovery, security and transactions I raised earlier, and argue that this is what SOAP provides in conjunction with the WS-* standards. I would argue that the proliferation of WS-* standards came about as sticking plasters applied to SOAP to make Web Services behave more like traditional client / server RPC technologies.

It almost feels to me like SOAP and the WS-* standards are 'fighting' against the purity and goals of HTTP, the primary protocol they run on.

Will REST win out? It's interesting to trawl the blogs and forums of the major vendors such as Oracle, Sun, IBM and Microsoft (coincidentally the key vendors driving the web services standards): from around 2006 onwards you'll find an explosion of articles on REST, yet go back further than that and it was all SOA equals Web Services equals SOAP.

Like a lot of software innovations these days, REST grew from the ground up, from developers and Open Source communities; it started to become mainstream, and the big players picked up on it. Ironically, considering REST is a style not a standard, you've now got the likes of Sun and the W3C promoting standards around REST, such as WADL.

Even though I'm a fan of REST and Fielding's view of 'Resource Orientated' computing, I can still see a place for SOAP and WS-*; for example, if you need to provide a service that requires transaction support, then SOAP's ideal. I've just got this feeling, though, that the simplicity, purity, compatibility and scalability of REST will win through in the end.

Time will tell!

If you're looking for a good technical overview of REST, I would recommend Elkstein's tutorial as a starting point. I would also recommend listening to a podcast round-table discussion of REST by a team from ThoughtWorks, including Martin Fowler. The podcast is split into two parts, the first giving an overview and history of REST, the second going into more detail around modelling resources and deploying REST in the Enterprise.

Monday, 9 February 2009

Programming with Eclipse BIRT

A couple of years ago I started working on an Agent-Based Simulation System. This system simulated the behaviour of fleets of complex assets, such as aircraft, to support predictive analysis of maintenance, availability, system failure etc.

One of the (many) key challenges of the system architecture was the sheer volume of data each simulation could potentially generate, and the visualisation of the results. At the time, the organisation I was working in hadn't decided on a corporate Business Intelligence / Analytics solution, so I had the task of looking for an appropriate technology to plug into the system architecture.

It made sense for the solution to be Open Source. Looking around, there were a few options, including JasperReports, but in the end I decided on Eclipse BIRT. I chose BIRT mainly because we were already using the Eclipse Platform for the rest of the system development effort and, to be honest, I was quietly impressed with its capability, power and ease of use. BIRT also has the advantage of being Open Source with a solid commercial entity behind it in the form of Actuate.

BIRT's architecture is, essentially, split into two components: a Designer tool to maintain the report design, and a Java runtime Report Engine that manages and executes the report. The Designer component can run either as an Eclipse RCP client or as an Eclipse IDE plug-in. The Report Engine will deploy to any standard Servlet Engine / J2EE Application Server environment, e.g. JBoss AS, Tomcat, WebSphere etc.
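By way of illustration, embedding the Report Engine in your own Java code looks roughly like the sketch below. I'm quoting the Report Engine API from memory, so treat the class names and bootstrap sequence as indicative and check the BIRT documentation for the definitive version:

import org.eclipse.birt.core.framework.Platform;
import org.eclipse.birt.report.engine.api.*;

public class RunReport {
    public static void main(String[] args) throws Exception {
        // boot the engine (engine home, logging etc. can be set on the config)
        EngineConfig config = new EngineConfig();
        Platform.startup(config);
        IReportEngineFactory factory = (IReportEngineFactory) Platform
            .createFactoryObject(IReportEngineFactory.EXTENSION_REPORT_ENGINE_FACTORY);
        IReportEngine engine = factory.createReportEngine(config);

        // open a design produced by the Designer and run it straight to HTML
        IReportRunnable design = engine.openReportDesign("myreport.rptdesign");
        IRunAndRenderTask task = engine.createRunAndRenderTask(design);
        HTMLRenderOption options = new HTMLRenderOption();
        options.setOutputFileName("myreport.html");
        options.setOutputFormat("html");
        task.setRenderOption(options);
        task.run();

        task.close();
        engine.destroy();
        Platform.shutdown();
    }
}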

An overview of the architecture is shown below.



The architecture of BIRT is pretty open and extensible. For example, data source connectivity is provided by ODA, which allows you to connect to different physical data sources, such as JDBC RDBMSs, Web Services, flat files or Java Beans, yet treat them all as logical result sets. In fact, one of my first 'evaluations' of BIRT's capabilities involved building a web reporting front-end to Amazon's ECS (now known as Amazon Associates Web Service).

For me, the real power of BIRT is the server event model that allows you to alter the look, feel or behaviour of the report at runtime. An API is provided for the event model and can be accessed via Rhino (JavaScript-based scripting for Java) or native Java. An overview of the event model can be seen below.


Events are exposed on numerous report objects, for example DataSet.beforeOpen(), ReportElement.onPrepare() etc.

These events allow you to do all sorts of 'clever stuff' at runtime: for example, change the SQL of a query based on a parameter or the outcome of a previous query, or change report and chart colours based on calculated values.

For example, the following code changes the SQL expression dependent on a runtime parameter rp_sortorder:
myDataSet.beforeOpen() {
  this.queryText=this.queryText.replace('sortlistorder',params["rp_sortorder"]);
}

You can also get access to the J2EE runtime container. For example, the following code gets access to an HTTP session object.
report.initialize() {
  importPackage(Packages.javax.servlet.http);
  var thisSession=reportContext.getHttpServletRequest().getSession();
  var mySessionObject=thisSession.getAttribute("mySessionObject");
  var mySessionObjectValue=mySessionObject.getValue();
}

As well as the Report Design and Runtime (Engine) APIs, there's a Chart API that allows you to extend the built-in chart functionality. For example, the code below alters the scale of a chart dependent on two global variables, "yminDate" and "ymaxDate".
function beforeGeneration(chart, icsc) {
  importPackage(Packages.org.eclipse.birt.chart.model.data.impl);
  var scriptObj=icsc.getExternalContext().getScriptable();
  var yaxisminDate=scriptObj.getPersistentGlobalVariable("yminDate");
  var yaxismaxDate=scriptObj.getPersistentGlobalVariable("ymaxDate");
  var yaxismin=Date.parse(yaxisminDate);
  var yaxismax=Date.parse(yaxismaxDate);
  var xaxis=chart.getBaseAxes()[0];
  var yaxis=chart.getOrthogonalAxes(xaxis,true)[0];
  var yaxisScale=yaxis.getScale();
  yaxisScale.setMin(DateTimeDataElementImpl.create(yaxismin));
  yaxisScale.setMax(DateTimeDataElementImpl.create(yaxismax));
}

I am impressed with BIRT; I haven't yet found a reporting / analytics requirement it can't handle. Coupled with its extensibility and the backing of the Eclipse community and Actuate, I highly recommend it.

If you want to find out more about BIRT:

Saturday, 31 January 2009

End of the line for the Relational Database?

Pretty much every 'business system' I've been involved in developing over the last 20 years has involved an RDBMS in some shape or form. I've worked with pretty much all the major vendors in my time, i.e. Oracle, Sybase, Ingres, MS SQL Server etc.

Developing my software career from the late 80s through the 90s, RDBMSs were always there, a key enabler for the move to client / server. Even when the Internet and HTTP came along, RDBMSs were still providing the backbone to applications.

Most architects and developers have grown up with relational algebra, normalisation and, of course, SQL. I remember the brief flirtation with the OO database revolution that never quite took off (it's still kicking around). I suspect architects have just got used to the fact that data persistence is probably going to involve an RDBMS; what else would you use?

I believe the mantle of the RDBMS is beginning to be challenged by the rise of a number of alternative database and persistence approaches built from the ground up with the Web and HTTP in mind. A lot of these new database engines share common principles and technologies, including using HTTP, REST, JSON and XML as the primary query tools, and having flexible data models that are more document orientated than relationally structured. All of these solutions take away the classic RDBMS chores of maintaining indexes, keys and relationships, allowing the developer to focus on the typical CRUD operations without worrying about how the data is structured, indexed or persisted.

Amazon opened up their e-Commerce services a few years ago now under the AWS banner. I've had an Amazon Developer account pretty much since the service was launched, mainly out of interest and experimentation rather than for developing any real-world applications. Amazon have been steadily adding new services and have finally added their database solution, SimpleDB.

Currently in beta, SimpleDB provides a straightforward API to create domains, to put, get and delete data, and to run queries. Given the massive move away from SOAP to RESTful web services, I don't think it's any coincidence that Amazon have chosen the core HTTP verbs of get, put and delete for their SimpleDB API.

The data metaphor Amazon use for SimpleDB is the spreadsheet. Worksheets are akin to domains (RDBMS tables), items are rows, and values are cells (a single column value in an RDBMS table). The big difference is that whereas a spreadsheet cell or RDBMS row/column intersection can only contain one value, a SimpleDB attribute can contain many values. As an example, take a look at the Product Catalogue domain below:


In this example Sweatpants have Color values of Blue, Yellow and Pink.

SimpleDB provides two query mechanisms: a SQL-like Select expression, and a predicate-style approach with Query expressions. Access is provided by either a SOAP or a RESTful interface. For example, a RESTful call to add an item called Item123 to the domain 'MyDomain' looks like:
https://sdb.amazonaws.com/?Action=PutAttributes
&DomainName=MyDomain
&ItemName=Item123
&Attribute.1.Name=Color&Attribute.1.Value=Blue
&Attribute.2.Name=Size&Attribute.2.Value=Med
&Attribute.3.Name=Price&Attribute.3.Value=0014.99
&AWSAccessKeyId=<valid_access_key>
&Version=2007-11-07
&Signature=Dqlp3Sd6ljTUA9Uf6SGtEExwUQE=
&SignatureVersion=2
&SignatureMethod=HmacSHA256
&Timestamp=2007-06-25T15%3A01%3A28-07%3A00

The XML response returned:
<PutAttributesResponse xmlns="http://sdb.amazonaws.com/doc/2007-11-07">
  <ResponseMetadata>
    <StatusCode>Success</StatusCode>
    <RequestId>f6820318-9658-4a9d-89f8-b067c90904fc</RequestId>
    <BoxUsage>0.0000219907</BoxUsage>
  </ResponseMetadata>
</PutAttributesResponse>
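Querying follows the same request style. From memory of the SimpleDB documentation (so treat the parameter names as indicative), a Select call passes a SQL-like expression as a request parameter, with the same signature and timestamp parameters as above omitted here for brevity:

https://sdb.amazonaws.com/?Action=Select
&SelectExpression=select * from MyDomain where Color = 'Blue'
&AWSAccessKeyId=<valid_access_key>
...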
In terms of outright performance, sat out there in the 'Cloud', SimpleDB isn't going to be able to compete with an instance of an RDBMS sat a switch away from your App Server, let alone a product like Oracle Coherence. What SimpleDB does offer, though, is a quick and cost-effective way of building flexible, data-driven applications in the 'Cloud' without worrying about hosting, DBA maintenance etc.

SimpleDB is getting attention through Amazon's presence and branding, but there are a number of alternatives.

Dabble DB goes one step further than SimpleDB: not only does it provide a database, it adds forms, allowing users to build quite flexible data-driven web apps. You can still use Dabble as a database back-end to your own application tier through a JavaScript and JSON API, which makes it ideally architected for AJAX applications running in the browser. An example query to Dabble from JavaScript is shown below.
Dabble.addView({
  _class: 'View',
  id: 'e63a411d-7cbb-4399-9b65-37cfee8546e3',
  name: 'Authors',
  fields: [88],
  entries: [
    {_name: 'Homer', _id: 45, country: 'Greece'},
    {_name: 'Margaret Atwood', _id: 95, country: 'Canada'},
    {_name: 'James Joyce', _id: 44, country: 'Ireland'}
  ]
});
Effectively, Dabble DB is Microsoft Access for the Web.

Not all of these new database engines run solely in the Cloud. Apache has the CouchDB project currently in incubation. CouchDB is interesting for a number of reasons: not only does it support an adaptive, document-centric database with a RESTful JSON API, but it's developed in Erlang rather than C/C++ or Java.

An overview of CouchDB's architecture can be seen below:



CouchDB is document centric and schema-free, with a flat address space. Documents are comprised of fields that can contain strings, numbers and dates, or more complicated structures such as ordered lists and associative maps. An example document for a blog post could look like:
"Subject": "I like Plankton"
"Author": "Rusty"
"PostedDate": "5/23/2006"
"Tags": ["plankton", "baseball", "decisions"]
"Body": "I decided today that I don't like baseball. I like plankton."

To put structure over what, essentially, is an unstructured store, CouchDB provides support for views which are written in JavaScript. A simple view construct is shown below:
function(doc) {
  if (doc.Type == "customer") {
    emit(null, {LastName: doc.LastName, FirstName: doc.FirstName, Address: doc.Address});
  }
}

This view function creates a row for every document in the database of type 'customer', returning the fields LastName, FirstName and Address. This view emits a key of 'null', so it can't be looked up by key or sorted. An indexed and sortable view would look like:
function(doc) {
  if (doc.Type == "customer") {
    emit(doc.LastName, {FirstName: doc.FirstName, Address: doc.Address});
    emit(doc.FirstName, {LastName: doc.LastName, Address: doc.Address});
  }
}

And would return a JSON result that would look like:
{
  "total_rows":4,
  "offset":0,
  "rows":
  [
   {
     "id":"64ACF01B05F53ACFEC48C062A5D01D89",
     "key":"Katz",
     "value":{"FirstName":"Damien", "Address":"2407 Sawyer drive, Charlotte NC"}
   },
   {
     "id":"64ACF01B05F53ACFEC48C062A5D01D89",
     "key":"Damien",
     "value":{"LastName":"Katz", "Address":"2407 Sawyer drive, Charlotte NC"}
   },
   {
     "id":"5D01D8964ACF01B05F53ACFEC48C062A",
     "key":"Kerr",
     "value":{"FirstName":"Wayne", "Address":"123 Fake st., such and such"}
   },
   {
     "id":"5D01D8964ACF01B05F53ACFEC48C062A",
     "key":"Wayne",
     "value":{"LastName":"Kerr", "Address":"123 Fake st., such and such"}
   }
  ]
}

The choice of the Erlang VM runtime for CouchDB is also interesting. Erlang was developed by Ericsson as a platform for real-time telecoms systems. Its support for lightweight processes, concurrency and inter-process communication via messaging makes it a highly scalable, distributed and fault-tolerant environment, much more so than any current Java VM. This should help CouchDB perform very well.

CouchDB is stateless and is accessed entirely over HTTP, essentially following REST principles. This means CouchDB supports caching through proxies and edge servers without modification.
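As an illustration, querying the customer view above is just a GET that any intermediate proxy can cache. A sketch is below; the database name 'customers' and the design document name 'app' are illustrative, and the URL layout shown follows the incubator-era releases and has changed in later CouchDB versions. The key is JSON-encoded, hence the quotes (%22) around the string:

GET http://localhost:5984/customers/_view/app/by_name?key=%22Katz%22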

Even though CouchDB is still in the Apache Incubator, there are some real-world apps built on it out there already. An interesting example is Ajatus, a sort of 'reverse CRM' solution.

Of course, no article on next-gen databases would be complete without mentioning the biggest one of them all - Google's Bigtable. Essentially, Bigtable is based on a huge sparse distributed hash map. Going into Bigtable in detail is well beyond this article; there's a publication available from Google here, but a sketch of the data model is shown below.
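As a purely conceptual sketch (borrowing the 'webtable' example from Google's paper), you can picture Bigtable as a nested map: row key, then column ('family:qualifier'), then timestamp. Anything not stored simply isn't there, which is what makes it sparse.

// Conceptual sketch only: Bigtable's data model as a nested JavaScript map.
// Rows and columns are from the webtable example in Google's Bigtable paper.
var webtable = {
  'com.cnn.www': {
    'contents:':         { t6: '<html>...</html>', t5: '<html>...</html>' },
    'anchor:cnnsi.com':  { t9: 'CNN' },
    'anchor:my.look.ca': { t8: 'CNN.com' }
  }
};

// A lookup is just successive key dereferences:
var anchorText = webtable['com.cnn.www']['anchor:cnnsi.com'].t9; // 'CNN'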

So is this really the end for the RDBMS? I suspect not just yet. There are hundreds of thousands of organisations and enterprises out there running their critical apps on Oracle and SQL Server, not forgetting the ubiquitous LAMP environments, typically with MySQL back-ends.

Even so, I believe these 'new generation' databases offer opportunities to build highly scalable, fault-tolerant and distributed applications with adaptable data models that inherently support the architecture of the web. With the likes of Amazon and Google heavily promoting these technologies, I'd personally be worried if I were in the database divisions of Oracle or Microsoft.

Thursday, 15 January 2009

Facts & Fallacies of Software Engineering

Most software developers know how systems really get built, and most will have come across organisations repeating the same old mistakes time and time again. And of course, there are those myths that pervade the industry, such as 'all developers are equal in output and productivity'.

Robert Glass's Facts and Fallacies of Software Engineering lays out these 'home truths' and 'urban myths' of the systems development process. The book draws upon Robert's pretty unrivalled experience in the software field, dating back to the pioneering 1950s. There can't be many people still active in the industry with such an eminent and long career.

I feel an affinity with Robert's career, as he explains in his introduction to Chapter 1 (About Management) how he shunned career prospects in management to stay true to the technologist path. I too flirted with the vision of aiming for Senior Management positions in my early 30s, starting the ubiquitous MBA route to bolster my prospects. I tired of the MBA in the end, deducing that (i) most management theory was just plain common sense dressed up in Consultant speak and (ii) you could pick up the same knowledge just by reading a few well-chosen management books and save yourself a shed load of cash in the process.

So back to the book. Robert lays out 55 facts and fallacies across areas including management, the life cycle and quality. Pretty much all of them I recognise and agree with. There are a couple of odd-ball / controversial ones, such as 'COBOL is a very bad language, but all the others are so much worse'.

The book simply presents these facts and fallacies grouped by domain and subject, provides rationale and examples of them and supports their credibility through referencing other work. It can be a bit dry to read front to back, but the text's really meant for dipping in and out of when you're looking for that inspiration to solve your project's issues.

The key facts and fallacies for me include:
  • The most important factor in software work is the quality of the programmers - it never ceases to amaze me how often this goes unrecognised. I have seen so many projects where developers, analysts and architects are treated as 'fully interchangeable' by management. I have seen lead architects swapped between programmes just before major go-live milestones! Management need to recognise that the knowledge, skill and experience of the technical team at the coal face of delivery are the greatest influence on whether a project is successful or not.
  • Adding people to a late project only makes it later - when projects overrun there's always a temptation to 'throw' more resource at them. This invariably just makes the situation worse, with more communication paths between team members (see the quick arithmetic after this list) and massively reduced productivity of your key technical staff as they spend time getting 'newbies' up to speed. Also, I believe that no matter how complex the architecture of a system, there is a limit to team size beyond which productivity falls. As teams grow, not only do you have the learning curve and communication problems, but the more likely it is you'll get team members who just don't get on with each other. I've also observed that, in the panic to accelerate progress, the recruitment process can fall down, with less experienced and skilled people being brought on board.
  • Estimation usually occurs at the wrong time by the wrong people - when a new initiative is agreed it's usually given to a project manager who's possibly never delivered a project like it before and may be non-technical. Yet senior management will usually demand a schedule and budget forecast, possibly years ahead, and then hold the project manager to that schedule. Managers are usually reluctant to provide senior stakeholders with revised estimates as the project progresses, for fear of losing credibility.
  • For every 25 percent increase in problem complexity, there is a 100 percent increase in solution complexity - this is one of the least understood of Robert's facts, even amongst technical people. As a solution evolves and the business need is better understood by users and the delivery team, system features that appeared straightforward early in the life cycle suddenly start getting complex from a design and implementation viewpoint. Add on top of this the inevitable change in features and system behaviour that occurs as the project matures, and the team can suddenly hit a wall of rapidly expanding system complexity. If not contained, it can quite easily de-rail the delivery. Stakeholders often get frustrated when asking for what they see as simple feature requests, only for the delivery team to explain they can't be done without blowing the schedule or budget.
  • One of the most common causes of runaway projects is unstable requirements - see my article on Forget Requirements - Collaborate on a Solution Concept for a viewpoint on this one.
  • Software needs more methodologies - I have to admit to detesting most 'methodologies', by which I mean the likes of RUP, PRINCE2, DSDM etc. The content is usually valid; RUP, for example, contains loads of good-practice guidelines on use cases, OOAD etc. It's just that they (i) tend to be seen as magic bullets and are over-promoted by Vendors and Consultants as the saviour to all your problems, (ii) are usually implemented prescriptively with a one-size-fits-all approach, and (iii) end up just massively increasing the bureaucracy that was probably already present in your organisation - only now it's got a name!
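The quick arithmetic behind the 'adding people' fact above: a team of n people has n(n-1)/2 possible communication paths. A team of 5 has 10 paths; grow it to 10 people and there are 45, more than four times the co-ordination overhead for double the headcount.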
So what can an organisation learn from this book:
  • The 'coal face' technical people - their knowledge, experience and skills - are the most important factor in delivery success
  • Move away from large, long-term, waterfall-driven IT programmes with wildly optimistic schedules and budgets, to incremental, iterative solution development delivering smaller capabilities but significantly quicker
  • Manage stakeholder expectations on what can and cannot be realistically achieved with available technologies
  • Forget methods, and tools for that matter; even when well implemented, these only deliver marginal improvements compared with the technical experience, skills and capabilities of your people.


The only other comment I'd add about this book is that I still haven't fathomed out why there's a picture of a Snowy Owl on the front. I must email Robert Glass and ask him.

Wednesday, 14 January 2009

Windows 7 - Beginning of the End for Microsoft?

Let me put this straight from the start, I am not one of these dedicated anti-Microsoft types with a pathological hatred of anything coming out of the Redmond stable. I spent the majority of my career in the 90's working with the Microsoft platform from DOS 3.0 to Windows NT, GWBASIC to Visual Studio. In my view, Microsoft's dominance of the Desktop and certain elements of Enterprise computing (file, print serving and mail for example) was more down to the poor vision, strategy and business models of its competitors than any MS 'evil planning'.

So Windows 7 is here and available for public download, and judging by the tech news feeds it's in demand. Windows 7 needs to be good; even Microsoft admit they cocked up on Vista. For me, though, there's always an inherent problem with a software product line that's been around for what seems like forever: it inevitably turns into 'bloatware', and that's what happened to Vista. Just how many more truly useful features can you add to an operating system? The features MS are touting for 7 seem pretty desperate to me, most of them centring around UI enhancements.

The problem for Microsoft is that user applications are rapidly moving web-side, or to the Cloud to use the current buzzword. This website is a good example: the whole of its content is managed in a browser, with no native OS dependencies, and I can maintain this Blog from anything from an iPhone over 3G to a Linux Netbook on a Wi-Fi HotSpot in Starbucks. Vista sales were poor, and I believe Windows 7 sales will fall well below Microsoft's expectations for a number of reasons:
  • Broadband speeds are set to (hopefully) rapidly increase towards the end of 2009 here in the UK with BT's implementation of 21CN and ADSL2+ enabling more content to be streamed down to the browser.
  • Web-based applications are beginning to become more mainstream with consumers, championed by the likes of Google Apps
  • Users are trusting more of their data to the web with Online Backup solutions such as Carbonite and Web 2.0 applications such as Flickr
  • Browsers are heading towards becoming a 'mini OS' in their own right, and are likely to become more robust development platforms - witness Google Chrome, with OS-like features such as separate processes and threads for rendering, Javascript execution, HTTP download etc.
  • Web-based apps are set to improve dramatically with the take up of rich AJAX development environments such as ExtJs
  • JavaScript is no longer a 'scripting language' for occasional web master tinkering, but is rapidly becoming a serious language for application development, supporting OO-like concepts through its prototype mechanism (see the sketch after this list).
  • Browser vendors will focus on increased Javascript performance as this article shows with Firefox 3.1
  • Linux on the desktop is likely to become more mainstream with the rapid growth in NetBook sales, most of which are powered by lightweight Linux Distros such as Linpus and Xubuntu
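To illustrate the point about prototypes above, here's a minimal sketch of OO-style code in plain JavaScript; the Account example is purely illustrative.

// A constructor function plays the role of a class
function Account(owner) {
  this.owner = owner;
  this.balance = 0;
}

// Methods are shared through the constructor's prototype,
// rather than declared in a class definition
Account.prototype.deposit = function (amount) {
  this.balance += amount;
  return this.balance;
};

var acct = new Account('Alice');
acct.deposit(100); // returns 100 - deposit() is found via the prototype chain
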
In my view, the question won't be 'are you a Windows / Linux / OS X user?', it's more likely to be 'are you an IE / Firefox / Opera / Chrome / Safari user?'.

This change will be slow, but I believe it will start to gather pace through 2009/10, in the consumer desktop space first. Corporates are inherently risk averse and generally slow to change, particularly in something as critical as their desktop infrastructure. Even so, with the current economic downturn, companies will start to question the value of paying massive licence fees and, I suspect, begin to embrace Open Source, starting in the data centre first. Once this transformation is complete in the data centre, Enterprises will surely look to desktops next.

So what next for Microsoft? To be fair, Microsoft is not the two-product company (Office and Windows) it once was. Its revenues are spread across its Client, Server & Tools, Business, On-line and Entertainment divisions. With the Xbox it showed how it could take on an entrenched incumbent like Sony and win.

Even though more computing tasks are likely to head to the 'Cloud', there are still a number of application domains that will always require heavyweight local processing and file management: 3D modelling & rendering, video, graphics and, of course, gaming, to name a few.

Ray Ozzie, Microsoft's Chief Software Architect and Bill Gates's successor, has put his faith in the Azure Platform in an attempt to make Microsoft dominant in the Cloud Computing space. I personally like the Azure concept. I suspect, though, the biggest challenge to large-scale uptake will be Government and Corporate concerns over security.

Then there's the small issue of the apps themselves: most Enterprises' internal business processes run on a mixture of home-grown apps and COTS such as ERP and CRM. You'd have to question the benefits, let alone the feasibility and sheer effort and cost, of moving these to the Azure platform.

Microsoft's main hope may be in persuading major app vendors such as SAP to port versions of their solutions to Azure. These vendors, though, tend to have their own SOA and SaaS strategies, and Azure will probably not seem attractive to them. Obviously the Oracles of this world won't be interested in supporting Azure; it just takes too much of their portfolio away.

Azure's definitely a gamble for Microsoft.

So what about Windows' future? I feel that Microsoft will begin to seriously feel the heat from Linux and possibly offer a number of very low-cost (maybe even free) Windows variants targeted at low-spec 'Internet' PCs and NetBooks, while targeting revenues at those users who do need desktop power and focusing on features to improve local processing of graphics and video.

You never know, we may yet see a Microsoft 'Open Source' free Windows available for download soon!

Monday, 12 January 2009

Forget Requirements - Collaborate on a Solution Concept

Requirements in systems development have always been a difficult area. In the Standish Group Chaos Report, issues with requirements consistently appear in the top three reasons for project failure.

With this in mind there tends to be a management emphasis on "getting the requirements right", before committing to any form of development or implementation. Yet I've experienced numerous projects where hundreds, if not thousands, of man hours have been devoted to requirements, and still solutions have not met expectations. I suspect anyone reading this has also experienced similar projects. So why is requirements management so often poorly executed?

You often hear people talk about traceability, configuration & change control, use cases, process models etc, etc. Management will throw Process Improvement, Quality Teams and frameworks such as CMMI at the problem.

For me there are some 'home truths' about requirements which make the task, if tackled in the 'traditional' way, near on impossible:
  • The majority of IT programmes are driven 'top down' with very scant definition of what's required, usually some vague goals - if you're lucky.
  • Stakeholders that will actually have to use the system are often not engaged until the end of the life cycle - if at all.
  • Stakeholders and sponsors usually change during the project life cycle, along with their expectations and, therefore, the requirements.
  • Users often cannot express their needs in terms that can be easily translated into a system specification
  • Management and users usually have no understanding of the constraints or capabilities of the technologies. They ask for features that are infeasible or uneconomic to implement or, at the other extreme, don't ask for features which would be simple to deliver because they don't realise they can
  • Management ask for 'signed off' requirements documents, yet no-one ever reads them, let alone understands them.
  • Business processes, rules and taxonomy are 'fuzzy', ill-defined and not agreed upon by stakeholders
  • Stakeholders will keep changing their minds and usually come up with conflicting requirements
  • Users and management usually cannot see a business process working any differently to how it works now, resulting in lost opportunities for IT-driven improvement.
As a good example, I was once working in an Investment Bank on an Asset Management system. I remember a workshop where we were trying to detail the business rules of a particular financial instrument. When we got to the real nitty-gritty of how these rules worked, the guy who was the SME in this instrument said "...the system calculates all that". It turned out in the end that very few people in the business understood the detail, as it had all been encoded in a Mainframe system that had been there longer than their time in the company! Cue the development team spending man-months reverse engineering thousands of lines of ADABAS code!

I could go on, but you get the idea. Basically, the traditional approach encouraged by the Waterfall life cycle and heavyweight methods such as PRINCE2, SSADM and, to a certain extent, RUP doesn't deliver the goods in the majority of projects.

I believe a big part of the problem is that a requirement can end up being anything from a high-level business objective, e.g. the system shall reduce the claim process time by n%, to a specific system requirement, e.g. all buttons shall be blue, and variations in between. In theory the requirements analysis process should weed these issues out. But it rarely does, due to the simple fact that requirements are being captured in, what I call, a 'solution architecture vacuum', i.e. they can't be validated against any form of system implementation view that sense-checks their feasibility. This process can continue until your project is overflowing with requirement statements and process models and the whole project ends up in Analysis Paralysis.

What's the solution? Well, there's a lot of talk in the industry about Agile, in fact so much so it's become an industry in itself and possibly well on its way to becoming an oxymoron. I have seen very few organisations truly embrace an Agile approach, mainly due to management culture and vested interests, but that's another article.

In my view if organisations want to improve their approach to systems delivery then they really need to drop the idea of requirements management altogether, at least in the traditional sense of doorstop URDs, SRDs, Use Cases, endless Workshops and incomprehensible Process Models.

A fresh approach is required that is focused, not on requirements, but on the solution, right at the start of the project life cycle. An overview of the approach is shown below.

[Figure: Solution Concept approach overview]

The approach is, of course, Agile, but adds the concept of an Increment, or Micro-Increment, on top of an Iteration. Increments should be measured in days, yet still deliver some demo-able or executable software to stakeholders. Micro-Increments are important as they drive projects to meet short-term goals focused on software delivery; even if it's as simple as a dumb HTML UI mock-up, this adds infinitely more value than lines of requirements text or use cases.

Inputs to the Solution Concept include:
  • Available Technology Components - ensure you base your architecture on components and technologies you're confident you can readily develop and deploy. Look for maximum reuse, both in the small, e.g. Java persistence frameworks, and in the large, e.g. packaged COTS modules such as ERP and CRM
  • Application Architecture Patterns - very few business systems are entirely new; in all probability, elements of the solution you're trying to build have been built and proven before. Don't waste time reinventing wheels - leverage these patterns
  • Legacy Systems - this may be both systems that your solution will replace and systems you'll need to interface to or extract data from. It also includes manual systems, paper forms and any 'home grown' end user solutions, usually based upon desktop tools such as Excel and Access. Don't dismiss these, by the way - I've repeatedly come across some pretty impressive solutions built by keen amateurs!
  • Business Goals & Objectives - understand what the business is trying to achieve and what a successful system looks like. The more you can immerse yourself in the users' problem from their perspective, the better chance you have of building a great solution. More often than not, you'll uncover whole areas of requirements that users have not even thought about.
  • Programmatics & Risk - ensure the budget and desired time scales are baked into the solution design at the start. There's no point designing a solution that's going to take 2 years when the stakeholders need something now! On the point of schedules, it's my view that if the solution is going to take longer than 9 months to go live then you should either (i) reduce scope, (ii) break the solution up into smaller elements or (iii) forget it! In my experience, any information system that takes longer than 9 months to get deployed is likely to be pretty useless, as the organisation will have moved on. The rule here is: the faster you can deploy solutions to production, the better
Once you start to lay your hands on these inputs, the Increments themselves are all about getting stuff built! Yes, you need to maintain some documentation, but keep it lightweight and value-adding.
  • UI Prototypes - use cases are okay, but there's no substitute for putting, at least what looks like, a real solution in front of stakeholders. In my experience, UI prototypes validate system requirements better than any process modelling or workshops could ever do.
  • Demo-able Solution - if it's feasible to build some form of functional prototype within the bounds of an Iteration, then you should. Focus on the most complex or least understood area of the solution first.
  • Architecture Prototype - sense-check your technology stack, runtime topology and non-functionals as early as you can. Often these issues constrain the functionality that can be implemented. For example, you may be able to do some fancy stuff in the Browser with a Plug-In, but the Corporate firewalls block the port it uses. You want to find these issues out right at the start of the project, before you commit to the architecture.
  • Candidate Feature List - as you're producing these prototypes and getting feedback from stakeholders, you'll start to get 'real', useful requirements that are in the context of a system. I don't call these requirements, I call them system features, as they are tied to the architecture. The development team should unequivocally understand how each feature works, what good looks like and the potential approaches to implementation and test.
In my experience, once a project gets into a 'groove' of running Increments continuously throughout the life cycle of the delivery, the whole process becomes self-reinforcing through better-defined features, improved prototypes etc. In fact, the process doesn't really change from inception to the final go-live: prototypes gradually move to alphas, betas, pre-release versions and release candidates, then a final decision to promote a release candidate live.

So, stop talking about requirements, and start building Solution Concepts.

If you're looking for further useful info on this approach then I'd recommend:
  • Eclipse EPF - an Eclipse project focused on an Open Source, lightweight development approach based on IBM's RUP but stripped to the core.
  • Feature Driven Development - promotes a project delivery approach called FDD based on features. There's also a book available on FDD.
  • Introduction to Features - definition from Scott Ambler as to what a good Feature looks like.
  • Agile Manifesto - and finally, keep this web page on your browser at all times to remind you what your job is!