Saturday, 31 January 2009

End of the line for the Relational Database?

Pretty much every 'business system' I've ever been involved in developing over the last 20 years has involved an RDBMS in some shape or form. I've worked with pretty much all the major vendors in my time, i.e. Oracle, Sybase, Ingres, MS SQL Server etc.

Developing my software career from the late 80's through the 90's, RDBMS's were always there, a key enabler for the move to Client / Server. Even when the Internet and HTTP came along, RDBMS's were still providing the backbone to applications.

Most architects and developers have grown up with relational algebra, normalisation and, of course, SQL. I remember the brief flirtation with the OO database revolution that never quite took off, although it's still kicking around. I suspect architects just get used to the fact that data persistence is probably going to involve an RDBMS; what else would you use?

I believe the mantle of the RDBMS is beginning to be challenged by the rise of a number of alternative database and persistence approaches built from the ground up with the Web and HTTP in mind. A lot of these new database engines share common principles and technologies, including using HTTP, REST, JSON and XML as the primary query tools, and having flexible data models that are more document-oriented than relationally structured. All of these solutions take away the classic RDBMS problems of maintaining indexes, keys and relationships, allowing the developer to focus on the typical CRUD operations without worrying about how that data is structured, indexed or persisted.

Amazon opened up their e-Commerce services a few years ago now under the AWS banner. I've had an Amazon Developer account pretty much since the service was launched, more out of interest and experimentation than for developing any real-world applications. Amazon have been steadily adding new services and finally added their database solution, SimpleDB.

Currently in beta, SimpleDB provides a straightforward API to create domains, to put, get and delete data, and to query it. Given the massive move away from SOAP to RESTful web services, I don't think it's any coincidence that Amazon have modelled the core operations of their SimpleDB API on the HTTP verbs of get, put and delete.

The data metaphor Amazon use for SimpleDB is the spreadsheet. Worksheets are akin to domains (RDBMS tables), items are rows, values are cells (a single column value in an RDBMS table). The big difference is that whereas a spreadsheet cell or an RDBMS row/column intersection can only contain one value, a SimpleDB attribute can contain many values. For an example take a look at the Product Catalogue domain below:


In this example Sweatpants have Color values of Blue, Yellow and Pink.
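The multi-valued idea can be sketched in a few lines of JavaScript. This is only an in-memory illustration of the data model, not SimpleDB's API: the function names and the item name "Item_01" are made up for the example, and holding the values in a Set is an assumption of the sketch.

```javascript
// A domain is modelled as a Map from item name to that item's attributes.
// Each attribute holds a Set of values, so one attribute can carry many values.
const productCatalogue = new Map();

// Add one value to an attribute of an item, creating the item or the
// attribute on first use. No schema is declared anywhere.
function putAttribute(domain, itemName, attrName, value) {
  if (!domain.has(itemName)) domain.set(itemName, new Map());
  const item = domain.get(itemName);
  if (!item.has(attrName)) item.set(attrName, new Set());
  item.get(attrName).add(value);
}

// Return an item's values for one attribute as an array (empty if absent).
function getValues(domain, itemName, attrName) {
  const item = domain.get(itemName);
  if (!item || !item.has(attrName)) return [];
  return [...item.get(attrName)];
}

// "Item_01" is a hypothetical item name; the Sweatpants colours are from the text.
putAttribute(productCatalogue, "Item_01", "Name", "Sweatpants");
for (const colour of ["Blue", "Yellow", "Pink"]) {
  putAttribute(productCatalogue, "Item_01", "Color", colour);
}

console.log(getValues(productCatalogue, "Item_01", "Color"));
```

The point to notice is that adding a second or third Color needs no change to any table definition, which is where this model parts company with a relational column.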

SimpleDB provides two query mechanisms: a SQL-like Select expression, and a predicate-style approach with Query expressions. Access is provided by either a SOAP or RESTful interface. For example, a RESTful call to add an item called Item123 to the domain MyDomain looks like:
https://sdb.amazonaws.com/?Action=PutAttributes
&DomainName=MyDomain
&ItemName=Item123
&Attribute.1.Name=Color&Attribute.1.Value=Blue
&Attribute.2.Name=Size&Attribute.2.Value=Med
&Attribute.3.Name=Price&Attribute.3.Value=0014.99
&AWSAccessKeyId=<valid_access_key>
&Version=2007-11-07
&Signature=Dqlp3Sd6ljTUA9Uf6SGtEExwUQE=
&SignatureVersion=2
&SignatureMethod=HmacSHA256
&Timestamp=2007-06-25T15%3A01%3A28-07%3A00

The XML response returned:

<PutAttributesResponse xmlns="http://sdb.amazonaws.com/doc/2007-11-07">
<ResponseMetadata>
<StatusCode>Success</StatusCode>
<RequestId>f6820318-9658-4a9d-89f8-b067c90904fc</RequestId>
<BoxUsage>0.0000219907</BoxUsage>
</ResponseMetadata>
</PutAttributesResponse>
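As a rough sketch of how the numbered Attribute.N.Name / Attribute.N.Value pairs in the request above fit together, the helper below assembles that part of the query string. The function name is my own, and it deliberately leaves out AWSAccessKeyId, Version, Timestamp and the signature parameters, so what it produces is not a complete, sendable request.

```javascript
// Build the unsigned part of a PutAttributes query string.
// attributes is an array of [name, value] pairs; they are numbered from 1,
// matching the Attribute.1, Attribute.2, ... parameters in the request above.
function buildPutAttributesQuery(domainName, itemName, attributes) {
  const params = [
    ["Action", "PutAttributes"],
    ["DomainName", domainName],
    ["ItemName", itemName],
  ];
  attributes.forEach(([name, value], index) => {
    const n = index + 1;
    params.push([`Attribute.${n}.Name`, name]);
    params.push([`Attribute.${n}.Value`, value]);
  });
  // Percent-encode each value so it is safe inside a URL.
  return params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
}

const query = buildPutAttributesQuery("MyDomain", "Item123", [
  ["Color", "Blue"],
  ["Size", "Med"],
  ["Price", "0014.99"],
]);
console.log("https://sdb.amazonaws.com/?" + query);
```

Note that the Price is passed as the zero-padded string 0014.99, as in the request above. SimpleDB treats values as text, so padding keeps numeric values in a sensible order when they are compared.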
In terms of outright performance, sat out there in the 'Cloud', SimpleDB isn't going to be able to compete with an instance of an RDBMS sat a switch away from your App Server, let alone a product like Oracle Coherence. What SimpleDB does offer, though, is a quick and cost effective way of building flexible data driven applications in the 'Cloud' without worrying about hosting, DBA maintenance etc.

SimpleDB is getting attention through Amazon's presence and branding, but there are a number of alternatives.

Dabble DB goes one step further than SimpleDB and not only provides a database, but adds forms allowing users to build quite flexible data driven web apps. You can still use Dabble as a database back-end to your own application tier through a JavaScript and JSON API. Dabble is ideally architected for AJAX applications running in the browser. An example query to Dabble from JavaScript is shown below.

Dabble.addView({
  _class: 'View',
  id: 'e63a411d-7cbb-4399-9b65-37cfee8546e3',
  name: 'Authors',
  fields: [88],
  entries: [
    {_name: 'Homer', _id: 45, country: 'Greece'},
    {_name: 'Margaret Atwood', _id: 95, country: 'Canada'},
    {_name: 'James Joyce', _id: 44, country: 'Ireland'}
  ]
});
Effectively, Dabble DB is Microsoft Access for the Web.

Not all of these new database engines run solely in the Cloud. Apache have the CouchDB project, currently in the Apache Incubator. CouchDB is interesting for a number of reasons. Not only does it support an adaptive document centric database with a RESTful JSON API, but it's developed in Erlang, rather than C / C++ or Java.

An overview of CouchDB's architecture can be seen below:



CouchDB is document centric and schema free, with a flat address space. Documents are made up of fields that can contain strings, numbers, dates or more complicated structures such as ordered lists and associative maps. An example document for a blog post could look like:

{
  "Subject": "I like Plankton",
  "Author": "Rusty",
  "PostedDate": "5/23/2006",
  "Tags": ["plankton", "baseball", "decisions"],
  "Body": "I decided today that I don't like baseball. I like plankton."
}

To put structure over what, essentially, is an unstructured store, CouchDB provides support for views which are written in JavaScript. A simple view construct is shown below:

// Map function: CouchDB calls it once for every document in the database.
function(doc) {
  if (doc.Type == "customer") {
    emit(null, {LastName: doc.LastName, FirstName: doc.FirstName, Address: doc.Address});
  }
}

This view function creates a row for every document in the database that is of Type 'customer', returning the fields LastName, FirstName and Address. This view emits a key of 'null', so its rows can't be looked up or sorted by key. An indexed and sortable view would look like:

// Emit each customer twice: once keyed by last name, once by first name.
function(doc) {
  if (doc.Type == "customer") {
    emit(doc.LastName, {FirstName: doc.FirstName, Address: doc.Address});
    emit(doc.FirstName, {LastName: doc.LastName, Address: doc.Address});
  }
}

And would return a JSON result that would look like:

{
  "total_rows": 4,
  "offset": 0,
  "rows": [
    {
      "id": "64ACF01B05F53ACFEC48C062A5D01D89",
      "key": "Katz",
      "value": {"FirstName": "Damien", "Address": "2407 Sawyer drive, Charlotte NC"}
    },
    {
      "id": "64ACF01B05F53ACFEC48C062A5D01D89",
      "key": "Damien",
      "value": {"LastName": "Katz", "Address": "2407 Sawyer drive, Charlotte NC"}
    },
    {
      "id": "5D01D8964ACF01B05F53ACFEC48C062A",
      "key": "Kerr",
      "value": {"FirstName": "Wayne", "Address": "123 Fake st., such and such"}
    },
    {
      "id": "5D01D8964ACF01B05F53ACFEC48C062A",
      "key": "Wayne",
      "value": {"LastName": "Kerr", "Address": "123 Fake st., such and such"}
    }
  ]
}
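To get a feel for what the view function is doing, here is a small stand-alone simulation that can be run outside CouchDB. The emit() function, the in-memory docs array and the "A-BLOG-POST" document are stand-ins invented for the sketch, and the rows are ordered with a plain string comparison, which is a simplification of whatever ordering CouchDB applies to keys.

```javascript
// Rows collected by emit() while the view function runs.
const rows = [];
let currentDocId = null;

// Stand-in for the emit() that CouchDB supplies to view functions.
function emit(key, value) {
  rows.push({id: currentDocId, key: key, value: value});
}

// The indexed view function from above, given a name so it can be called here.
function byName(doc) {
  if (doc.Type == "customer") {
    emit(doc.LastName, {FirstName: doc.FirstName, Address: doc.Address});
    emit(doc.FirstName, {LastName: doc.LastName, Address: doc.Address});
  }
}

// Hypothetical in-memory "database"; the non-customer document is skipped by the view.
const docs = [
  {_id: "64ACF01B05F53ACFEC48C062A5D01D89", Type: "customer", LastName: "Katz",
   FirstName: "Damien", Address: "2407 Sawyer drive, Charlotte NC"},
  {_id: "5D01D8964ACF01B05F53ACFEC48C062A", Type: "customer", LastName: "Kerr",
   FirstName: "Wayne", Address: "123 Fake st., such and such"},
  {_id: "A-BLOG-POST", Type: "post", Subject: "I like Plankton"},
];

// Run the view function once per document, as the database would.
for (const doc of docs) {
  currentDocId = doc._id;
  byName(doc);
}

// Order the rows by key using plain string comparison (a simplification).
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

console.log(JSON.stringify({total_rows: rows.length, offset: 0, rows: rows}, null, 2));
```

Each customer document contributes two rows, one per emit() call, which is why two documents give a total_rows of 4.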

The choice of the Erlang VM runtime for CouchDB is also interesting. Erlang was developed by Ericsson as a platform for real-time Telecom systems. With its lightweight processes, built-in concurrency and message passing for all inter-process communication, Erlang provides a highly scalable, distributed and fault-tolerant environment, in my view much more so than any current Java VM. This should help CouchDB perform very well.

CouchDB is stateless and is accessed entirely by HTTP, essentially following REST principles. This means CouchDB supports caching through proxies and edge server devices without modification.
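Because everything goes over HTTP, storing and fetching a document comes down to a PUT and a GET against a URL made from the database name and the document id. The sketch below only builds descriptions of those two requests and sends nothing. The helper names, the base URL http://localhost:5984, the database name "blog" and the document id "plankton-post" are all assumptions made for the example.

```javascript
// Describe the HTTP request that stores a document under a given id.
// baseUrl and dbName stand for wherever a CouchDB instance happens to be running.
function putDocumentRequest(baseUrl, dbName, docId, doc) {
  return {
    method: "PUT",
    url: `${baseUrl}/${dbName}/${encodeURIComponent(docId)}`,
    headers: {"Content-Type": "application/json"},
    body: JSON.stringify(doc),
  };
}

// Describe the HTTP request that reads the same document back.
function getDocumentRequest(baseUrl, dbName, docId) {
  return {
    method: "GET",
    url: `${baseUrl}/${dbName}/${encodeURIComponent(docId)}`,
    headers: {"Accept": "application/json"},
  };
}

const blogPost = {
  Subject: "I like Plankton",
  Author: "Rusty",
  Tags: ["plankton", "baseball", "decisions"],
};

const putRequest = putDocumentRequest("http://localhost:5984", "blog", "plankton-post", blogPost);
const getRequest = getDocumentRequest("http://localhost:5984", "blog", "plankton-post");
console.log(putRequest.method, putRequest.url);
console.log(getRequest.method, getRequest.url);
```

Since the read is an ordinary GET on a stable URL, any standard HTTP cache or proxy sitting in front of the database can serve it, which is the point made above about caching.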

Even though CouchDB is still in the Apache Incubator, there are some real-world apps built on it out there already. An interesting example is Ajatus, a sort of 'reverse CRM' solution.

Of course, no article on next-gen databases would be complete without mentioning the biggest one of them all - Google's Bigtable. Essentially, Bigtable is a huge, sparse, distributed, multi-dimensional sorted map. Going into Bigtable in detail is well beyond this article, but there's a publication available from Google here.
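As a toy illustration only: Google's paper describes the map as being indexed by a row key, a column key and a timestamp. The class below mimics that shape in memory and nothing more. The class and method names are invented, the timestamps are arbitrary, and none of the distribution or persistence that makes Bigtable interesting is modelled.

```javascript
// Toy, single-process model of a sparse map keyed by (row, column, timestamp).
class SparseTable {
  constructor() {
    this.rows = new Map(); // row key -> Map(column key -> Map(timestamp -> value))
  }

  // Store one version of a cell; rows and columns spring into being on first write.
  put(rowKey, columnKey, timestamp, value) {
    if (!this.rows.has(rowKey)) this.rows.set(rowKey, new Map());
    const columns = this.rows.get(rowKey);
    if (!columns.has(columnKey)) columns.set(columnKey, new Map());
    columns.get(columnKey).set(timestamp, value);
  }

  // Return the most recent version of a cell, or undefined if it was never written.
  get(rowKey, columnKey) {
    const columns = this.rows.get(rowKey);
    const versions = columns && columns.get(columnKey);
    if (!versions || versions.size === 0) return undefined;
    const latest = Math.max(...versions.keys());
    return versions.get(latest);
  }

  // Row keys in sorted order, since the map is described as sorted by row key.
  rowKeys() {
    return [...this.rows.keys()].sort();
  }
}

const table = new SparseTable();
table.put("com.cnn.www", "contents:", 1, "<html>v1");
table.put("com.cnn.www", "contents:", 2, "<html>v2");
table.put("com.cnn.www", "anchor:cnnsi.com", 1, "CNN");
table.put("com.bbc.www", "contents:", 1, "<html>bbc");

console.log(table.get("com.cnn.www", "contents:"));
console.log(table.rowKeys());
```

'Sparse' here means that a row only pays for the columns it actually has: the second row above has no anchor column at all, and nothing is stored for it.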

So is this really the end for the RDBMS? I suspect not just yet. There are hundreds of thousands of organisations and enterprises out there running their critical apps on Oracle, SQL Server, not forgetting the ubiquitous LAMP environments, typically with MySQL back-ends.

Even so, I believe these 'new generation' databases offer opportunities to build highly scalable, fault-tolerant and distributed applications with adaptable data models that inherently support the architecture of the web. With the likes of Amazon and Google heavily promoting these technologies, I personally would be worrying if I was in the database division of Oracle or Microsoft.

Thursday, 15 January 2009

Facts & Fallacies of Software Engineering

Most software developers know how systems really get built, and most will have come across organisations repeating the same old mistakes time and time again. And of course, there are those myths that pervade the industry, such as the idea that all developers are equal in output and productivity.

Robert Glass's Facts and Fallacies of Software Engineering lays out these 'home-truths' and 'urban myths' of the systems development process. The book draws upon Robert's pretty unrivalled experience in the software field, dating back to the pioneering 1950s. There can't be too many people still active in the industry with such an eminent and long career.

I feel an affinity with Robert's career, as he explains in his introduction to Chapter 1 (About Management) how he shunned career prospects in management to stay true to the technologist path. I too flirted with the vision of aiming for Senior Management positions in my early 30's, starting the ubiquitous MBA route to bolster my prospects. I tired of the MBA in the end, deducing that (i) most management theory was just plain common sense dressed in Consultant speak and (ii) you could pick up the same knowledge just by reading a few well chosen management books and save yourself a shed load of cash in the process.

So back to the book. Robert lays out 55 facts and fallacies across areas including management, the life cycle and quality. Pretty much all of them I recognise and agree with. There are a couple of odd-ball / controversial ones, for example 'COBOL is a very bad language, but all the others are so much worse'.

The book simply presents these facts and fallacies grouped by domain and subject, provides rationale and examples of them and supports their credibility through referencing other work. It can be a bit dry to read front to back, but the text's really meant for dipping in and out of when you're looking for that inspiration to solve your project's issues.

The key facts and fallacies for me include:
  • The most important factor in software work is the quality of the programmers - it never ceases to amaze me how often this goes unrecognised. I have seen so many projects where developers, analysts and architects are seen as 'fully interchangeable' by management. I have seen lead architects swapped on programmes just before major go-live milestones! Management need to recognise that the knowledge, skill and experience of the technical team at the coal face of delivery are the greatest influence on whether a project is successful or not.
  • Adding people to a late project only makes it later - when projects overrun there's always a temptation to 'throw' more resource at them. This invariably just makes the situation worse, with more communication paths between team members and massively reduced productivity of your key technical staff as they spend time getting 'newbies' up to speed. Also, I believe that no matter how complex the architecture of a system, there is a limit to how far productivity scales with team size. As teams grow, not only do you have the learning curve and communication problems, but you're also more likely to get team members who just don't get on with each other. I've also observed that in the panic to accelerate progress, the recruitment process can fall down, with less experienced and skilled people being brought on board.
  • Estimation usually occurs at the wrong time by the wrong people - when a new initiative is agreed it's usually given to a project manager who's possibly never delivered a project like it before and may be non-technical. Yet senior management will usually demand a schedule and budget forecast possibly years ahead and then hold the project manager to that schedule. Managers are usually reluctant to provide revised estimates to senior stakeholders as the project progresses, through fear of losing credibility.
  • For every 25 percent increase in problem complexity, there is a 100 percent increase in solution complexity - this is one of the least understood of Robert's facts, even amongst technical people. As a solution evolves and the business need is better understood by users and the delivery team, system features that appeared straightforward early in the life cycle suddenly start getting complex from a design and implementation viewpoint. Add on top of this the inevitable change in features and system behaviour that occurs as the project matures, and the team can suddenly hit a wall of rapidly expanding system complexity. If not contained it can quite easily de-rail the delivery. Stakeholders often get frustrated when they ask for what they see as simple feature requests and the delivery team explains they can't be done without blowing the schedule or budget.
  • One of the most common causes of runaway projects is unstable requirements - see my article on Forget Requirements - Collaborate on a Solution Concept for a viewpoint on this one.
  • Software needs more methodologies - I have to admit to detesting most 'methodologies', by these I mean the likes of RUP, PRINCE2, DSDM etc. The content is usually valid, for example RUP contains loads of good practice guidelines on use cases, OOAD etc. It's just that they (i) tend to be seen as magic bullets and are over promoted as the saviour to all your problems by Vendors and Consultants, (ii) are usually implemented prescriptively with a one size fits all approach, and (iii) end up just massively increasing the bureaucracy that was probably already present in your organisation - only now it's got a name!
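Two of the facts above come down to simple arithmetic, sketched here. The communication paths figure uses the usual n(n-1)/2 count of pairs in a team. The complexity function assumes the 25 percent / 100 percent rule compounds, i.e. every further 25 percent of problem complexity doubles the solution complexity again; that compounding is my reading for the purpose of illustration, not something the book states in this form.

```javascript
// Number of distinct person-to-person communication paths in a team of n people.
function communicationPaths(teamSize) {
  return (teamSize * (teamSize - 1)) / 2;
}

// Multiplier on solution complexity for a given multiplier on problem complexity,
// ASSUMING each 25% step in the problem doubles the solution and the steps compound.
function solutionComplexityFactor(problemFactor) {
  const steps = Math.log(problemFactor) / Math.log(1.25);
  return Math.pow(2, steps);
}

for (const size of [5, 10, 20]) {
  console.log(`${size} people: ${communicationPaths(size)} communication paths`);
}

for (const factor of [1.25, 1.5, 2]) {
  console.log(`problem x${factor}: solution x${solutionComplexityFactor(factor).toFixed(1)}`);
}
```

Doubling a team from 10 to 20 people takes the number of paths from 45 to 190, and under the compounding assumption a problem that merely doubles in complexity needs a solution more than eight times as complex.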
So what can an organisation learn from this book?
  • The 'coal face' technical people are the most important factor in delivery success, their knowledge, experience and skills
  • Move away from large, long term, waterfall driven IT programmes with wildly optimistic schedules and budgets, to incremental, iterative solution development delivering smaller capabilities but significantly quicker
  • Manage stakeholder expectations on what can and cannot be realistically achieved with available technologies
  • Forget methods, and tools for that matter; even when well implemented, these only deliver marginal improvements over the technical experience, skills and capabilities of your people.


The only other comment I'd add about this book is that I still haven't fathomed out why there's a picture of a Snowy Owl on the front. I must email Robert Glass and ask him.

Wednesday, 14 January 2009

Windows 7 - Beginning of the End for Microsoft?

Let me put this straight from the start, I am not one of these dedicated anti-Microsoft types with a pathological hatred of anything coming out of the Redmond stable. I spent the majority of my career in the 90's working with the Microsoft platform from DOS 3.0 to Windows NT, GWBASIC to Visual Studio. In my view, Microsoft's dominance of the Desktop and certain elements of Enterprise computing (file, print serving and mail for example) was more down to the poor vision, strategy and business models of its competitors than any MS 'evil planning'.

So Windows 7 is here and available for public download, and judging by the tech news feeds is in demand. Windows 7 needs to be good, even Microsoft admit they cocked-up on Vista. For me though, there's always an inherent problem with a software product line that's been out, what seems like, for ever. It inevitably turns into 'bloatware', and that's what happened to Vista. The problem with an OS is just how many more truly useful features can you add to an operating system? The features MS are touting for 7 seem pretty desperate to me, most of it centring around UI enhancements.

The problem for Microsoft is that user applications are rapidly moving web side, or to the Cloud to use the current buzzword. This website is a good example. The whole content is being managed in a browser with no native OS dependencies; I can maintain this Blog from anything from an iPhone over 3G to a Linux Netbook on a Wi-Fi HotSpot in Starbucks. Vista sales were poor, and I believe Windows 7 sales will fall well below Microsoft's expectations for a number of reasons:
  • Broadband speeds are set to (hopefully) rapidly increase towards the end of 2009 here in the UK with BT's implementation of 21CN and ADSL2+ enabling more content to be streamed down to the browser.
  • Web based applications are beginning to become more mainstream with consumers championed by the likes of Google Apps
  • Users are trusting more of their data to the web with Online Backup solutions such as Carbonite and Web 2.0 applications such as Flickr
  • Browsers are heading towards becoming a 'mini OS' in their own right, and are likely to become more robust development platforms - witness Google Chrome with OS-like features such as multi-process and threading for rendering, JavaScript execution, HTTP download etc.
  • Web-based apps are set to improve dramatically with the take up of rich AJAX development environments such as ExtJs
  • JavaScript is no longer a 'scripting language' for occasional web master tinkering, but rapidly becoming a serious language for application development, supporting OO 'like' concepts through prototype functions.
  • Browser vendors will focus on increased JavaScript performance, as this article shows with Firefox 3.1
  • Linux on the desktop is likely to become more mainstream with the rapid growth in NetBook sales, most of which are powered by lightweight Linux Distros such as Linpus and Xubuntu
In my view, the question won't be 'are you a Windows / Linux / OS X user?', it's more likely to be 'are you an IE / FireFox / Opera / Chrome / Safari user?'.
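On the point above about JavaScript supporting OO-like concepts through prototype functions, here is a minimal sketch of what that looks like in practice. Shape and Rectangle are invented names for the example; the technique is constructor functions with methods hung off their prototypes, and one prototype chained to another to get inheritance.

```javascript
// Constructor function: sets up the per-instance state.
function Shape(name) {
  this.name = name;
}

// Methods live on the prototype, so every instance shares one copy.
Shape.prototype.area = function () {
  return 0;
};
Shape.prototype.describe = function () {
  return this.name + " with area " + this.area();
};

// "Subclass": run the parent constructor, then chain the prototypes.
function Rectangle(width, height) {
  Shape.call(this, "rectangle");
  this.width = width;
  this.height = height;
}
Rectangle.prototype = Object.create(Shape.prototype);
Rectangle.prototype.constructor = Rectangle;

// Override area(); describe() is inherited and picks up the override.
Rectangle.prototype.area = function () {
  return this.width * this.height;
};

const rectangle = new Rectangle(3, 4);
console.log(rectangle.describe());
console.log(rectangle instanceof Shape);
```

There are no classes involved: describe() is found by walking from the instance to Rectangle.prototype and on to Shape.prototype, which is the 'OO like' behaviour in question.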

This change will be slow, but I believe it will start to accelerate through 2009/10, in the consumer desktop space first. Corporates are inherently risk averse and generally slow to change, particularly with something as critical as their desktop infrastructure. Even so, with the current economic downturn, companies will start to question the value of paying massive licence fees and, I suspect, begin to embrace Open Source, starting in the data centre. Once this transformation is complete in the data centre, Enterprises will surely look to desktops next.

So what next for Microsoft? To be fair, Microsoft is not the two product company (Office and Windows) it once was. Its revenues are spread across its Client, Server & Tools, Business, On-line and Entertainment divisions. With XBox it showed how it could take on an entrenched incumbent like Sony and win.

Even though more computing tasks are likely to head to the 'Cloud', there still are a number of application domains that will always require heavyweight local processing and file management, 3D modelling & rendering, video, graphics and, of course, gaming to name a few.

Ray Ozzie, Microsoft's Chief Software Architect and Bill Gates's successor, has put his faith in the Azure Platform in an attempt to get Microsoft dominant in the Cloud Computing space. I personally like the Azure concept. I suspect, though, the biggest challenge to large scale uptake will be Governments' and Corporates' concerns over security.

Then there's the small issue of the apps themselves. Most Enterprises' internal business processes run on a mixture of home-grown apps and COTS such as ERP and CRM. You'd have to question the benefits, let alone the feasibility and sheer effort and cost, of moving these to the Azure platform.

Microsoft's main hope may be in persuading major app vendors such as SAP to port versions of their solutions to Azure. These vendors, though, tend to have their own SOA and SaaS strategies and Azure will probably not seem attractive to them. Obviously the Oracles of this world won't be interested in supporting Azure; it just takes too much of their portfolio away.

Azure's definitely a gamble for Microsoft.

So what about Windows' future? I feel that Microsoft will begin to seriously feel the heat from Linux and possibly offer a number of very low cost (maybe even free) Windows variants targeted at low spec 'Internet' PCs and NetBooks. It could then target revenues from those users who do need desktop power, focusing on features to improve local processing of graphics and video.

You never know, we may yet see a Microsoft 'Open Source' free Windows available for download soon!

Monday, 12 January 2009

Forget Requirements - Collaborate on a Solution Concept

Requirements in systems development have always been a difficult area. In the Standish Group Chaos Report, issues with requirements always appear in the top 3 reasons for project failure.

With this in mind there tends to be a management emphasis on "getting the requirements right", before committing to any form of development or implementation. Yet I've experienced numerous projects where hundreds, if not thousands, of man hours have been devoted to requirements, and still solutions have not met expectations. I suspect anyone reading this has also experienced similar projects. So why is requirements management so often poorly executed?

You often hear people talk about traceability, configuration & change control, use cases, process models etc, etc. Management will throw Process Improvement, Quality Teams and frameworks such as CMMI at the problem.

For me there are some 'home truths' about requirements which make the task, if tackled in the 'traditional way', nigh on impossible:
  • The majority of IT programmes are driven 'top down' with very scant definition of what's required, usually some vague goals - if you're lucky.
  • Stakeholders that will actually have to use the system are often not engaged until the end of the life cycle - if at all.
  • Stakeholders and sponsors usually change during the project life cycle, along with their expectations and, therefore, the requirements.
  • Users often cannot express their needs in terms that can be easily translated into a system specification
  • Management and users usually have no understanding of the constraints or capabilities of the technologies. They ask for features that are infeasible or uneconomic to implement or, at the other end of the extreme, they don't ask for features which would be simple to deliver because they don't realise they can
  • Management ask for 'signed off' requirements documents, yet no-one ever reads them, let alone understands them.
  • Business processes, rules and taxonomy are 'fuzzy', ill-defined and not agreed upon by stakeholders
  • Stakeholders will keep changing their minds and usually come up with conflicting requirements
  • Users and management usually cannot see a business process working any different to how it works now, resulting in lost opportunities for IT driven improvement.
For a good example, I was working in an Investment Bank on an Asset Management system. I remember a workshop where we were trying to detail the business rules of a particular financial instrument. When we got to the real nitty gritty of how these rules worked, the guy who was the SME in this instrument said "...the system calculates all that". It turned out in the end that very few people in the business understood the detail as it had all been encoded in a Mainframe system that had been there longer than their time in the company! Cue the development team spending man months reverse engineering 1000's of lines of ADABAS code!

I could go on, but you get the idea. Basically the traditional approach encouraged by the Waterfall life cycle and heavyweight methods such as PRINCE2, SSADM and, to a certain extent, RUP doesn't deliver the goods in the majority of projects.

I believe a big part of the problem is that a requirement can end up being anything from a high level business objective, e.g. the system shall reduce the claim process time by n%, to a specific system requirement, e.g. all buttons shall be blue, and variations in between. In theory the requirements analysis process should weed these issues out. But it rarely does, due to the simple fact that requirements are being captured in what I call a 'solution architecture vacuum', i.e. they can't be validated against any form of system implementation view that sense checks their feasibility. This process can continue until your project is overflowing with requirement statements and process models and the whole project ends up in Analysis Paralysis.

What's the solution? Well, there's a lot of talk in the industry about Agile, in fact so much so it's become an industry in itself and possibly well on its way to becoming an oxymoron. I have seen very few organisations truly embrace an Agile approach, mainly due to management culture and vested interests, but that's another article.

In my view if organisations want to improve their approach to systems delivery then they really need to drop the idea of requirements management altogether, at least in the traditional sense of doorstop URDs, SRDs, Use Cases, endless Workshops and incomprehensible Process Models.

A fresh approach is required that is focused, not on requirements, but on the solution, right at the start of the project life cycle. An overview of the approach is shown below.


The approach is, of course, Agile, but adds the concept of an Increment or Micro-Increment on top of an Iteration. Increments should be measured in days, yet still deliver some demo-able or executable software to stakeholders. Micro-Increments are important as they drive projects to meet short term goals that are focused on software delivery. Even if it's as simple as a dumb HTML UI mock-up, this adds infinitely more value than lines of requirements text or use cases.

Inputs to the Solution Concept include:
  • Available Technology Components - ensure you base your architecture on components and technologies you're confident you can readily develop and deploy. Look for maximum reuse, both in the small, e.g. Java persistence frameworks, and in the large, e.g. packaged COTS modules such as ERP and CRM
  • Application Architecture Patterns - very few business systems are entirely new, in all probability elements of the solution you're trying to build have been built and proven. Don't waste time reinventing wheels, leverage these patterns
  • Legacy Systems - this may be both systems that your solution will replace and systems you'll need to interface to or extract data from. It also includes manual systems, paper forms and any 'home grown' end user solutions, usually based upon desktop tools such as Excel and Access. Don't dismiss these by the way, I've repeatedly come across some pretty impressive solutions built by keen amateurs!
  • Business Goals & Objectives - understand what the business is trying to achieve and what a successful system looks like. The more you can immerse yourself in the users' problem from their perspective, the better chance you have of building a great solution. More often than not, you uncover whole areas of requirements that users have not even thought about.
  • Programmatics & Risk - ensure the budget and desired time scales are baked into the solution design at the start. There's no point designing a solution that's going to take 2 years when the stakeholders need something now! On the point of schedules, it's my view that if the solution is going to take longer than 9 months to go-live then you should either (i) reduce scope, (ii) break the solution up into smaller elements or (iii) forget it! In my experience any information system that takes longer than 9 months to get deployed is likely to be pretty useless as the organisation will have moved on. The rule here is, the faster you can deploy solutions to production the better
Once you start to lay your hands on these inputs, the Increments themselves are all about getting stuff built! Yes you need to maintain some documentation, but keep it lightweight and value add.
  • UI Prototypes - use cases are okay, but there's no substitute for putting what at least looks like a real solution in front of stakeholders. In my experience UI prototypes validate system requirements better than any process modelling or workshops could ever do.
  • Demo-able Solution - if it's feasible to build some form of functional prototype within the bounds of an Iteration, then you should. Focus on the most complex or least understood area of the solution first.
  • Architecture Prototype - sense check your technology stack, runtime topology and non-functionals as early as you can. Often these issues constrain the functionality that can be implemented. For example, you may be able to do some fancy stuff in the Browser with a Plug-In, but the Corporate firewalls block the port it uses. You want to find these issues out right at the start of the project before you commit to the architecture.
  • Candidate Feature List - as you're producing these prototypes and getting feedback from stakeholders, you'll start to get 'real' useful requirements that are in the context of a system. I don't call these requirements, I call them system features as they are tied to the architecture. The development team should understand unequivocally how these features work, what good looks like and the potential approaches to implementation and test.
In my experience, once a project gets into a 'groove' of running Increments continuously throughout the life cycle of the delivery, the whole process becomes self reinforcing through better defined features, improved prototypes etc. In fact, the process doesn't really change from inception to the final go-live: prototypes gradually move to alphas, betas, pre-release versions and release candidates, followed by a final decision to promote a release candidate live.

So, stop talking about requirements, and start building Solution Concepts.

If you're looking for further useful info on this approach then I'd recommend:
  • Eclipse EPF - an Eclipse project focused on an Open Source lightweight development approach based on IBM's RUP but stripped to the core.
  • Feature Driven Development - promotes a project delivery approach called FDD based on features. There's also a book available on FDD.
  • Introduction to Features - definition from Scott Ambler as to what a good Feature looks like.
  • Agile Manifesto - and finally, keep this web page on your browser at all times to remind you what your job is!