Throughout my software career, from the late 80s through the 90s, RDBMSs were always there, a key enabler for the move to client/server. Even when the Internet and HTTP came along, RDBMSs still provided the backbone of applications.
Most architects and developers have grown up with relational algebra, normalisation and, of course, SQL. I remember the brief flirtation with the OO database revolution, which never quite took off, although it's still kicking around. I suspect architects just get used to the fact that data persistence is probably going to involve an RDBMS; what else would you use?
I believe the mantle of the RDBMS is beginning to be challenged with the rise of a number of alternative database and persistence approaches built from the ground up with the Web and HTTP in mind. A lot of these new database engines share common principles and technologies: they use HTTP, REST, JSON and XML as the primary query tools, and they have flexible data models that are document oriented rather than relationally structured. All of these solutions take away the classic RDBMS chores of maintaining indexes, keys and relationships, allowing the developer to focus on the typical CRUD operations without worrying about how the data is structured, indexed or persisted.
Amazon opened up their e-Commerce services a few years ago under the AWS banner. I've had an Amazon Developer account pretty much since the service was launched, more out of interest and experimentation than to develop any real-world applications. Amazon have been steadily adding new services and have finally added their database solution, SimpleDB.
Currently in beta, SimpleDB provides a straightforward API to create domains, to put, get and delete data, and to run queries. Given the massive move away from SOAP to RESTful web services, I don't think it's any coincidence that Amazon have chosen the core HTTP verbs of get, put and delete for their SimpleDB API.
The data metaphor Amazon use for SimpleDB is the spreadsheet. Worksheets are akin to domains (RDBMS tables), items are rows, and values are cells (a single column value in an RDBMS table). The big difference is that whereas a spreadsheet cell or an RDBMS row/column intersection can only contain one value, a SimpleDB attribute can contain many values. For an example, take a look at the Product Catalogue domain below:
In this example Sweatpants have Color values of Blue, Yellow and Pink.
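To make the multi-valued idea concrete, here is a minimal in-memory sketch in JavaScript. It is not the SimpleDB API: the function names (`createDomain`, `putAttribute`, `getAttribute`) are hypothetical, and using "Sweatpants" as the item name is an assumption made purely for illustration.

```javascript
// Hypothetical in-memory model of a SimpleDB-style domain (not the AWS API).
// Shape: Map<itemName, Map<attributeName, Set<value>>>
function createDomain() {
  return new Map();
}

// Add a value to an attribute without overwriting values already stored.
function putAttribute(domain, itemName, attrName, value) {
  if (!domain.has(itemName)) domain.set(itemName, new Map());
  const item = domain.get(itemName);
  if (!item.has(attrName)) item.set(attrName, new Set());
  item.get(attrName).add(String(value));
}

// Return all values of one attribute as an array (empty if absent).
function getAttribute(domain, itemName, attrName) {
  const item = domain.get(itemName);
  if (!item || !item.has(attrName)) return [];
  return Array.from(item.get(attrName));
}

const catalogue = createDomain();
for (const color of ["Blue", "Yellow", "Pink"]) {
  putAttribute(catalogue, "Sweatpants", "Color", color);
}
console.log(getAttribute(catalogue, "Sweatpants", "Color"));
```

The point of the nested `Set` is that one "cell" holds several values at once, which is exactly what a spreadsheet cell or a relational column cannot do without a second table.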
SimpleDB provides two query mechanisms: a SQL-like Select expression, and a predicate-style approach using Query expressions. Access is provided by either a SOAP or a RESTful interface. For example, a RESTful call to add an item called Item123 to the domain MyDomain looks like:
https://sdb.amazonaws.com/?Action=PutAttributes
&DomainName=MyDomain
&ItemName=Item123
&Attribute.1.Name=Color&Attribute.1.Value=Blue
&Attribute.2.Name=Size&Attribute.2.Value=Med
&Attribute.3.Name=Price&Attribute.3.Value=0014.99
&AWSAccessKeyId=<valid_access_key>
&Version=2007-11-07
&Signature=Dqlp3Sd6ljTUA9Uf6SGtEExwUQE=
&SignatureVersion=2
&SignatureMethod=HmacSHA256
&Timestamp=2007-06-25T15%3A01%3A28-07%3A00
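A request like this one could be assembled along the following lines. This is only a sketch for Node.js: the helper names (`rfc3986`, `canonicalQuery`, `signedUrl`) and the placeholder keys are hypothetical, and the string-to-sign layout (verb, host, path and sorted query string joined by newlines, then HMAC-SHA256 and base64) reflects my understanding of AWS Signature Version 2 rather than anything verified against the live service.

```javascript
const crypto = require("crypto");

// Percent-encode per RFC 3986; encodeURIComponent leaves !'()* unescaped.
function rfc3986(value) {
  return encodeURIComponent(value).replace(/[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase());
}

// Build the query string with parameters sorted by name and encoded.
function canonicalQuery(params) {
  return Object.keys(params)
    .sort()
    .map((k) => rfc3986(k) + "=" + rfc3986(params[k]))
    .join("&");
}

// Sign a GET request (assumed Signature Version 2 layout) and return the URL.
function signedUrl(host, params, secretKey) {
  const query = canonicalQuery(params);
  const stringToSign = ["GET", host.toLowerCase(), "/", query].join("\n");
  const signature = crypto
    .createHmac("sha256", secretKey)
    .update(stringToSign)
    .digest("base64");
  return "https://" + host + "/?" + query + "&Signature=" + rfc3986(signature);
}

const url = signedUrl("sdb.amazonaws.com", {
  Action: "PutAttributes",
  DomainName: "MyDomain",
  ItemName: "Item123",
  "Attribute.1.Name": "Color",
  "Attribute.1.Value": "Blue",
  AWSAccessKeyId: "EXAMPLE_ACCESS_KEY", // placeholder, not a real key
  Version: "2007-11-07",
  SignatureVersion: "2",
  SignatureMethod: "HmacSHA256",
  Timestamp: "2007-06-25T15:01:28-07:00",
}, "EXAMPLE_SECRET_KEY"); // placeholder secret
console.log(url);
```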
The XML response returned:
<PutAttributesResponse xmlns="http://sdb.amazonaws.com/doc/2007-11-07">
<ResponseMetadata>
<StatusCode>Success</StatusCode>
<RequestId>f6820318-9658-4a9d-89f8-b067c90904fc</RequestId>
<BoxUsage>0.0000219907</BoxUsage>
</ResponseMetadata>
</PutAttributesResponse>
In terms of outright performance, sitting out there in the 'Cloud', SimpleDB isn't going to be able to compete with an RDBMS instance sitting a switch away from your app server, let alone a product like Oracle Coherence. What SimpleDB does offer, though, is a quick and cost-effective way of building flexible data-driven applications in the 'Cloud' without worrying about hosting, DBA maintenance etc.
SimpleDB is getting attention through Amazon's presence and branding, but there are a number of alternatives.
Dabble DB goes one step further than SimpleDB: it not only provides a database, but also adds forms, allowing users to build quite flexible data-driven web apps. Effectively, Dabble DB is Microsoft Access for the Web. You can still use Dabble as a database back-end to your own application tier through a JavaScript and JSON API, and it is ideally architected for AJAX applications running in the browser. An example query to Dabble from JavaScript is shown below.
Dabble.addView({
_class: 'View',
id: 'e63a411d-7cbb-4399-9b65-37cfee8546e3',
name: 'Authors',
fields: [88],
entries: [
{_name: 'Homer', _id: 45, country: 'Greece'},
{_name: 'Margaret Atwood', _id: 95, country: 'Canada'},
{_name: 'James Joyce', _id: 44, country: 'Ireland'}
]
});
Not all of these new database engines run solely in the Cloud. Apache have the CouchDB project, currently in the incubator. CouchDB is interesting for a number of reasons. Not only does it provide an adaptive, document-centric database with a RESTful JSON API, but it's also developed in Erlang, rather than C / C++ or Java.
An overview of CouchDB's architecture can be seen below:
CouchDB is document centric and schema free, with a flat address space. Documents are composed of fields that can contain strings, numbers, dates or more complicated structures such as ordered lists and associative maps. An example document for a blog post could look like:
{
"Subject": "I like Plankton",
"Author": "Rusty",
"PostedDate": "5/23/2006",
"Tags": ["plankton", "baseball", "decisions"],
"Body": "I decided today that I don't like baseball. I like plankton."
}
To put structure over what, essentially, is an unstructured store, CouchDB provides support for views which are written in JavaScript. A simple view construct is shown below:
function(doc) {
if (doc.Type == "customer") {
emit(null, {LastName: doc.LastName, FirstName: doc.FirstName, Address: doc.Address});
}
}
This view function creates a row for every document in the database that has a Type of 'customer', returning the fields LastName, FirstName and Address. Because the view emits a key of 'null', its rows can't be looked up or sorted by key. An indexed and sortable view would look like:
function(doc) {
if (doc.Type == "customer") {
emit(doc.LastName, {FirstName: doc.FirstName, Address: doc.Address});
emit(doc.FirstName, {LastName: doc.LastName, Address: doc.Address});
}
}
And would return a JSON result that would look like:
{
"total_rows":4,
"offset":0,
"rows":
[
{
"id":"64ACF01B05F53ACFEC48C062A5D01D89",
"key":"Katz",
"value":{"FirstName":"Damien", "Address":"2407 Sawyer drive, Charlotte NC"}
},
{
"id":"64ACF01B05F53ACFEC48C062A5D01D89",
"key":"Damien",
"value":{"LastName":"Katz", "Address":"2407 Sawyer drive, Charlotte NC"}
},
{
"id":"5D01D8964ACF01B05F53ACFEC48C062A",
"key":"Kerr",
"value":{"FirstName":"Wayne", "Address":"123 Fake st., such and such"}
},
{
"id":"5D01D8964ACF01B05F53ACFEC48C062A",
"key":"Wayne",
"value":{"LastName":"Kerr", "Address":"123 Fake st., such and such"}
}
]
}
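To see how a map function turns documents into rows like those above, the sketch below runs one over a few in-memory documents. The `runView` helper is a hypothetical stand-in, not CouchDB's view engine: it passes `emit` in as an argument (CouchDB provides it for you), the "post-1" document is made up, and the sort is a plain string comparison rather than CouchDB's real collation.

```javascript
// Hypothetical stand-in for a view engine: run a map function over
// in-memory documents and collect what it emits, sorted by key.
function runView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    // Each emitted row remembers which document produced it.
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc, emit);
  }
  // Plain string ordering; real CouchDB collation rules are richer.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return { total_rows: rows.length, offset: 0, rows };
}

const docs = [
  { _id: "64ACF01B05F53ACFEC48C062A5D01D89", Type: "customer",
    LastName: "Katz", FirstName: "Damien",
    Address: "2407 Sawyer drive, Charlotte NC" },
  { _id: "5D01D8964ACF01B05F53ACFEC48C062A", Type: "customer",
    LastName: "Kerr", FirstName: "Wayne",
    Address: "123 Fake st., such and such" },
  { _id: "post-1", Type: "post", Subject: "I like Plankton" }, // ignored by the view
];

const result = runView(docs, (doc, emit) => {
  if (doc.Type == "customer") {
    emit(doc.LastName, { FirstName: doc.FirstName, Address: doc.Address });
    emit(doc.FirstName, { LastName: doc.LastName, Address: doc.Address });
  }
});
console.log(result.rows.map((r) => r.key));
```

Each customer document contributes two rows, one keyed by last name and one by first name, so the same customer can be found under either key. The non-customer document contributes nothing.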
The choice of the Erlang VM runtime for CouchDB is also interesting. Erlang was developed by Ericsson as a platform for real-time telecom systems. Its lightweight processes, built-in concurrency and message-based inter-process communication make it a highly scalable, distributed and fault-tolerant environment, arguably more so than any current Java VM. This should help CouchDB perform well under heavily concurrent load.
CouchDB is stateless and is accessed entirely by HTTP, essentially following REST principles. This means CouchDB supports caching through proxies and edge server devices without modification.
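As a rough illustration of what "accessed entirely by HTTP" means, the sketch below only describes the requests involved and sends nothing. The helper names, the database name `blog`, the document id `plankton-post` and the base URL (CouchDB's usual default port on localhost) are assumptions made for the example.

```javascript
// Describe the HTTP request that stores a document (nothing is sent here).
function putDocumentRequest(baseUrl, db, docId, doc) {
  return {
    method: "PUT",
    url: baseUrl + "/" + encodeURIComponent(db) + "/" + encodeURIComponent(docId),
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc),
  };
}

// Describe the HTTP request that reads the same document back.
function getDocumentRequest(baseUrl, db, docId) {
  return {
    method: "GET",
    url: baseUrl + "/" + encodeURIComponent(db) + "/" + encodeURIComponent(docId),
    headers: { Accept: "application/json" },
  };
}

const post = {
  Subject: "I like Plankton",
  Author: "Rusty",
  Tags: ["plankton", "baseball", "decisions"],
};

// Assumed local server address, database name and document id.
const putReq = putDocumentRequest("http://localhost:5984", "blog", "plankton-post", post);
const getReq = getDocumentRequest("http://localhost:5984", "blog", "plankton-post");
console.log(putReq.method, putReq.url);
console.log(getReq.method, getReq.url);
```

Because every request carries everything the server needs, and a read is an ordinary GET on a stable URL, any standard HTTP cache sitting in between can serve it.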
Even though CouchDB is still in the Apache incubator, there are some real-world apps built on it out there already. An interesting example is Ajatus, a sort of 'reverse CRM' solution.
Of course, no article on next-gen databases would be complete without mentioning the biggest one of them all: Google's Bigtable. Essentially, Bigtable is based on a huge, sparse, distributed hash map. Going into Bigtable in detail is well beyond the scope of this article; there's a publication available from Google here.
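For a feel of what a sparse map means in practice, here is a toy, single-process sketch. Google's paper describes the map as indexed by row key, column key and timestamp; everything else here (the `SparseTable` class, its methods and the sample values) is hypothetical and leaves out distribution and persistence entirely.

```javascript
// Toy sparse map keyed by (row, column), keeping timestamped versions.
// Only cells that were actually written take up any space.
class SparseTable {
  constructor() {
    this.cells = new Map();
  }

  // Store a value for a cell at a given timestamp.
  put(row, column, timestamp, value) {
    const key = row + "\u0000" + column;
    if (!this.cells.has(key)) this.cells.set(key, []);
    const versions = this.cells.get(key);
    versions.push({ timestamp, value });
    // Keep the newest version first.
    versions.sort((a, b) => b.timestamp - a.timestamp);
  }

  // Return the newest value for a cell, or undefined if never written.
  get(row, column) {
    const versions = this.cells.get(row + "\u0000" + column);
    return versions ? versions[0].value : undefined;
  }
}

const table = new SparseTable();
table.put("com.example.www", "contents:", 1, "<html>v1");
table.put("com.example.www", "contents:", 2, "<html>v2");
table.put("com.example.www", "anchor:example.org", 1, "Example");
console.log(table.get("com.example.www", "contents:"));
console.log(table.get("com.example.www", "language:"));
```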
So is this really the end for the RDBMS? I suspect not just yet. There are hundreds of thousands of organisations and enterprises out there running their critical apps on Oracle and SQL Server, not forgetting the ubiquitous LAMP environments, typically with MySQL back-ends.
Even so, I believe these 'new generation' databases offer opportunities to build highly scalable, fault-tolerant and distributed applications with adaptable data models that inherently support the architecture of the web. With the likes of Amazon and Google heavily promoting these technologies, I personally would be worried if I were in the database division of Oracle or Microsoft.