by Paul Rudo on 31/03/11 at 4:40 pm
Over the past several years, tech headlines have been increasingly peppered with the term “NoSQL.” One can quibble about the accuracy and usefulness of that term (and many do), but that conversation can and should be ignored. What can’t be ignored, however, is the dramatic transformation in software systems that is driving the emergence and growing adoption of this new class of non-relational database technology.
Interactive Software Has Changed
Interactive software (software with which a person iteratively interacts in real time) has changed in fundamental ways over the past 35 years. The “online” systems of the 1970s have given way to today’s Web applications and a modern application architecture that addresses the radical differences in users, applications, and underlying infrastructure.
However, relational database technology, which was invented and optimized for the systems of the 1970s, has not kept pace with these changes and, in some regards, is the last domino to fall in the march toward fully distributed software architecture.
- Users: In the early days of interactive systems, an online application that supported 2,000 users was considered huge. Further, the population of users was controlled, worked within well-defined office hours, and was relatively static in size. Today, applications accessed via the public Web – for example, online banking systems, social gaming, e-commerce – support a population of users several orders of magnitude greater in size. A newly launched software system can grow from zero users to millions overnight – and those users can be located anywhere in the world, requiring 24×7 application availability.
- Applications: In the past, interactive software systems were primarily designed to automate existing manual processes and typically mirrored clerical employee tasks that culminated in some sort of “transaction.” These systems accelerated task completion and improved accuracy, but were about automating – not innovating. Modern applications, in contrast, break new ground, changing the nature of communication, shopping, advertising, entertainment, and much more. Change is really the only constant in these systems.
- Infrastructure: Perhaps the most obvious difference between then and now is the infrastructure atop which interactive systems execute. Centralization characterized the computing environment in the 1970s – mainframes and minicomputers with shared CPU, memory and disk subsystems were the norm. Today, distributed computing is the norm.
Application Architecture Has Changed
To address these changes, modern Web applications are built to scale out: just add more Web servers behind a load balancer to support more users. The result is attractive (near-linear) cost and performance curves, but the real win is the flexibility this distributed application architecture affords. Beyond the ability to quickly add or remove Web servers to support user volume and activity levels, distributing the load across servers (and even geographies) is inherently fault tolerant, supporting continuous operations.
RDBMS: Shortcomings and Band-Aids
In contrast to these sweeping changes in application architecture, relational database technology has not fundamentally changed in 40 years.
- It remains a centralized, “scale-up” technology: it runs on complex, proprietary, expensive servers, and handling more users requires bigger (and even more expensive) servers with more CPU, memory and I/O capacity.
- Running RDBMS technology in an otherwise distributed architecture highlights its lack of flexibility for “rightsizing” the database in real time to fit the needs and usage patterns of the application. (The Web logic layer scales out; the relational database, well, can’t.)
- The rigidity of the database schema – the fact that changing the schema once data is inserted is A Big Deal – makes it very difficult to quickly change application behavior, especially if it involves changes to data formats and content.
Recognizing these shortcomings of RDBMSs for modern interactive software applications, developers and practitioners have come up with some workarounds – for example, sharding, denormalizing, and distributed caching – which, while useful to a limited degree, are really just Band-Aids that ease symptoms, but don’t fight the disease.
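To make the sharding workaround concrete, here is a minimal sketch of application-level sharding by hash modulo, one common way to do it and not necessarily what any given team uses. The names `shard_for` and `fraction_moved`, the `user:N` key format, and the shard counts are all hypothetical. It illustrates why growing a sharded database is painful: changing the shard count reassigns most keys.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard with a stable hash modulo the shard count (hypothetical scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Share of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for key in keys
        if shard_for(key, old_shards) != shard_for(key, new_shards)
    )
    return moved / len(keys)


# Hypothetical key space: ten thousand user records.
keys = [f"user:{i}" for i in range(10_000)]
share = fraction_moved(keys, 4, 5)
print(f"keys that change shard going from 4 to 5 shards: {share:.0%}")
```

With this scheme the application has to know the shard count, and every change to it means migrating a large share of the data, which is the kind of operational burden the workaround leaves in place.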
So Why NoSQL?
Early Web pioneers such as Google and Amazon, faced with the inadequacies of relational technology, and blessed with the ability to invent their own databases, developed (and now depend on) Bigtable and Dynamo, respectively, to meet their highly distributed database needs.
And the NoSQL database was born.
What makes a NoSQL database a NoSQL database?
- It’s schema-less. Data can be inserted without a defined schema, and the format of the data being inserted can change at any time – providing extreme flexibility in the application and its ability to change with the needs of the business.
- It’s elastic. A NoSQL database automatically spreads data across servers, requiring no participation from the applications. Servers can be added and removed from the data layer without application downtime, with data and I/O spread across servers.
- It’s queryable. Sharding an RDBMS can seriously inhibit the ability to perform complex queries. NoSQL systems retain their full query expressiveness, even when distributed across hundreds or thousands of servers.
- It caches for extreme low latency. To reduce latency and increase sustained data throughput, advanced NoSQL database technologies cache data in system memory – a behavior that is completely transparent to the developer and the ops team.
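The schema-less and elastic properties above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the implementation of any particular product: documents are plain Python dicts, and keys are placed on servers with consistent hashing, a technique swapped in here to show how a server can be added while most data stays put. `HashRing`, `stable_hash`, the node names, and the sample documents are all hypothetical.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """64-bit hash that is stable across processes (unlike built-in hash())."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Toy consistent-hash ring with virtual nodes; illustrative only."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted list of (hash, node_name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each server owns many small arcs of the ring, which evens out load.
        for i in range(self.vnodes):
            bisect.insort(self._points, (stable_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # Walk clockwise to the first point at or after the key's hash.
        idx = bisect.bisect_left(self._points, (stable_hash(key), ""))
        return self._points[idx % len(self._points)][1]


# Documents are plain dicts; no two need to share the same fields.
documents = {f"user:{i}": {"name": f"user{i}"} for i in range(10_000)}
documents["user:42"]["tags"] = ["beta"]  # new field, no schema migration
documents["order:7"] = {"items": [3, 5], "total": 19.5}  # different shape

ring = HashRing(["node-a", "node-b", "node-c"])
before = {key: ring.node_for(key) for key in documents}

ring.add_node("node-d")  # grow the data layer by one server
after = {key: ring.node_for(key) for key in documents}

moved = [key for key in documents if before[key] != after[key]]
print(f"{len(moved) / len(documents):.0%} of keys moved after adding node-d")
```

In this sketch every key that moves lands on the new server and the rest stay where they were, so the application code that reads and writes documents does not need to change when capacity is added.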
NoSQL for the Rest of Us
Few companies can afford to develop and maintain their own NoSQL database, but the need for a new approach is nearly universal. Without a doubt, the reason NoSQL is picking up steam is that growing numbers of developers and ops teams recognize its potential for reducing the cost and complexity of data management while increasing the scalability and performance of interactive Web applications.
A number of commercial and open source database technologies, such as Couchbase (a database combining the leading NoSQL data management technologies CouchDB, Membase and Memcached), MongoDB, Redis and Cassandra, are now available and are increasingly the most frequently selected data management choice for new interactive Web applications.
For a more complete discussion, visit www.couchbase.com/why-nosql
This was a guest article, contributed by James Phillips, Senior Vice President of Products at Couchbase.