Follow Us

Views from the Lab

To SQL or NoSQL - that's not the question



We receive a lot of requests from our clients about the applicability of NoSQL technologies on their enterprise systems.

If you haven’t heard about NoSQL by now, we contend that you’ve been in hiding for the past three years. The term NoSQL itself refers to a broad class of data management that is not based on the traditional relational tables in typical RDBMS. Examples include document-based data storage such as MongoDB, key-value pair such as Riak, or column-based storage such as HBase.

While assessing usage of NoSQL, we are often presented with the CAP model which is great from the theoretical perspective, but grasping its implications on real world enterprise systems is difficult.

In this post, we’ll explain some of the design factors and the associated implications of using NoSQL in the context of three data-centric enterprise applications: Portal, Reporting and Analytics. Enterprise transactional systems are not included for obvious reasons.

1. Portal: Anything but the simplest static HTML page has some sort of data access module at the back-end that formulates a query to fetch relevant records or composite views from one or more data stores. Generally, the overall fetching operations on many NoSQL technologies could be several orders faster than RDBMS as one can fetch the data from the shards in a scale-out parallel manner.

But here is where the complication arises: not all NoSQL stores are equally capable. Out of the box, some do not support range queries, some support only simple predicates with constant values, some do not support composite indices or queries that are not based on keys. Thus, while the data fetch may be blazingly fast, the complexity is pushed up to the application layer (string-based joins, anyone?).

Now, if your portal requires data update functionality, the write operation in NoSQL is generally smooth even in high volume because many of them exist largely in memory. In contrast, traditional RDBMS deals with locks, latches, re-do logs and so forth. However, bear in mind that NoSQL does not generally support atomicity, consistency and in most cases proper durability. Thus for applications that require absolute transactional integrity and serialisation, SQL databases are still the top choice.

2. Reporting: Reporting applications inherently employ more complex queries than portal. A typical read query may have multi-table joins, nested joins, filters etc. Any write or update operation may cascade to multiple tables and will have to maintain referential integrity constraints. In general, NoSQL is a poor choice to support these applications (but products like HBase or GFS supports time interval-based and condition-based queries). Further, if you need to prepare your data for reporting by running elaborate batch applications to transform data with many-to-many mappings, there is no easy alternative to trusted SQL-based loading and transformation engines.

3. Analytics: The sweet spot for many NoSQL data stores is on our third set of applications: those that heavily use analytics, especially when it comes to processing to large amount of unstructured data (aka big data). Used in conjunction with MapReduce, NoSQL can parallelise and quickly perform complex operations such as statistical aggregates, filtering, grouping and sorting. The scale-out capabilities of NoSQL databases - e.g., Cloudera and Aster Data Systems -- allow handling of large volumes of unstructured data suitable for analytics. In terms of price/performance, NoSQL quickly gains ground over traditional data warehouses and BI appliances.

So, where does that leave us? From the perspective of application requirements, would you go for SQL or for NoSQL?

Let’s learn from those who have traversed this path.

Take LinkedIn for example. It started by using a RDBMS, but went on to develop a NoSQL data store instead. The primary reason: performance. According to Jay Krep’s blogs, LinkedIn started with a single big database and a cluster of front-end servers, but soon they split the database for a variety of functions like serving up profiles, performing searches, group interactions, networking updates etc. Each function had different application profiles and workload.

It became apparent that LinkedIn may require NoSQL storage for supporting specific high-volume OLTP workloads. This was the genesis of the Voldemort project. Facebook similarly has a mixed architecture with Memcached and MySQL in combination for high performance OLTP (such as message posting to the Wall), and Cassandra for the inbox search mechanism, each serving a different workload and purpose.

From the application perspective, the question is therefore, not of “SQL or NoSQL”. Rather we should embrace a world with SQL and NoSQL. The majority of NoSQL data stores are not general purpose, but application specific. If you are running a SQL shop with 3-tier architecture with pretty heavy business logic and even heavier stored procedures, then there is absolutely no reason to change that. But if you require very fast simple OLTP operations, if you are adding a chat capability to drive social commerce, or if you need to constantly tweak your customer attributes and constraints, then NoSQL should be considered. For your complex queries and data-driven integrity rules and for consistency and durability, RDBMS still rules.


By Edy Liongosari, Global Director of Research and Shubhashis Sengupta, Senior Researcher, Accenture Technology Labs

Tags: accenture, applications, database, lab, nosql, rdbms, sql

RSSSubscribe to this blog

Contact Us

For editorial queries:
Mike Simons Mike_Simons@idg.co.uk

For website issues:
Email webmaster@techworld.com

For commercial queries
Russell Kearney russell_kearney@idg.co.uk


For more contact details click here.


Email this to a friend

* indicates mandatory field





Techworld White Papers

Optimising data protection for virtual environments

VM environments require the same level of data protection as does the physical server environment. Companies may use data protection tools built for the physical environment in the virtual world, but this has serious disadvantages.

Download Whitepaper

PCI Compliance: Are UK businesses ready?

Exploring the results of a recent survey, including: ? Levels of understanding of the standard ? Current perceptions of actual compliance status ? Attitudes toward addressing compliance

Download Whitepaper

Mobility Management for Dummies

Your complete guide to managing and securing mobile devices such as laptops and smartphones.

Download Whitepaper

Magic Quadrant for midrange and high-end NAS solutions

It is difficult to find one midrange or high-end NAS product that can cater to all needs. File systems embedded in NAS are often designed to solve one major pain point, with additional features being added later to broaden use cases and benefits.

Download Whitepaper

Techworld UK - Technology - Business

Oracle Video

Enabling agile and intelligent businesses

 Changing markets, competitive pressures and evolving customer needs are placing increasing pressure on IT to deliver greater flexibility and speed. Explore truly flexible SOA foundations with this Oracle video.

Watch
COLT White Paper

IT Misuse Survey

Complete this survey and you could win a Nexus One

Techworld are running a short survey to discover how UK businesses are managing Internet and email misuse in the Enterprise.

Complete Survey

Complete our survey and you could win a Sony E-book Reader.
Techworld have teamed up with HP to compile a survey relating to server virtualisation. Complete the short survey and you could be the lucky winner of a Sony E-book reader.

Complete the survey here

Site Map

Test