NoSQL

Submitted by Peter on Fri, 2010-10-15 06:37

NoSQL is a general name for some alternatives to SQL based databases. It was a big Information Technology fashion hit back in 2009 but is now fading fast due to problems and a lack of benefit.

NoSQL is not NoSQL

There is a relational database named NoSQL, from http://www.strozzi.it/cgi-bin/CSA/tw7/I/en_US/nosql/, that has nothing to do with the NoSQL movement. The NoSQL database is designed to store data in Unix text files for access from Unix utilities. You have to read the whole file every time you look for some rows in a file. The only real use for this type of data storage is to save program settings. It quickly becomes grossly inadequate for anything larger. A better approach for long running applications is SQLite, as used by Thunderbird.
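As a rough illustration of the SQLite approach to program settings, here is a minimal sketch using Python's standard sqlite3 module. The table name settings and the helper functions are made up for this example; they are not taken from Thunderbird or any other application.

```python
import sqlite3

def open_settings(path=":memory:"):
    """Open (or create) a small name/value settings table in SQLite."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS settings (name TEXT PRIMARY KEY, value TEXT)"
    )
    return conn

def set_setting(conn, name, value):
    # INSERT OR REPLACE overwrites an existing row with the same primary key.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO settings (name, value) VALUES (?, ?)",
            (name, value),
        )

def get_setting(conn, name, default=None):
    # The primary key index lets SQLite find the row without reading the whole table.
    row = conn.execute(
        "SELECT value FROM settings WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else default

conn = open_settings()  # in-memory here; pass a file path to keep settings on disk
set_setting(conn, "theme", "dark")
set_setting(conn, "theme", "light")
print(get_setting(conn, "theme"))
print(get_setting(conn, "missing", "fallback"))
```

Unlike a flat text file, a lookup by name goes through the index, so the cost does not grow with every row you add.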

The top two

Cassandra and BigTable are the two main NoSQL databases. Cassandra is an Apache Software Foundation project based on open source code originally supplied by Facebook. BigTable is software used by Google for some of their data and is available only as a service, not as open source software.

Faster?

The main benefit claimed for NoSQL style databases is speed when used for storing large amounts of data. Nobody is seeing improved speed when NoSQL databases are used the same way as traditional SQL databases.

The main way to speed up NoSQL databases is to remove the requirement to make the data on disk current. You then run the risk of losing data, money, and customers when your hardware, infrastructure, power, or network fail. In fact you will lose data and you will not know what you lost, or even whether you lost anything. You then have to add code to your applications to do the special things SQL databases do, and that makes your NoSQL database as slow as, or slower than, your SQL databases.
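The same trade between speed and safe data is available inside an SQL database. The sketch below uses SQLite's synchronous pragma from Python to compare committing with and without waiting for the disk. The function name timed_inserts and the orders table are invented for the example, and the timings depend entirely on your disk and operating system, so treat it as a way to see the effect, not as a benchmark.

```python
import os
import sqlite3
import tempfile
import time

def timed_inserts(synchronous, rows=100):
    """Insert rows one commit at a time; return (seconds, row_count)."""
    if synchronous not in ("OFF", "NORMAL", "FULL"):
        raise ValueError("unknown synchronous mode")
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        # FULL waits for the data to reach the disk on commit; OFF hands it
        # to the operating system and carries on, so a power cut can lose it.
        conn.execute(f"PRAGMA synchronous = {synchronous}")
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
        conn.commit()
        start = time.perf_counter()
        for i in range(rows):
            conn.execute("INSERT INTO orders (amount) VALUES (?)", (i,))
            conn.commit()  # one transaction per row, as a worst case
        elapsed = time.perf_counter() - start
        count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
        conn.close()
        return elapsed, count
    finally:
        os.remove(path)

for mode in ("FULL", "OFF"):
    seconds, count = timed_inserts(mode)
    print(f"synchronous={mode}: {count} rows in {seconds:.3f}s")
```

Both runs store the same rows when nothing goes wrong. The difference only shows up as lost transactions after a crash, which is exactly the kind of loss you do not find out about.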

Some data is not important. If Google lose a few Web sites from a search, they will pick them up next time. Google have lots of flexibility. Clearly you do not want to lose financial transactions. You might not be so worried about user ratings of content or multiple images of a product. If someone visits your shop and sees four pictures of a product instead of five, or finds 11 reviews instead of 15, does it matter? In most cases no. You might have some frequently accessed data with lots of repetition where the loss of a small percentage does not matter. Showing 11 reviews in one second could be more important than showing the whole 15 over five seconds.

Bigger?

SQL databases offer special ways to handle a variety of big problems including long lists of data, large data items, and large inserts of new data. You may have to shop around the various database brands to find the right combinations for your requirements. MySQL has a basic free version and an enterprise version with extra features. PostgreSQL does similar things to the MySQL enterprise version. Oracle has more versions than I could cover in one page and now owns MySQL.

Consider a simple SQL expansion. Your database becomes too big for one disk so you move some tables to another disk. Some databases do not let you spread tables over several disks because all the tables are in one file. MySQL and some other databases use a different file for each table, making the split easy.

Now you have a table that is too big for one disk. MySQL, PostgreSQL, and others let you split a table into ranges, segments, or partitions (the name varies from product to product), and you place one range, segment, or partition on each disk.
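The idea behind a range split can be sketched in a few lines. This is only an illustration of the routing logic, not how MySQL or PostgreSQL implement it; the boundaries and the disk paths are made up.

```python
from bisect import bisect_right

# Hypothetical layout: the exclusive upper bound of each range, and the disk
# holding that range. The last disk takes everything above the last bound.
BOUNDS = [1_000_000, 2_000_000]
DISKS = ["/disk1", "/disk2", "/disk3"]

def partition_for(order_id):
    """Return the disk that holds the range containing order_id."""
    return DISKS[bisect_right(BOUNDS, order_id)]

for order_id in (42, 1_000_000, 5_000_000):
    print(order_id, partition_for(order_id))
```

With a partitioned table the database does this routing for you on every query, so the application keeps using one table name.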

A disk can be a huge storage device created by joining multiple disks together in one RAID array. The current large disk size is 2 TB, 2 TeraBytes, because of a silly software limitation, and 3.5 TB disks are ready to use when computer hardware manufacturers get their hardware up to date. You can get RAID array servers for 32 disks. RAID 6 sets aside two disks' worth of space for parity, so 32 disks in a RAID 6 array give 30 times 3.5 TB or 105 TB, more than you will need for a long time.
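The RAID 6 arithmetic is easy to check for other array sizes. The function name below is made up for the example; the only rule it encodes is that RAID 6 gives up two disks' worth of capacity to parity.

```python
def raid6_usable_tb(disks, disk_tb):
    """Usable capacity of a RAID 6 array, in the same units as disk_tb."""
    if disks < 4:
        raise ValueError("RAID 6 needs at least four disks")
    # Two disks' worth of space holds parity; the rest holds data.
    return (disks - 2) * disk_tb

print(raid6_usable_tb(32, 3.5))  # 105.0
print(raid6_usable_tb(32, 2))    # 60
```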

When you do need more than 105 TB, it will be all those video files you are storing. You can manage those video files through a database without storing the video data inside the main database files, because the major databases have facilities to store large data items as separate files. Those separate files can be on different storage devices, giving you the option of using more than 105 TB.

Lots of video

You could also write your application to store the big files as separate files and put only the registration information in your database, giving you more flexibility but without losing the advantages of an SQL database. Storing the whole contents of big files in your database is only an advantage if the database can search the content of the big files, something you cannot currently do with video.
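A minimal sketch of that approach, assuming a videos table and a register_video helper invented for this example: the video stays on disk as an ordinary file and the database holds only the registration details. SQLite stands in for whichever SQL database you use.

```python
import hashlib
import os
import sqlite3
import tempfile

def register_video(conn, path, title):
    """Record a video's location, size, and checksum; the bytes stay on disk."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in 1 MB chunks so a huge file never has to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha.update(chunk)
    with conn:
        conn.execute(
            "INSERT INTO videos (title, path, bytes, sha256) VALUES (?, ?, ?, ?)",
            (title, path, os.path.getsize(path), sha.hexdigest()),
        )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT, "
    "path TEXT UNIQUE, bytes INTEGER, sha256 TEXT)"
)

# A small stand-in file, since the example has no real video to hand.
with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
    tmp.write(b"not really a video")
    video_path = tmp.name

register_video(conn, video_path, "Product demo")
row = conn.execute(
    "SELECT title, bytes FROM videos WHERE path = ?", (video_path,)
).fetchone()
print(row)
os.remove(video_path)
```

The files can live on any storage device you like, and everything you actually search on (title, size, checksum) is still ordinary SQL data.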

The NoSQL alternatives give you no advantages for storing video. They cannot search video, making the storage in a NoSQL database as useless as storing the video in an SQL database. Storing videos as separate files then registering the video in a NoSQL database gives you no advantages over registering the videos in an SQL database.

NoSQL does nothing for video.

Huge text files

Huge text files look similar to huge video files until you realise you can search text. If your huge text files are inserted into your database as data fields, your database software can search all the text. PDF and other document formats can be searched the same as text. NoSQL databases give you no advantages when you perform the search.

When you use an external search facility, the facility usually needs the data as separate files. You are then back to storing your text/document files the same way as video and NoSQL has no advantages.

Google size sites

Google stores a lot of data and developed their BigTable for one main data storage task. They use SQL databases for their regular data storage. There are fewer than ten Web sites with similar data storage problems. There are lots of databases outside the Web storing huge amounts of data and most use regular SQL. Wait until you are making a billion dollars per year before investing in NoSQL.

Hierarchical storage

The few huge non Web databases not using SQL are not using the common NoSQL projects; they are using data specific hierarchical storage systems. Hierarchical storage was around before NoSQL. In fact, it was used before SQL. Some XML databases are based on the old hierarchical storage systems. SQL databases were developed to solve the problems caused by hierarchical storage systems.

Some NoSQL software uses some aspects of hierarchical storage.

Unknown problems

NoSQL is full of unknown problems plus known problems that are rarely mentioned. When you finish reading about all the known problems of existing NoSQL products, you will decide to leave the conversion from SQL to NoSQL until you are generating far more profit to cover the extensive conversion cost and the massive testing phase.

The unknown problems are caused by the lack of people using NoSQL. There are few chances of someone having exactly the same data as you coupled with exactly the same update requirements. Your problems have occurred before with the major SQL databases and people have found solutions. You will not get the same support with NoSQL, leaving you with the massive cost of developing your own support team and massive testing system.

Terminology

Real NoSQL products fall into three groups, described below. The rest of the products sold as NoSQL are SQL databases with a NoSQL interface added or SQL databases with some features switched off.

Column families

NoSQL products based on column families provide the table part of an SQL database without the SQL interpreter. You go back to the early 1980s and some primitive databases before they developed SQL support. Your programmers have to convert requests from logical requests to detailed mechanical code.

Document Stores

Document stores were around in the 1970s and came back in with the initial popularity of XML. Content addressable storage was another flavour. Some document stores let you add an SQL database on top so you can find documents. Most of the major SQL databases handle document storage without requiring anything separate from the SQL database. Document storage has nothing to do with NoSQL.

Key Value / Tuple Store

This is the real NoSQL. Each table is equivalent to one column in an SQL database. Reading an item with several attributes requires gathering data from many tables, all by hand. A table with 50 columns and 5 indexes will be replaced by 50 tuple stores and one or more tuple stores for every index. If the tuple store does not have multilevel indexing for performance, you have to build your own multilevel index using multiple tuple stores. A simple three key index might blow out to 20 tuple stores and need more tuple stores added every time you expand.
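To make the bookkeeping concrete, here is a small sketch in which plain Python dicts stand in for tuple stores, one per column plus one for a hand-built index. The product attributes and function names are invented for the example and do not describe any particular NoSQL product.

```python
# One dict per attribute stands in for one tuple store per SQL column.
names = {}
prices = {}
stock = {}
# A hand-built secondary index: price -> set of product ids.
ids_by_price = {}

def put_product(pid, name, price, qty):
    """Write one logical row; the application must update every store itself."""
    old_price = prices.get(pid)
    if old_price is not None:
        # Forget this step and the index silently goes stale.
        ids_by_price[old_price].discard(pid)
    names[pid] = name
    prices[pid] = price
    stock[pid] = qty
    ids_by_price.setdefault(price, set()).add(pid)

def get_product(pid):
    """Reassemble one logical row by reading each store in turn."""
    return {"id": pid, "name": names[pid], "price": prices[pid], "stock": stock[pid]}

def products_at_price(price):
    """Look up ids in the index store, then gather each row by hand."""
    return [get_product(pid) for pid in sorted(ids_by_price.get(price, ()))]

put_product(1, "lamp", 20, 5)
put_product(2, "desk", 20, 1)
put_product(1, "lamp", 25, 5)  # price change: the index must be fixed by hand
print(products_at_price(20))
print(get_product(1))
```

In an SQL database the same lookup is one statement, SELECT * FROM products WHERE price = 20, and the database keeps the index in step with the table for you. Here there are only three columns and one index; the 50 column table above needs the same code for every one of them.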

Public opinion

TechRepublic says NoSQL databases expand transparently and are usually designed with low-cost commodity hardware in mind. All the major SQL databases run happily on low-cost commodity hardware. Most of the major SQL databases expand as transparently as the available NoSQL products. They suggest you can do away with your DBA, DataBase Administrator, when you use NoSQL, then point out you still need someone to perform the same work. They do not mention that you need some extra programmers to implement in your code all the things missing in NoSQL. They list five advantages for NoSQL but most of them are not true and some are only true when you work with the rare exception of unreliable data. They list five serious problems with NoSQL and most of them distill down to the need for many expensive, experienced, highly skilled people you do not need with SQL.

Quite a few people with experience point out that Google, the developers of BigTable, run all their mission critical data on SQL databases. Most of their data is in SQL databases. The only data not in SQL is a few of the tables behind their search.

Focus

Successful Web site owners know their success comes from creating a better product, service, or Web site, not reinventing the software behind the Web site. If your Web site becomes big enough to require something special, you will get the best results quicker by contributing to an existing open source SQL database. A small change to MySQL is far easier than a hundred percent reinvention. There are more people with the right experience. There are more people ready to invest in your improvement because they are working on something similar. Everything is easier.

Conclusion

If you are not in the top ten Web sites, the cost of switching from SQL to NoSQL is greater than the advantages. The unknown problems of NoSQL will kill you long before you outgrow SQL databases. Google and Facebook have the resources to support a NoSQL approach for the biggest database table in their main applications.