Replacing SQL

Submitted by peter on Mon, 06/18/2018 - 20:38

SQL databases are still popular for small to medium projects while NoSQL is taking over for large projects. NoSQL is joining CAS and other technologies that have a use but are over hyped when first introduced. What should you look at when investigating replacements for SQL?

Development speed

New technologies slow down development of applications, Web sites, everything. There is extra time learning how to use the technology and extra time building a code base. Eventually there are frameworks and libraries of code to help you use the new technology.

Early adopters end up working through several new versions with major changes. The people who jumped on AngularJS had to restart with Angular 2.

In the case of NoSQL, there are now SQL layers for the major NoSQL projects, giving you the option to move in through a hybrid approach while you redevelop existing code from SQL to NoSQL.

No this, no that

Fancy new software systems tell you they do away with this and that to save you time, to make everything faster, then you find out that most of the time they are exactly the same or worse. Lets walk through some examples.

No files

There is an overhead for processing files. Some SQL alternatives tell you they do not need files, saving you the overhead of processing files. Weird. They use files but call the files something else. Do you really save anything?

You could start with a unique file system, as happened with some software storage systems. You then have the extra file system storing data in file type structures but calling the files something else. All that code duplicates what is in your operating system.

A better approach is to understand the way your file system works in your operating system. select a good operating system, say Linux, use enough memory to let the OS cache active file entries, and you have a fast system. No matter what the file system is or what the replacement is, they slow down when they do not have enough memory.

Duplicating your existing file system with another set of code, another set of memory caches, and all the other overheads is a quick way to slow down processing. This type of replacement usually works only when your disk or server is dedicated to the new system. Then you can turn down the build in OS code and caches to help the alternative.

We see this working in some realtime configurations where everything is stored in memory and nothing is written to disk. You can configure a Linux system to work purely from memory with nothing written to disk.

When you need to write to disk, there is no advantage in replacing a good file system. Instead you look at switching off time wasters like atime, write code to minimise open files, and write code to keep open all active files. Tuning your existing code and OS for the best file usage is way ahead of replacing a file system.

No indexes

Some NoSQL products promote the idea of using no indexes. This is supposed to be an advantage over SQL. You know that tuning SQL includes looking at index usage. What you might not know is that the "no index" products build indexes but call them something else. Even worse, some NoIndex products make you create your own indexes, really slowing down development time.

NoIndex products may build the indexes in memory when they start, an overhead every time they start. NoIndex products might give you code to build your own index in memory or might index products by a fixed location where you create a massive expansion of disk usage in an attempt to be able to calculate the location of a record.

There are examples of where this works. They are rare. Many SQL databases have options to optimise their databases for cases where a NoIndex approach could be good buy not perfect, reducing NoIndex usage to a very small part of your overall data storage.

The photograph library example

You are storing photographs for a shop. You have a million products to sell. Some products have five or more images. How should you store them and access them?

Fixed size images

Start with every image having the same size or a similar size. Suppose every file is ten megabytes. All the files are numbers 1, 2, 3, 4, etc. If you created your own file system, you could store every image as a fixed block of 10 MB. When you need image 11, you look at a fixed location.

Byte 0 in your storage is image 1. Byte 10,000,000 is the start of image 2. Image 11 starts at (11 – 1) * 10,000,000. This is one case where you do not need an index. Do you need a new file system?

Existing modern file systems can handle huge files which means you can use an existing file system, create one big file, then store your images in the one big file. If the one big file is created at full size, a good file system will create the space as one big extent so there is no overhead from processing multiple extents.

As a further speedup, you would round up the space allocated for an image to the boundary of a disk block or storage segment, reducing the slight overhead of accessing an odd block.

Magnetic disks and SSDs have peculiar requirements that only apply when updating portions of a file. For a file with fixed length records, you get maximum update performance from aligning records to the internal segments used by the storage device.

Some SQL databases have special code for storing and accessing fixed length rows. SQL databases also allow partitioning where you can split big tables over multiple storage devices, or servers, then calculate the partition for a row without having to use an index.

Fixed size thumbnails

Your images might not be all the same size. You will probably create thumbnail images for display on your Web site. The thumbnails will be limited in size and can be accessed at a fixed location. You can use one file with fixed length rows aligned to blocks.

You might create thumbnails in a variety of sizes. Each size can be stored in a file with a record size matching the image size for that size thumbnail.

Your thumbnails will probably be sent to a CDN, a Content Delivery Network, where every image will be a separate file. Given the need to frequently access the thumbnails as image files, you do not save much processing time by placing the thumbnails in a special file or file system.

The CDN software will have some optimisations for handling millions of thumbnails. You can probably save the most time by generating your thumbnails straight into the CDN system or into a server in the CDN format, ready to replicate direct to the CDN.

Variable size image files

Looking in one directory of photographs, the files vary from 6 MB to 22 MB. I know there are many images down near the 6 MB size and some images up around 50 MB. Think about fixed size storage for the images. 50,000 files. For fixed size allocation, you need 50,000 times 50 MB or 2500,000 MB, 2,500 gigabytes or 2.5 terabytes. That actual space allocated is only 500 MB. Would you waste all that extra space for speed?

Scale that up to a million images. That actual space needed is only 10 TB, achievable with 6 SSDs in a RAID 5 array. Using fixed allocation, you need 50 TB which would be 26 SSDs in a RAID 5 array. The extra 20 SSDs, at a current local price of $700, would cost $14,000.

Variable size plus index

What would happen if we used variable size image records and an index? We want to store an address into 10 TB. A 48 bit integer address can cover up to 281 TB. Each 48 bit address uses 6 bytes. We create an index with a million times 6 bytes or only 6 MB.

At application startup, we load the 6 MB index into memory. To find an image, we multiply the (image id - 1) by 6 then read the 48 bits integer address from the index. The 48 bit address fives us the start of the image record.

Now we need the image length. The length could be the first 32 bits of the image or we could add the length to the index.

More calculations

This location and length stuff is handled automatically by databases. Your replacement for a database might handle the calculations or force you to handle them.

Look at the question of storing variable length records. You have to have a length somewhere. If the length is stored at the start of a record, a common practice, you have to read the start of a record before you can find the length. This slows down your input code. Access would be faster if you knew the length when you issue the read command.

Adding the length to the index is easy. An extra 4 bytes per entry. Each index entry jumps from 6 bytes to 10 bytes and our example index with a million entries jumps from 6 MB to 10 MB. If we do that, there is no reason to waste space for lengths in the image file.

Think about the case where the index is corrupted and there are no lengths in the image file. You cnnot rebuild the index. Storing the length in both files is a good safe measure. Your software may have that type of redundancy built in after years of development or it might not.

How many entries are in the index file? You could read the file then use the length to calculate the number of entries. You can do other things and some of them will improve performance some of the time but add extra overheads at other times. welcome to database/data storage design.


In our example, we number our images from 1 and subtract 1 from the image number to get the address in the index. What would happen if we did not subtract 1? There would be an empty entry at address 0. We could use the empty entry to store the length of the index or the address of the first unused space at the end of the index.

By adding a length to the index, we can speed up additions. We can over allocate space to the index and the image file. We know where the data ends and can add new images without having to reallocate the files. This makes insertion of new images faster.

Some database software does this automatically. Some software over allocates under your control. You tune the software to over allocate enough to handle your inserts without pushing up your storage costs by a massive amount.

Think about the cost of adding an extra SSD to a RAID array. You do not want to do that every day. You might plan on doing once per month. You need to over allocate to fit the maximum inserts in a month. This is something you have to plan no matter how you store your data.


What happens when you replace an image? Fixed length records are simply changed in place. Variable length records can be replaced when the new image is the same length or shorter. The real problem is inserting a longer image.

To insert a longer image in the middle of the file, you have to rewrite the whole file, something too slow for our use. The alternative is to insert the updated image record at the end of the file and stop using the old image record. What do you now need?

We added a length to allow for variable length records. Now we need the image id in the record. The id could be a 32 bit (4 byte) integer at the start of the record. If you use names instead of our example integer id, your id field would be a variable length string with a small length at the start. You also need something to indicate a record is deleted. You could put a one byte flag or create a convention where an id of zero indicates the record is deleted.

What happens in an update where we have to insert a new record? We add the record at the end. Then we update the first record to indicate the new total data length. We then update the id of the old record to make it indicate the record is deleted. This is all handled by some software and requires code from you for other software.

At some point, your file will bloat out with empty records and you have to write code to compact the file. You might choose to put the records back in order. You might choose to compact in place or to copy the whole file. What happens with new updates while you compact the file?

Index, index, index

Indexes make updates slower and access faster. Take the example of a postcode lookup. Australian postcodes are 4 digits. Suburb names vary in length. In typical use, you want to find a postcode for a suburb name and a suburb name for a postcode.

Look at a file for converting suburbs to postcodes. Long names in the postcode list include "Darwin International Airport", a 28 character name. Short names include "ANU", a 3 character name. There are over 16,000 entries. There are multiple codes for some names and multiple names for many codes.

You would start with a database table or your own code file storing all the entries. You index the codes and index the names to allow access by name or code.

What about state? Some suburb names occur in multiple states. When you present a selection list for something like address entry, you offer a state selection then offer the suburbs for that ste. Now you need an index by state.

SQL can easily combine indexes to let you select by name and state. If you are not using a database, you have to perform a two part access, state then name, or build another index combining state and name.

For a database, you might choose to build a combined index for performance. Changes like that are really easy. Just open up an administration screen and select the columns for the index. Without an SQL database, you have to write code.


A good approach is to develop everything in SQL. When an access hits an activity threshold, look at the SQL database options for refining performance. At the next level of activity, look at bypassing SQL with alternatives just for the data tables used in the activity.

Many of the big NoSQL systems use NoSQL only for big tables and use SQL for everything else.

Another hybrid approach worked for one big project. The data is updated in a master database then replicated to slave databases. The master database uses a database engine designed for frequent updates and indexes are built purely to help the updates. the slave databases use a read oriented database engine and have indexes designed just for the read accesses.

Count the data, count the accesses

For a hybrid, or any other approach, count the data and the accesses. You can do that in your SQL database when ramping up your alpha stage tests. You will see the number of rows in each table, the number of unique entries in indexes, and a dozen other measurements you can use to plan changes to accesses.

Instead of writing all the code then testing with token data, load all the data up front so that you can analyse the data. You will get a better idea of how to structure the data and which accesses need the most attention. You can draw an accurate line between the easy development of SQL and the difficult bits requiring more expensive development.

Choose your software carefully

Almost every software choice makes it easy to read data. The real access and performance problems happen when you update. Look at updates first.

If most of your accesses are only reads, look at replication and CDNs to remove the read activity from the data store where you perform updates.

Work through good design first. Load all your data and measure. Look at software to fit the design and access activity.