The Economics: NoSQL vs NewSQL vs Relational Databases

Economics are a big deal when deciding whether to go with a NoSQL, NewSQL, MPP DBMS, or traditional RDBMS.

Many of the products (in all four of these categories) claim that economics are on their side, because either…

  1. Their products are ultimately cheaper than the alternatives, or
  2. Their products produce greater benefits that dramatically outweigh their cost.

The first is basically a Total Cost of Ownership (“TCO”) argument.  The second is about a higher ROI (return on investment) or a shorter payback period than the alternatives offer.
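
To make the second argument concrete, here is a minimal sketch of the two measures, using their common textbook definitions. The function names and the flat "same net benefit every year" assumption are mine, for illustration only; a real model would likely use year-by-year cash flows.

```python
def roi(total_benefit: float, total_cost: float) -> float:
    """Return on investment as a fraction: net gain divided by cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


def payback_period_years(upfront_cost: float, annual_net_benefit: float) -> float:
    """Years until cumulative net benefit covers the up-front cost.

    Assumes (for simplicity) the same net benefit every year.
    """
    if annual_net_benefit <= 0:
        return float("inf")  # the investment never pays for itself
    return upfront_cost / annual_net_benefit
```

Comparing two candidate systems then comes down to feeding each one's estimated costs and benefits into the same two functions.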

The TCO, ROI, and payback period for any particular data store (be it of the NoSQL, NewSQL, MPP, or RDBMS variety) will vary greatly according to the application.  It is important not to take vendor claims of “X% improvement” at face value.

Instead, create your own economic model for quantifying expected costs and benefits. Here are some factors to consider:

  1. Up-front hardware costs – Many vendors claim their products require cheaper hardware, either by using fewer machines or by using cheaper machines.  Hardware costs affect both up-front project costs (CapEx) and the yearly maintenance fees (operational costs) paid to the hardware vendors.  Hardware costs can usually be further subdivided into processors, memory, storage, network interfaces, and “other” (racks, load balancers, dedicated storage appliances, etc.).  Memory can be a significant cost driver, as can the type of storage used (SSDs, SANs, RAID arrays of regular old hard disks, and so on).

  2. Software license costs & maintenance – The cost here can vary wildly, from $0 for one of the many open source NoSQL solutions, to millions of dollars.  This item includes up-front CapEx costs as well as the annual maintenance bill, typically about 20% of the purchase price.

  3. Ongoing software support costs – If you buy a commercial data store, customer support is probably included in your annual software maintenance bill (see the previous item). But if you are looking at open source, you will want to find a vendor that can provide ongoing software support.  And that costs money, even if the software itself is free.  This is an ongoing operational expense.

  4. Ongoing hardware support costs – This is a support & maintenance contract on the hardware you purchased in #1. The cost varies per vendor, but is usually calculated as a percentage of the hardware purchase price, typically somewhere between 8% and 20%. For commodity servers (generic Intel- or AMD-based Linux boxes), many organizations forgo official support contracts and have their own IT staff support the machines.

  5. Power costs – If you use fewer boxes, or use more energy efficient hardware, then your power bill will be lower — sometimes significantly so.  If you want to get fancy and impress others with your “green-ness”, consider calculating your carbon footprint, and take into account any Carbon Offsets you’d purchase to reduce the system’s environmental impact.

  6. Administrative costs – This bucket includes the cost of the staff needed to keep your system running and healthy: usually the fully loaded salaries of the DBAs and other administrators.  In general, the more hardware you are running, and the more instances of software you are running, the more administrative staff you need.

However, different products place different demands on administrators.  For example, “sharding” (sensibly dividing up data across multiple nodes) can consume a lot of administrator time if done manually, but many NoSQL and NewSQL systems do this automatically.  Recovering from downtime, both planned and unplanned, can consume large amounts of administrative time, so the availability of the system affects these costs.

Other tasks that can take up a lot of time, but vary considerably from system to system, include: time spent on upgrades (especially if updates to the software come out very frequently),  time spent on performance tuning, and time spent on monitoring, backups, etc.  Different products, and different rules governing data and IT, require varying levels of attention from human administrators.

  7. Developer productivity – If the structure and type of data in the data store change, the applications that use the data store often need to be changed as well. So, you should account for the fully loaded salaries (or consulting fees) for the programmer time needed to make these changes.

  8. “Hard” revenue impact – Different configurations of products will achieve different levels of availability.  This ultimately translates into some amount of system downtime, or time when the system is overloaded.  If your system is essential to your company actually making money (for example, it is an e-commerce store), then every minute of downtime results in lost revenue.  If the system is still “up” but is overloaded, then you lose the ability to serve some customers who would have bought.  Again, the result is lost revenue.

  9. “Soft” revenue impact – This includes things like “greater customer satisfaction”, “higher accuracy”, and “better response times”: things your choice of system might affect that will impact customers, and will thereby affect the amount of revenue they give your company.  “Soft” costs are very real, but are often difficult to quantify.
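
As a starting point for your own model, the factors above can be rolled up into a simple TCO calculation. This is only a sketch under my own simplifying assumptions: the class and field names are hypothetical, annual costs are treated as flat from year to year, and “soft” revenue impact is left out because it is so hard to quantify. Every number has to come from your own estimates.

```python
from dataclasses import dataclass


@dataclass
class DataStoreCosts:
    """Hypothetical cost inputs for one candidate system."""
    hardware_purchase: float          # up-front hardware cost
    software_license: float           # up-front license cost (0 for open source)
    software_maint_rate: float        # annual maintenance, fraction of license price
    software_support_per_year: float  # third-party support, e.g. for open source
    hardware_support_rate: float      # annual support, fraction of hardware price
    power_per_year: float             # electricity bill
    admin_staff_per_year: float       # fully loaded DBA / admin salaries
    developer_per_year: float         # programmer time for schema-driven changes
    downtime_hours_per_year: float    # expected downtime or overload
    revenue_per_hour: float           # revenue lost per hour of downtime

    def upfront(self) -> float:
        """One-time (CapEx) costs."""
        return self.hardware_purchase + self.software_license

    def annual(self) -> float:
        """Recurring costs per year, including "hard" lost revenue."""
        return (
            self.software_license * self.software_maint_rate
            + self.software_support_per_year
            + self.hardware_purchase * self.hardware_support_rate
            + self.power_per_year
            + self.admin_staff_per_year
            + self.developer_per_year
            + self.downtime_hours_per_year * self.revenue_per_hour
        )

    def tco(self, years: int) -> float:
        """Total cost of ownership over the given number of years."""
        return self.upfront() + years * self.annual()
```

Build one `DataStoreCosts` per candidate, call `tco()` over the same time horizon for each, and compare. Splitting the inputs this way also makes it easy to see which factor dominates for your particular application.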

A Final Caveat: Note that the above factors assume that you will own your infrastructure – the hardware, software, etc.  The model needs tweaking or revamping if you are “renting” infrastructure in the cloud, by using something like Amazon Web Services, for example.  (In the future, perhaps I’ll post about the economics of cloud deployments.)