There's been a lot of discussion lately about NoSQL databases for high-performance distributed web apps. Actually, that's an understatement: there's been a lot of discussion about NoSQL for the past few years. NoSQL removes the need for a schema, in favor of non-relational data being stored in “documents” (and then there's some really cool magic that happens to keep track of them all).
Currently there are two main contenders, CouchDB and MongoDB. If you care about your data actually being saved and always being available, MongoDB is not a good fit. The marketing engine behind MongoDB is huge, and they have very good writers producing the content for their website. You get examples like the one below:
It may be helpful to look at some particular problems and consider how we could solve them.
- if we were building Lotus Notes, we would use Couch as its programmer versioning reconciliation/MVCC model fits perfectly. Any problem where data is offline for hours then back online would fit this. In general, if we need several eventually consistent master-master replica databases, geographically distributed, often offline, we would use Couch.
- Couch is better as a mobile embedded database on phones, primarily because of its online/offline replication/sync capabilities.
- we like Mongo server-side; one reason is its geospatial indexes.
- if we had very high performance requirements we would use Mongo. For example, web site user profile object storage and caching of data from other sources.
- for a problem with very high update rates, we would use Mongo as it is good at that because of its “update-in-place” design. For example see updating real time analytics counters.
- in contrast to the above, Couch is better when lots of snapshotting is a requirement because of its MVCC design.
Generally, we find MongoDB to be a very good fit for building web infrastructure.
There are a lot of magic words and phrases in here: “MVCC”, “update-in-place design”. Here's what they really mean.
MVCC (multi-version concurrency control): If you go to write something and it fails, you don't lose your data. The new data gets added to the system alongside the old, and only once that write has succeeded is the old data forgotten, kept as a previous version, or thrown away.
“update-in-place design”: We write over the old data right where it was stored, so if something goes wrong you're hosed. Instead of leaving the last working copy alone until the new one is safely written, it writes directly on top of it. If this write fails for any reason, your data is gone.
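The difference between the two definitions above can be shown with a toy model. This is only a sketch under loud assumptions: `InPlaceStore`, `AppendOnlyStore`, and the `crash_after` parameter are hypothetical names invented for illustration, and neither class reflects how CouchDB or MongoDB actually lays out data on disk. It only shows why an interrupted overwrite can leave a record that is neither the old value nor the new one, while an interrupted append leaves the last committed version readable.

```python
class CrashDuringWrite(Exception):
    """Simulates power loss partway through a write (hypothetical)."""


class InPlaceStore:
    """Toy model of update-in-place: new bytes land on top of the old ones."""

    def __init__(self, data):
        self.data = data

    def write(self, new_data, crash_after=None):
        buf = list(self.data.ljust(len(new_data)))
        for i, ch in enumerate(new_data):
            if crash_after is not None and i == crash_after:
                # Whatever was half-written is what stays on "disk".
                self.data = "".join(buf)
                raise CrashDuringWrite()
            buf[i] = ch
        self.data = "".join(buf[:len(new_data)])

    def read(self):
        return self.data


class AppendOnlyStore:
    """Toy model of an MVCC-style store: new versions are appended,
    and a version only becomes current once it is fully written."""

    def __init__(self, data):
        self.log = [data]
        self.committed = 0  # index of the last fully written version

    def write(self, new_data, crash_after=None):
        if crash_after is not None and crash_after < len(new_data):
            # A torn tail is appended but never marked as committed.
            self.log.append(new_data[:crash_after])
            raise CrashDuringWrite()
        self.log.append(new_data)
        self.committed = len(self.log) - 1

    def read(self):
        return self.log[self.committed]


old, new = "balance=100", "balance=250"
for store in (InPlaceStore(old), AppendOnlyStore(old)):
    try:
        store.write(new, crash_after=9)
    except CrashDuringWrite:
        pass
    print(type(store).__name__, "->", store.read())
```

In this toy run the in-place store ends up holding a mixture of the old and new values, while the append-only store still returns the old, intact one.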
“In general, if we need several eventually consistent master-master replica databases, geographically distributed, often offline, we would use Couch.”
This is really my biggest issue: what's described here is, in my opinion, what being “web-scale” is all about. Read and write your data everywhere, and distribute it. The point about data being “offline” is actually one of the BEST things about CouchDB, and it is kind of tossed aside here. You can maintain your CouchDB database on a mobile device and then synchronize it with one on a server. The web is more than just websites; it's data going between places. Arguably, any time you're not actively syncing data, you're offline.
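To make the offline-then-sync idea concrete, here is a minimal sketch of one-way replication driven by a change log and a checkpoint. Everything in it is an assumption made for illustration: `Replica`, `put`, and `replicate` are hypothetical names, revisions are plain counters, and conflicting edits are simply ignored. It is not CouchDB's replication protocol, only the general shape of "work offline, then send whatever changed since the last sync".

```python
class Replica:
    """Hypothetical replica: documents plus an append-only list of changes."""

    def __init__(self):
        self.docs = {}      # doc_id -> (revision number, body)
        self.changes = []   # doc_ids in the order they were changed

    def put(self, doc_id, body):
        rev = self.docs.get(doc_id, (0, None))[0] + 1
        self.docs[doc_id] = (rev, body)
        self.changes.append(doc_id)


def replicate(source, target, checkpoint=0):
    """Copy everything changed on source since checkpoint to target.

    Returns the new checkpoint to pass in on the next sync.
    Conflicts are not handled in this sketch; the higher revision wins.
    """
    for doc_id in source.changes[checkpoint:]:
        rev, body = source.docs[doc_id]
        if rev > target.docs.get(doc_id, (0, None))[0]:
            target.docs[doc_id] = (rev, body)
            target.changes.append(doc_id)
    return len(source.changes)


phone, server = Replica(), Replica()

# The phone works offline for a while...
phone.put("note-1", "buy milk")
phone.put("note-2", "call Bob")

# ...then comes back online and syncs.
checkpoint = replicate(phone, server)

# Later edits only need to send what changed since the checkpoint.
phone.put("note-1", "buy milk and eggs")
checkpoint = replicate(phone, server, checkpoint)
print(server.docs["note-1"])
```

The checkpoint is what makes being offline cheap: the device can disappear for hours and the next sync only has to walk the tail of the change log.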
In short, if your server crashes, someone trips over the power cord, etc… MongoDB is more than likely going to lose data, while CouchDB will wake back up, see what it was last doing, and lose only the write that was in progress at the time of the failure. All the performance benchmarks in the world can't outweigh that. The saddest part is that MongoDB won't even let you know that it failed to write the data. The perfect storm: the user “saves” their data from the app, the save fails, and they leave their session. When they come back, the document is gone.
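The perfect storm above comes down to whether the client waits to hear back about a write. The following sketch illustrates that with a fake server. `FlakyServer`, `save_fire_and_forget`, and `save_acknowledged` are hypothetical names made up for this example and are not part of any MongoDB or CouchDB client library.

```python
class WriteNotAcknowledged(Exception):
    """Raised when the server did not confirm a write (hypothetical)."""


class FlakyServer:
    """Hypothetical server that can be told to drop every write."""

    def __init__(self, fail=False):
        self.fail = fail
        self.docs = {}

    def store(self, key, doc):
        if self.fail:
            return False  # the write was lost
        self.docs[key] = doc
        return True


def save_fire_and_forget(server, key, doc):
    # The result is ignored, so the caller never learns about a failure.
    server.store(key, doc)


def save_acknowledged(server, key, doc):
    # Wait for the server's answer and surface a failure to the caller.
    if not server.store(key, doc):
        raise WriteNotAcknowledged(key)


server = FlakyServer(fail=True)

save_fire_and_forget(server, "draft", {"text": "hello"})
print("fire-and-forget returned normally; stored:", "draft" in server.docs)

try:
    save_acknowledged(server, "draft", {"text": "hello"})
except WriteNotAcknowledged:
    print("acknowledged save reported the failure")
```

With the first function the app happily tells the user the document was saved. With the second, it at least has the chance to retry or show an error before the session ends.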
Use MongoDB only if you don't care about the state of the data but want to sling it out, distributed, as fast as possible. If you're willing to wait an extra millisecond to ensure that the save and the replication actually happen, and to fall back to the last valid version when they fail, use CouchDB.