Why I choose CouchDB over MongoDB

There’s a lot of discussion lately about NoSQL databases for high-performance distributed web apps. Actually, that’s an understatement: there’s been a lot of discussion about NoSQL for the past few years. NoSQL removes the need for a schema in favor of storing non-relational data in “documents” (and then there’s some really cool magic that happens to keep track of them all).

Currently there are two main contenders, CouchDB and MongoDB, and if you care about your data actually getting saved and always being available, MongoDB is not a good fit. The marketing engine behind MongoDB is huge, and they have very good writers doing the content for their website. You get examples like the one below:

Use Cases
It may be helpful to look at some particular problems and consider how we could solve them.

  • if we were building Lotus Notes, we would use Couch as its programmer versioning reconciliation/MVCC model fits perfectly. Any problem where data is offline for hours then back online would fit this. In general, if we need several eventually consistent master-master replica databases, geographically distributed, often offline, we would use Couch.
  • mobile: Couch is better as a mobile embedded database on phones, primarily because of its online/offline replication/sync capabilities.
  • we like Mongo server-side; one reason is its geospatial indexes.
  • if we had very high performance requirements we would use Mongo. For example, web site user profile object storage and caching of data from other sources.
  • for a problem with very high update rates, we would use Mongo as it is good at that because of its “update-in-place” design. For example, see updating real-time analytics counters.
  • in contrast to the above, Couch is better when lots of snapshotting is a requirement because of its MVCC design.

Generally, we find MongoDB to be a very good fit for building web infrastructure.

There are a lot of magic words and phrases in here, like “MVCC” and “update-in-place design”. Here’s what they really mean.

MVCC (Multi-Version Concurrency Control): If you go to write something and it fails, you don’t lose your data. The new data gets added to the system, and only once that succeeds is the old data forgotten, kept as an older version, or thrown away.
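To make that concrete, here’s a toy sketch in Python. It’s an in-memory model, not CouchDB’s storage engine or API: `VersionedStore`, `put`, `base_rev` and `ConflictError` are hypothetical names, and plain integers stand in for CouchDB’s `_rev` strings. Each write has to name the revision it was based on, new versions are added alongside the old ones, and a stale write is rejected instead of clobbering newer data.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when a write is based on a revision that is no longer current."""


@dataclass
class VersionedStore:
    # Hypothetical toy store: doc_id -> list of every committed version, oldest first.
    history: dict = field(default_factory=dict)

    def get(self, doc_id):
        """Return (revision, body) of the latest committed version."""
        versions = self.history[doc_id]
        return len(versions), versions[-1]

    def put(self, doc_id, body, base_rev=0):
        """Add a new version; base_rev must be the current revision (0 = new doc)."""
        versions = self.history.get(doc_id, [])
        if base_rev != len(versions):
            raise ConflictError(
                f"{doc_id}: current revision is {len(versions)}, "
                f"write was based on {base_rev}"
            )
        # Build the new history from the old one plus the new version;
        # existing versions are never modified.
        self.history[doc_id] = versions + [dict(body)]
        return base_rev + 1
```

A writer that still holds revision 1 after someone else has saved revision 2 gets a `ConflictError`, and every earlier version stays readable in `history`.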

“Update-in-place design”: We write over the old data right where it was stored. Instead of leaving the last working copy alone while the new one is written, it writes directly on top of it, so if something goes wrong you’re hosed: if the write fails for any reason, your data is gone.
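Here’s a toy Python simulation of the difference, under loudly simplified assumptions: a `bytearray` plays the disk, a `header` dict plays the pointer to the latest committed record, and `SimulatedCrash` stands in for a power cut partway through a write. None of these names come from either database, and neither one’s real on-disk format is modeled.

```python
class SimulatedCrash(Exception):
    """Stands in for a power cut partway through a write."""


def write_bytes(disk, offset, data, crash_after=None):
    """Copy data onto the disk one byte at a time, 'crashing' after crash_after bytes."""
    for i, byte in enumerate(data):
        if crash_after is not None and i == crash_after:
            raise SimulatedCrash
        disk[offset + i] = byte


def update_in_place(disk, new_record, crash_after=None):
    # Update in place: the only copy of the record (at offset 0) is overwritten.
    write_bytes(disk, 0, new_record, crash_after)


def update_append_only(disk, header, new_record, crash_after=None):
    # Append-only: write the new record after the existing data...
    offset = header["end"]
    write_bytes(disk, offset, new_record, crash_after)
    # ...and only once it is fully written, point the header at it.
    header["start"], header["end"] = offset, offset + len(new_record)


def read_current(disk, header):
    """Read whatever record the header currently points at."""
    return bytes(disk[header["start"]:header["end"]])
```

Starting from `b"balance=100"` and crashing nine bytes into writing `b"balance=250"`, the in-place version leaves `b"balance=200"` on disk, a torn record that is neither the old value nor the new one, while the append-only version still reads back `b"balance=100"` because the header was never moved.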

“In general, if we need several eventually consistent master-master replica databases, geographically distributed, often offline, we would use Couch.”

This is really my biggest issue: what’s described here is, in my opinion, what being “web-scale” is all about. Read and write your data everywhere, and distribute it. The point about data being “offline” is actually one of the BEST things about CouchDB, and it’s kind of tossed aside here. You can maintain your CouchDB database on a mobile device and then synchronize it with one on a server. The web is more than just websites; it’s data moving between places. Arguably, any time you’re not actively syncing data, you’re offline.
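As a rough illustration of why offline-then-sync is worth having, here’s a toy Python model of two replicas that edit independently and then sync. It’s loosely inspired by CouchDB’s revision-based replication but is not its actual protocol (real CouchDB replication keeps revision trees, reads a changes feed, checkpoints its progress, and picks a deterministic winner for conflicts); `Replica`, `edit` and `replicate` are hypothetical names. What it shows is that edits made while apart are either fast-forwarded or kept as a conflict, never silently dropped.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    # doc_id -> (revision history, oldest to newest; current body)
    docs: dict = field(default_factory=dict)
    # doc_id -> list of (history, body) versions that lost a conflict
    conflicts: dict = field(default_factory=dict)

    def edit(self, doc_id, body):
        """Local write; works the same whether or not we are connected."""
        history, _ = self.docs.get(doc_id, ([], None))
        rev = f"{len(history) + 1}-{self.name}"  # unique per replica
        self.docs[doc_id] = (history + [rev], body)


def replicate(source, target):
    """One-way sync: push every source document into target."""
    for doc_id, (src_hist, src_body) in source.docs.items():
        if doc_id not in target.docs:
            target.docs[doc_id] = (src_hist, src_body)
            continue
        tgt_hist, _ = target.docs[doc_id]
        if tgt_hist[-1] in src_hist:
            # Target's current version is an ancestor: fast-forward it.
            target.docs[doc_id] = (src_hist, src_body)
        elif src_hist[-1] in tgt_hist:
            # Target already has this version or a newer descendant of it.
            pass
        else:
            # Both sides changed while apart: keep target's version and
            # record the other one rather than throwing it away.
            target.conflicts.setdefault(doc_id, []).append((src_hist, src_body))
```

If a phone edits a note offline and then syncs, the server simply fast-forwards; if both sides edit the same note while apart, the server keeps its own version and files the phone’s under `conflicts` for the application to resolve.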

In short, if your server crashes, someone trips over the power cord, etc., MongoDB is more than likely going to lose data, while CouchDB will wake back up, see what it was last doing, and lose only the write that was happening at the time of failure. All the performance benchmarks in the world can’t outweigh that. The saddest part is that, by default, MongoDB won’t even let you know that it failed to write the data. The perfect storm: the user “saves” their data from your app, the save fails, and they end their session. When they come back, the document is gone.
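That perfect storm comes down to whether the client waits for the server’s answer. Here’s a toy Python model of the two behaviors; `FlakyServer`, `save_fire_and_forget` and `save_acknowledged` are hypothetical names, and no real driver is involved. (With a real MongoDB driver this is the write concern setting; in current PyMongo, for example, `WriteConcern(w=0)` requests unacknowledged writes.)

```python
class FlakyServer:
    """Hypothetical in-memory server whose next write can be made to fail."""

    def __init__(self):
        self.data = {}
        self.fail_next = False

    def apply(self, key, value):
        """Store the value and return True, or return False for an injected failure."""
        if self.fail_next:
            self.fail_next = False
            return False
        self.data[key] = value
        return True


def save_fire_and_forget(server, key, value):
    # Send the write and return right away; the server's reply is ignored,
    # so the caller (and the user) never learns that the write failed.
    server.apply(key, value)


def save_acknowledged(server, key, value):
    # Wait for the server's reply and turn a failure into an error the app can show.
    if not server.apply(key, value):
        raise IOError(f"write of {key!r} was not acknowledged")
```

With the failure injected, `save_fire_and_forget` returns normally and the document simply isn’t there, while `save_acknowledged` raises, giving the app a chance to tell the user before they leave.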

Use MongoDB only if you don’t care about the state of the data but want to sling it out, distributed, as fast as possible. If you’re willing to wait an extra millisecond to make sure the save and replication actually happen, and to fall back to the last valid version when they fail, use CouchDB.

This entry was posted in Databases.

9 Responses to Why I choose CouchDB over MongoDB

  1. k_bx says:

    Well, if it’s critical, use w=majority in MongoDB (or safe=True in the Python driver). Then the situation you described won’t happen (the user will wait until the data is written).

  2. Ewan Makepeace says:

    You are being sensationalist here – as you yourself pointed out, Couch is quite capable of losing the data it was saving at the time of failure, which is what you accuse Mongo of doing. The difference is that Couch will still have the older copy, whereas Mongo *might* have partly overwritten it.
    However speaking as a Couch developer, other issues crop up – Couch’s almost pathological hunger for disk space and consequent hogging of I/O bandwidth, its complete inability to query for data not anticipated by a predefined view (or even to combine existing views to find their intersection), and for a distributed system, its weak security model.
    For a purely cloud-based backend to a web server system, I think Mongo is probably a better choice than CouchDB in the long run. A couple of technical ‘wins’ don’t make for a rounded solution.

  3. Pingback: Links for December 4th through December 9th — Vinny Carpenter's blog

  4. Nik says:

    I’ve heard MongoDB is quite configurable, are you sure that there isn’t a way to configure it to behave in the way that CouchDB does?

  5. Scott Hernandez says:

    Master-master and mobile are two very interesting use cases where Couch does well, along with filtered replication, which is really needed in many mobile cases.

    Your statements about data loss, or about getting errors back from writes, with MongoDB are unfortunately completely wrong. Only by default (and not even in all frameworks) do writes work as “fire-n-forget”, meaning that the write isn’t verified after being sent to the server. All drivers and frameworks support a “safe” write concern, which waits for the server to acknowledge the write (either in memory, in the journal, directly in the database files, or across some subset of a replica set) before returning to your client code.

    Also, with journaling (a write-ahead journal file), any serious machine failure will be immediately recoverable on restart, and the system will recover to the last journal commit. There is a short window, while in-flight changes are being collected for the next journal commit, during which data could be lost in the case of power failure, but that window is configurable and even controllable on each write from the client.

    http://www.mongodb.org/display/DOCS/getLastError+Command
    http://www.mongodb.org/display/DOCS/Journaling
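A toy Python model of the journaling behavior described above (group commits on a timer, with an optional per-write wait) might look like this. `JournaledStore`, `write`, `commit` and `wait_for_journal` are hypothetical names; this is not MongoDB’s implementation, and the timer that would normally call `commit()` is left out.

```python
class JournaledStore:
    """Hypothetical toy model of a write-ahead journal with group commits."""

    def __init__(self):
        self.journal = []  # durable: survives a crash
        self.pending = []  # in-flight writes collected for the next commit
        self.data = {}     # in-memory state, lost on a crash

    def write(self, key, value, wait_for_journal=False):
        self.pending.append((key, value))
        self.data[key] = value
        if wait_for_journal:
            # Per-write option: force a commit before returning to the caller.
            self.commit()

    def commit(self):
        # Group commit: everything collected since the last commit becomes durable.
        self.journal.extend(self.pending)
        self.pending.clear()

    def crash_and_recover(self):
        # Power failure: in-memory state and the uncommitted batch are lost...
        self.pending.clear()
        self.data = {}
        # ...and on restart the journal is replayed in order.
        for key, value in self.journal:
            self.data[key] = value
```

A write that lands between two commits is lost if the power fails first; a write made with `wait_for_journal=True` survives, at the cost of waiting for the commit.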

    • chris says:

      Scott,
      Thanks for the in-depth information. I’m glad to find out there is an option to get a confirmed write back. Does write-ahead journaling remove the in-place writing as well, or does it just write here first and put off the official replacement until later?

      From my research and from messing around with the two systems, the defaults are primarily what I’ve based my content on. I believe that I and other developers who grew up with relational, ACID-compliant systems would be more apt to go for a solution that maintains consistency by default instead of requiring you to turn it on as an option. Then, if we decided we wanted to “turn up the heat”, we could remove redundancies where they weren’t needed. Out of the box, CouchDB is a safer environment to start in.

      I still feel that out-of-the-box CouchDB is more what people expect from a DB system. MongoDB has amazingly fast performance, and I think it fits better for a solution where you need to distribute primarily read data, where the cost of a failed write is minimal. The in-place writing still makes me wary of the platform.

      • Sam Gaw says:

        As a SysAdmin: anyone running out-of-the-box configs in production, regardless of the system, is asking for problems.

        Sane defaults might help when setting up a basic sandbox to play around with when you don’t know the system, but their absence is hardly grounds to dismiss something.

  6. Chris,

    Thank you very much for your article. I think I’ve got something that might be interesting for you. We just released a Java library called MongoMVCC which implements MVCC on top of MongoDB. You get full isolation and apart from that you can access your data’s history. That means you can checkout any previous version of your data at any later time.

    The project can be found on GitHub:
    https://github.com/igd-geo/mongomvcc/

    The wiki also contains a comparison with CouchDB, which fits quite well with your article:
    https://github.com/igd-geo/mongomvcc/wiki/Why-should-I-use-MongoMVCC

    Please let me know what you think.

    Cheers,
    Michel

  7. Guest says:

    But MongoDB is web scale….
