Tuesday 10 February 2015

CouchDB - MVCC and Conflicts

This is a small entry about CouchDB's Multi-Version Concurrency Control (MVCC) mechanism and what it takes to have conflicting documents end up on the couch.
Though MVCC is well covered by CouchDB's documentation, I wanted to see it in action with my own Pies :-)

Setup

I have CouchDB installed on two Pies, gandalf and samwise. gandalf runs CouchDB 1.6.0, whereas samwise runs 1.5.1.

We will first create conflicts on a single node (gandalf) and then on two nodes by means of master-master replication.
Curl will be used to talk to the couches (note: my curl shell on Windows does not like mixing " and ' which is why I have to put all the JSON data I want to send via curl into files).

If you want to replay this on your system, make sure to not only adjust IP addresses or host names but also substitute the revision values (_rev) with the ones you'll receive as response.
All curl commands and their respective responses are genuine. Responses are formatted for better readability.

What is a Conflict?

Before we start, let's agree on what a conflict is:
A conflict is a state where two or more versions of a document branch from a common root version. Only the leaves of conflicting branches are considered to be in conflict with each other.
Let's try to create this on couchdb.
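
To make the definition concrete, here is a minimal Python sketch of a revision tree. It is a toy model, not CouchDB code: the PARENTS mapping and the shortened revision names such as 1-aaa are made up for illustration, and deleted revisions are ignored.

```python
# Toy revision tree: each revision points to its parent (None marks the root).
# The revision names are made up; real CouchDB revisions look like "1-3557461c...".
PARENTS = {
    "1-aaa": None,      # common root
    "2-bbb": "1-aaa",   # branch 1
    "2-ccc": "1-aaa",   # branch 2
}


def leaves(parents):
    """Return all revisions that no other revision names as its parent."""
    used_as_parent = {p for p in parents.values() if p is not None}
    return sorted(rev for rev in parents if rev not in used_as_parent)


def in_conflict(parents):
    """A document is in conflict when its tree has more than one leaf."""
    return len(leaves(parents)) > 1


if __name__ == "__main__":
    print(leaves(PARENTS))       # the tips of both branches
    print(in_conflict(PARENTS))  # whether more than one leaf exists
```

The root 1-aaa is not part of the conflict itself; only the two branch tips are.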

We'll be acting on behalf of two users, first on one and then on two couchdb nodes.

Conflicts on a Single Node

On a single instance of CouchDB, it is not possible to create a conflict when performing ordinary single document updates. If you want to update a document, you have to supply the revision you are updating, and that revision has to be a leaf of the document's revision tree (as long as there is no conflict, this is simply the latest revision). If you supply anything else, your update will be rejected.
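
The rule can be sketched with a small in-memory model. This is only an illustration of the check, under my own assumptions: the class ToyDoc, the exception Conflict and the way the new revision hash is computed are hypothetical and do not reproduce CouchDB's internals.

```python
import hashlib
import json


class Conflict(Exception):
    """Raised when an update does not name a current leaf revision."""


class ToyDoc:
    """Hypothetical stand-in for one document and its revision tree."""

    def __init__(self):
        self.parents = {}  # revision -> parent revision (None for the first one)
        self.bodies = {}   # revision -> document body

    def leaves(self):
        used_as_parent = {p for p in self.parents.values() if p is not None}
        return [rev for rev in self.parents if rev not in used_as_parent]

    def put(self, body, rev=None):
        # A new document takes no revision; an existing one needs a leaf revision.
        allowed = self.leaves() if self.parents else [None]
        if rev not in allowed:
            raise Conflict("Document update conflict.")
        generation = 1 if rev is None else int(rev.split("-")[0]) + 1
        # Made-up hash; CouchDB derives its revision ids differently.
        digest = hashlib.md5(json.dumps([rev, body], sort_keys=True).encode()).hexdigest()
        new_rev = f"{generation}-{digest}"
        self.parents[new_rev] = rev
        self.bodies[new_rev] = body
        return new_rev


if __name__ == "__main__":
    doc = ToyDoc()
    rev1 = doc.put({"content": "U1_1"})
    rev2 = doc.put({"content": "U1_2"}, rev=rev1)
    try:
        doc.put({"content": "U2_1"}, rev=rev1)  # stale revision
    except Conflict as error:
        print("rejected:", error)
```

In this model an update against a stale revision is refused, which is the behaviour the curl session below runs into.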

If you want to end up with a conflict, i.e. two revisions branching from a single common revision, you have to use couchdb's bulk update feature. But that's not all it takes. In addition you have to use the bulk update in the special "All-or-Nothing" mode.

Not so easy to create a conflict on a single instance, but let's see.

We start by creating a database called mvcc on gandalf:
# check if couchdb is running
curl http://gandalf:5984
# response:
{"couchdb":"Welcome",
 "uuid":"360325151b6a3c70595a522b36f52037",
 "version":"1.6.0",
 "vendor":{"name":"The Apache Software Foundation",
 "version":"1.6.0"}
}
#
# create database "mvcc"
curl -X PUT http://gandalf:5984/mvcc
# response:
{"ok":true}

User 1 inserts an initial version of a document into the database. The document is stored in file mydoc_u1_1.json and looks like this:
{
  "content": "U1_1"
} 


curl -H "Content-Type: application/json" -d @mydoc_u1_1.json -X PUT http://gandalf:5984/mvcc/mydoc
# response:
{"ok":true,
 "id":"mydoc",
 "rev":"1-3557461c60a30b0d156f8b36a1bdcf9f"
}


User 2 reads the document and takes down the revision in order to use it for the update he plans.
# User 2 reads the doc...
curl -X GET http://gandalf:5984/mvcc/mydoc
# response:
{"_id":"mydoc",
 "_rev":"1-3557461c60a30b0d156f8b36a1bdcf9f",
 "content":"U1_1"
}


Both users are now holding the same revision of the document and both plan to update the document. User 1 is faster and places his update.
# here is the updated doc (mydoc_u1_2.json)
{
  "_rev":"1-3557461c60a30b0d156f8b36a1bdcf9f",
  "content": "U1_2"
}
#
# ...and the update...
curl -H "Content-Type: application/json" -d @mydoc_u1_2.json -X PUT http://gandalf:5984/mvcc/mydoc
# response:
{"ok":true,
 "id":"mydoc",
 "rev":"2-2686fb85c0681a3d8c411617f048f94f"
}


Done. We have a second revision of the document. User 2 will now submit his update, but he still holds revision 1. Here is his update.
# here is the document (mydoc_u2_1.json)...
{
  "_rev":"1-3557461c60a30b0d156f8b36a1bdcf9f",
  "content": "U2_1"
}
#
# Note that it indeed references revision 1 of the document
# Now the update itself:
curl -H "Content-Type: application/json" -d @mydoc_u2_1.json -X PUT http://gandalf:5984/mvcc/mydoc
# response: 
{"error":"conflict",
 "reason":"Document update conflict."
}


Here we see the expected result: you are not allowed to update a document if you do not have the latest revision. In other words, you can only update the latest revision of a document, which means you cannot branch the document. At least not in single document update mode.
User 2 may be a bit slow, but he is resourceful. He knows about couchdb's bulk update interface and that this is a way to fork a branch from revision 1. So here is what he does:
# this is the bulk doc (bulk_u2_1.json):
{
"docs": [{
  "_id": "mydoc",
  "_rev":"1-3557461c60a30b0d156f8b36a1bdcf9f",
  "content": "U2_1"
}]
}


Granted, this is some sorry bulk file, consisting only of a single document...
# user 2 tries the bulk interface:
curl -H "Content-Type: application/json" -d @bulk_u2_1.json -X POST http://gandalf:5984/mvcc/_bulk_docs
# response: 
[{"id":"mydoc",
  "error":"conflict",
  "reason":"Document update conflict."
}]


Same result as before. Using the bulk interface alone is not enough: it has to be used with the all-or-nothing option. This is what user 2 tries next.
Now the bulk document contains the all_or_nothing property.
# bulk-doc: bulk_u2_2.json
{
"all_or_nothing": true,
"docs": [{
  "_id": "mydoc",
  "_rev":"1-3557461c60a30b0d156f8b36a1bdcf9f",
  "content": "U2_1"
}]
}
#
# ...and now the update:
curl -H "Content-Type: application/json" -d @bulk_u2_2.json -X POST http://gandalf:5984/mvcc/_bulk_docs
# response:
[{"ok":true,
  "id":"mydoc",
  "rev":"2-ba85ce56711c69f7d6200935357d79f9"
}]


Success: this time, the update was accepted. We now have one root-revision and two revisions branching from that root revision:
# the revision tree:
root:     1-3557461c60a30b0d156f8b36a1bdcf9f
branch 1:   2-2686fb85c0681a3d8c411617f048f94f
branch 2:   2-ba85ce56711c69f7d6200935357d79f9
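
Since I keep all JSON in files anyway, the two bulk files can also be generated instead of typed. Here is a minimal Python sketch; the helper names bulk_payload and write_payload are my own, and the only field names used are the ones that appear in the bulk files of this post.

```python
import json


def bulk_payload(docs, all_or_nothing=False):
    """Build the JSON body for a POST to /<db>/_bulk_docs."""
    body = {"docs": list(docs)}
    if all_or_nothing:
        # Property name taken from bulk_u2_2.json in this post.
        body["all_or_nothing"] = True
    return json.dumps(body, indent=2)


def write_payload(path, docs, all_or_nothing=False):
    """Write the payload to a file that curl can send with -d @<path>."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(bulk_payload(docs, all_or_nothing))


if __name__ == "__main__":
    doc = {
        "_id": "mydoc",
        "_rev": "1-3557461c60a30b0d156f8b36a1bdcf9f",
        "content": "U2_1",
    }
    print(bulk_payload([doc], all_or_nothing=True))
```

Calling write_payload("bulk_u2_2.json", [doc], all_or_nothing=True) would produce a file equivalent to the one used above.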


Now that we finally have a conflict, how does couchdb deal with it? Let's simply retrieve the document and see what we get.
# a simple get...
curl  http://gandalf:5984/mvcc/mydoc
# response:
{"_id":"mydoc",
 "_rev":"2-ba85ce56711c69f7d6200935357d79f9",
 "content":"U2_1"
}


Couchdb determines a "winner" and does not let the conflict surface as long as you do not specifically ask for it.
Let's ask for it.
# fetch current document and all conflicting revisions...
curl  http://gandalf:5984/mvcc/mydoc?conflicts=true
# response:
{"_id":"mydoc",
 "_rev":"2-ba85ce56711c69f7d6200935357d79f9",
 "content":"U2_1",
 "_conflicts":["2-2686fb85c0681a3d8c411617f048f94f"]
}


Couchdb presents the revision inserted by user 2 as the winning revision. The version introduced by user 1 appears in the conflicts list.
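
If you want to pick the response apart in a script, the winner and the losers are easy to separate. A minimal Python sketch follows; the function name split_revisions is made up, and the response text is the one from the GET with conflicts=true in this post.

```python
import json

# Response of the GET with ?conflicts=true, as received from gandalf.
RESPONSE = """
{"_id":"mydoc",
 "_rev":"2-ba85ce56711c69f7d6200935357d79f9",
 "content":"U2_1",
 "_conflicts":["2-2686fb85c0681a3d8c411617f048f94f"]
}
"""


def split_revisions(response_text):
    """Return (winning revision, list of conflicting revisions)."""
    doc = json.loads(response_text)
    # Fall back to an empty list in case the response carries no _conflicts field.
    return doc["_rev"], doc.get("_conflicts", [])


if __name__ == "__main__":
    winner, conflicts = split_revisions(RESPONSE)
    print("winner:   ", winner)
    print("conflicts:", conflicts)
```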
User 1 may not be aware of the fact that his revision is no longer in favor. He continues to update his branch of the document.
# user 1 updates his branch of the document (mydoc_u1_3.json)
{
  "_rev":"2-2686fb85c0681a3d8c411617f048f94f",
  "content": "U1_3"
}
#
# here is the update:
curl -H "Content-Type: application/json" -d @mydoc_u1_3.json -X PUT http://gandalf:5984/mvcc/mydoc
# response:
{"ok":true,
 "id":"mydoc",
 "rev":"3-627f10af94aaf3f31a20c9277c68219a"
}


No problem with this update. This means that once a document is branched, each branch can be updated in its own right. In our case the branch user 1 maintains is now one revision longer than the branch maintained by user 2. Let's see what this means in terms of conflicting documents and which branch couchdb now elects to be the winner.
We do a regular GET with the conflicts option enabled.
# GET the winning revision and all conflicting revisions:
curl  http://gandalf:5984/mvcc/mydoc?conflicts=true
# response:
{"_id":"mydoc",
 "_rev":"3-627f10af94aaf3f31a20c9277c68219a",
 "content":"U1_3",
 "_conflicts":["2-ba85ce56711c69f7d6200935357d79f9"]
}


We can conclude two things from the result of this GET. First, the winning branch has changed: the branch of user 1, which now has the highest revision number, is the winner. Second, the conflict has moved along with the branches: the conflicts list always names leaf revisions, so user 2's leaf (2-ba85...) is now reported as being in conflict with user 1's new leaf (3-627f...).
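
Both results fit the rule as I understand it from CouchDB's documentation: among the leaves that are not deleted, the one with the longest revision history wins, and a tie is broken by comparing the revision strings, highest first. The Python sketch below is my own approximation of that rule, not CouchDB code. It uses the number in front of the dash as a stand-in for the history length and ignores deleted revisions; the function names are made up.

```python
def rev_key(rev):
    """Split "3-627f..." into (3, "627f...") so revisions can be ranked."""
    generation, _, digest = rev.partition("-")
    return int(generation), digest


def pick_winner(leaf_revs):
    """Return (winner, conflicts) for a list of leaf revisions."""
    ranked = sorted(leaf_revs, key=rev_key, reverse=True)
    return ranked[0], ranked[1:]


if __name__ == "__main__":
    # Leaves after user 2 forced his branch in via the bulk interface.
    print(pick_winner([
        "2-2686fb85c0681a3d8c411617f048f94f",
        "2-ba85ce56711c69f7d6200935357d79f9",
    ]))
    # Leaves after user 1 extended his branch by one revision.
    print(pick_winner([
        "3-627f10af94aaf3f31a20c9277c68219a",
        "2-ba85ce56711c69f7d6200935357d79f9",
    ]))
```

With the revisions from this post, the sketch picks 2-ba85... in the first case (same length, higher revision string) and 3-627f... in the second (longer branch), matching what gandalf returned.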

Summary

Short summary on "Conflicts on a Single Couchdb Instance":

  • It's not that easy to produce a conflict on a single instance
  • Once you have one, you are free to ignore it; CouchDB will always decide on a winning revision
  • In spite of CouchDB picking a winner, you are free to follow and work on any branch you please
  • With every change on any branch, the winner is determined anew, and a different branch may come out on top

That's it for now on working with a single instance. The next entry will deal with two instances (running on two Pies of course :-) and master-master replication between them.