
MongoDB finally changes default behavior of writes

MongoDB has finally relented and changed the default behavior for handling errors when writing in the new version of their client. Previously the default behavior was not to wait and see if the write really worked. Unless you explicitly asked for it, a failed write operation might go unnoticed.

MongoDB has gotten a lot of flak for the previous behavior, and it has been a big reason for some people to abandon MongoDB in favor of other database solutions. I have never been able to work up much sympathy for users who have been complaining about losing important data because they did not set the write behavior correctly for their application. Using a database in production without having at least a basic understanding of how it works is a sure way of inviting trouble, regardless of which database you use.

But on the other hand I think this is a very good move by MongoDB, and one they should have made long ago. Having writes fail silently by default is a bad idea and really counterintuitive for most people. (The reason for the old behavior is clearer after reading the previously mentioned blog post, although the reason for the delay in making this change is not.)

I believe the lesson to be learned here is that if you are designing a system such as a database, you should think twice before choosing speed over safety as the default behavior. Most users probably want safety and predictable behavior as the default and increased speed as an option, not the other way around, especially when choosing between them is as easy as changing one argument.
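To make the difference concrete, here is a minimal toy model of the two write modes. It is only a sketch and not the MongoDB client: `ToyCollection`, the `acknowledged` argument and the locally defined `DuplicateKeyError` are hypothetical names invented for this illustration.

```python
class DuplicateKeyError(Exception):
    """Raised when a document with the same _id already exists (defined here, for illustration)."""


class ToyCollection:
    """Hypothetical in-memory stand-in for a collection; not a real driver API."""

    def __init__(self):
        self.docs = {}

    def _apply(self, doc):
        # "Server-side" work: reject documents whose _id is already taken.
        if doc["_id"] in self.docs:
            raise DuplicateKeyError(doc["_id"])
        self.docs[doc["_id"]] = doc

    def insert(self, doc, acknowledged=True):
        if acknowledged:
            # Wait for the verdict, so a failure reaches the caller.
            self._apply(doc)
            return True
        try:
            # Fire and forget: the outcome is never reported back.
            self._apply(doc)
        except DuplicateKeyError:
            pass
        return None


posts = ToyCollection()
posts.insert({"_id": 1, "title": "first"})

# Unacknowledged: the duplicate is rejected, but the caller never finds out.
silent_result = posts.insert({"_id": 1, "title": "second"}, acknowledged=False)

# Acknowledged: the same failure surfaces as an exception.
try:
    posts.insert({"_id": 1, "title": "second"})
    failure_noticed = False
except DuplicateKeyError:
    failure_noticed = True
```

In this toy, the only thing separating a silently lost write from a visible error is the single `acknowledged` argument, which is the point about defaults made above.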

 

MongoDB read preferences for replica sets

An important, but perhaps sometimes overlooked, parameter in MongoDB when using replica sets is the read preference. This parameter controls how reads are handled and is an improvement over the old (now deprecated) slave_okay parameter.

By default, all reads are routed to the primary. This might seem a bit counterproductive at first glance; wouldn’t it be better to distribute reads over all instances in the replica set? In many cases that is true, but it is important to be aware of the consequences. If a secondary is falling behind the primary for some reason, a read from that secondary could return old data. Depending on your application, this might not be a problem. If you are, for example, reading log data for a report, it might not matter much if the data is a bit stale.
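The following toy model illustrates how a lagging secondary can return old data. It is a simplified sketch, not how MongoDB replication is actually implemented: `ToyReplicaSet`, `ToyMember` and the explicit `replicate()` step are hypothetical names made up for this example.

```python
class ToyMember:
    """Hypothetical replica set member holding a copy of the data."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far


class ToyReplicaSet:
    """Hypothetical one-primary, one-secondary set with manual replication."""

    def __init__(self):
        self.oplog = []  # ordered list of (key, value) writes
        self.primary = ToyMember()
        self.secondary = ToyMember()

    def write(self, key, value):
        # Writes always go to the primary and are appended to the log.
        self.oplog.append((key, value))
        self.primary.data[key] = value
        self.primary.applied = len(self.oplog)

    def replicate(self):
        # The secondary catches up by replaying entries it has not applied yet.
        for key, value in self.oplog[self.secondary.applied:]:
            self.secondary.data[key] = value
        self.secondary.applied = len(self.oplog)


rs = ToyReplicaSet()
rs.write("views", 1)
rs.replicate()
rs.write("views", 2)  # the secondary has not replayed this write yet

fresh = rs.primary.data["views"]     # read from the primary
stale = rs.secondary.data["views"]   # read from the lagging secondary
rs.replicate()
caught_up = rs.secondary.data["views"]
```

Until the secondary has replayed the second write, a read from it sees the older value, while a read from the primary always sees the latest one.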

In previous versions of MongoDB you could set slave_okay to True to distribute reads over the secondaries. This parameter could be set per connection or per operation. Starting with version 2.2 of MongoDB you should instead use the parameter read_preference. Like slave_okay, it can be set when connecting to the database:

import pymongo

# Route reads to a secondary when one is available, otherwise to the primary.
conn = pymongo.Connection('localhost', read_preference=pymongo.ReadPreference.SECONDARY_PREFERRED)

or just on certain operations:

conn = pymongo.Connection('localhost')
# Override the read preference for this query only.
conn.blogs.posts.find({'sid': 13214}, read_preference=pymongo.ReadPreference.SECONDARY_PREFERRED)

For replica sets, read_preference can take the following values:

  • PRIMARY – This is the default setting and routes all reads to the replica set primary. If the primary is unavailable for some reason, a read operation produces an error or exception.
    This is the right setting if it is important to never return stale data.
  • PRIMARY_PREFERRED – Reads are normally sent to the primary, but if it is unavailable, operations read from secondary members instead.
    A use case for this might be if you are using MongoDB as the backend for a web service that shows some kind of information to a customer. You want the information shown to be up to date, but in case of a primary failover you believe it is more important to show some data, stale or not.
  • SECONDARY – Reads are only allowed on secondary members of the replica set. If no secondaries are available, a read operation produces an error or exception.
    This might be useful for example if you have a heavy read load but it is important that these read operations never interfere with the write operations.
  • SECONDARY_PREFERRED – Reads are normally routed to a secondary, but if no secondary is available read operations are sent to the primary. (This is how reads are handled when slave_okay is set to True).
    A use case for this is when you are not that concerned with reading stale data and want to distribute the read operations over all set members.
  • NEAREST – Reads are performed on the nearest available set member, regardless of whether it is a primary or a secondary. Nearness is determined by periodically sending pings to all members and measuring the response time.
    This could be useful when you have a very read-heavy application, want to minimize network latency, and do not care whether the data might be stale.

Note that all preferences other than PRIMARY could give stale data.
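The routing rules in the list above can be sketched as a small selection function. This is an illustration under stated assumptions, not the driver's actual algorithm: `Member`, `select_member` and `NoMemberAvailable` are hypothetical names, and always taking the member with the lowest ping is a simplification (a real driver may spread reads over several eligible members).

```python
from collections import namedtuple

# Hypothetical view of a replica set member: role, availability, ping time in ms.
Member = namedtuple("Member", "name is_primary available ping_ms")


class NoMemberAvailable(Exception):
    """No member satisfies the requested read preference."""


def select_member(mode, members):
    """Pick the member a read would go to under the given read preference."""
    up = [m for m in members if m.available]
    primaries = [m for m in up if m.is_primary]
    secondaries = [m for m in up if not m.is_primary]

    candidates_by_mode = {
        "PRIMARY": primaries,
        "PRIMARY_PREFERRED": primaries or secondaries,
        "SECONDARY": secondaries,
        "SECONDARY_PREFERRED": secondaries or primaries,
        "NEAREST": up,
    }
    candidates = candidates_by_mode[mode]
    if not candidates:
        raise NoMemberAvailable(mode)
    # Simplification: always take the lowest ping among the candidates.
    return min(candidates, key=lambda m: m.ping_ms)


members = [
    Member("a", True, False, 5),   # the primary, currently unavailable
    Member("b", False, True, 20),  # a secondary
    Member("c", False, True, 8),   # a secondary with a lower ping
]
chosen = select_member("PRIMARY_PREFERRED", members).name
```

With the primary down, PRIMARY_PREFERRED falls back to a secondary in this sketch, whereas `select_member("PRIMARY", members)` raises `NoMemberAvailable`, mirroring the error case described for the default setting.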

Using the default PRIMARY read preference is often too limiting, and in many cases it could be replaced by at least PRIMARY_PREFERRED. If you are, for example, using MongoDB as the backend for a web service, it might often be better to risk presenting stale data to the frontend than no data at all, as could be the case if the primary became unavailable.