MongoDB read preferences for replica sets

An important, but perhaps sometimes overlooked, parameter in MongoDB when using replica sets is the read preference. This parameter control how reads are handled and is an improvement over the old, (now deprecated), slave_okay-parameter.

By default, all reads are always routed to the primary. This might seem a bit counter-productive at a first glance; wouldn’t it be better that reads be distributed over all instances in the replica set? In many cases that is true but it is important to be aware of the consequences. If a secondary is falling behind the primary for some reason, a read to that secondary could give old data. Depending on your application, this might not be a problem. If you for example are reading log data for a report it might not matter much if the data might be a bit stale.

In previous versions of MongoDB you could set slave_okay to True to distribute reads over the secondaries. This parameter could be set on a connection basis or on operation basis. Starting with version 2.2 of MongoDB you should instead use the parameter read_preference. Like slave_okay it can be set when connecting to the database

import pymongo
conn = pymongo.Connection('localhost', read_preference=pymongo.SECONDARY_PREFERRED)

or just on certain operations

conn = pymongo.Connection('localhost')
conn.blogs.posts.find({'sid': 13214}, read_preference=pymongo.SECONDARY_PREFERRED)

For replica sets, read_preference can take the following values:

  • PRIMARY – This is the default setting and route all reads to the replica set primary. If the primary is unavailable for some reason a read operation would produce an error or exception.
    This is the right setting if it is important to never return stale data.
  • PRIMARY_PREFERRED – Reads are normally sent to the primary, but if it is unavailable operations read from secondary members instead.
    A use case for this might be if you are using MongoDB as backend for a web service that shows some kind of information to a customer. You want to make sure that the information that is shown is up to date but in case of a primary failover you believe it is more important to show some data, stale or not.
  • SECONDARY – Reads are only allowed on secondary members of the replica set. If no secondaries are available a read operation would give an error or exception.
    This might be useful for example if you have a heavy read load but it is important that these read operations never interfere with the write operations.
  • SECONDARY_PREFERRED – Reads are normally routed to a secondary, but if no secondary is available read operations are sent to the primary. (This is how reads are handled when slave_okay is set to True).
    A use case for this is when you are not that concerned with reading stale data and want to distribute the read operations over all set members.
  • NEAREST – Reads are performed on the nearest available set member, disregarding if it is a primary or secondary member. Nearness is determined by periodically sending pings to all members and measuring the response time.
    This could be useful when you have a very read heavy application and want to minimize network latency and do not care if the data might be stale or not.

Note that all preferences other than PRIMARY could give stale data.

Using the default PRIMARY read preference is often to limiting and could in many cases be replaced by at least PRIMARY_PREFFERED. If you for example are using MongoDB as backend for a web service, it might often be better to risk presenting stale data to the frontend then no data at all as could be the case if the primary became unavailable.