Scott Feltmann's Blog

Managing Split Brain in Exchange 2010 DAG with Datacenter Activation Coordination Mode

by Scott on Nov.18, 2009, under Exchange Server, Microsoft Related

While in my Exchange 2010 Ignite class we came across a new feature of Exchange 2010 called Database Availability Groups (DAG).  A DAG is a great way to provide high availability and redundancy in an Exchange 2010 environment.  DAGs basically replace the Exchange 2007 features known as LCR, CCR, SCR, and SCC. 

One consideration when leveraging a DAG is placing mailbox servers in different datacenters and replicating the data over the wire.  This can be accomplished with a DAG, and was accomplished in Exchange 2007 using a geo-distributed CCR setup.  One concern, however, is a split brain occurrence.  Say, for example, you have two datacenters in your organization.  Datacenter A has two DAG nodes plus the File Share Witness (FSW), and Datacenter B has two DAG nodes.  If the primary datacenter (Datacenter A) loses power and the DAG is activated in Datacenter B, those two servers are now primary.  However, if Datacenter A is restored while the network between the two sites is still down, there is potential for a split brain.  This is because when Datacenter A comes back online it sees the FSW and has 3 of the 5 votes for quorum: two from the DAG nodes and one from the FSW.  Datacenter B believes it is in charge and remains active.  Now both datacenters believe they are authoritative for the DAG. 
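The vote count in this example can be checked with a few lines of Python. This is only a sketch of simple majority quorum, assuming five votes in total (four DAG nodes plus the FSW); the function name and constants are made up for illustration and are not part of Exchange or Windows clustering.

```python
# Illustrative model of majority quorum for the two-datacenter example.
# has_quorum and the constants below are hypothetical names, not a real API.

def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """A partition holds quorum when it sees a strict majority of all votes."""
    return reachable_votes > total_votes // 2

TOTAL_VOTES = 5  # four DAG nodes plus the File Share Witness

# Datacenter A after power returns, network to B still down: 2 nodes + FSW.
datacenter_a_votes = 3
# Datacenter B on its own: 2 nodes, no witness.
datacenter_b_votes = 2

print(has_quorum(datacenter_a_votes, TOTAL_VOTES))  # True
print(has_quorum(datacenter_b_votes, TOTAL_VOTES))  # False
```

Datacenter A wins a majority all by itself, while Datacenter B is only running because it was activated during the outage.  Neither side can see the other, so nothing stops both from acting as the authority.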

In order to remedy this problem, Exchange 2010 introduces a new feature called Datacenter Activation Coordination (DAC).  DAC is used to control the activation behavior of DAG nodes that may be split between multiple datacenters.  Basically, when there is an outage in one datacenter, other members of the DAG will come online in another datacenter.  When the offline DAG nodes return to service, they will use a protocol called Datacenter Activation Coordination Protocol (DACP) before trying to mount their databases.  DACP is used to determine the current state of the DAG and whether Active Manager should try to mount the databases or not. 

Now you may be wondering, what is Active Manager?  Well, Active Manager stores a bit in memory (either a 0 or a 1), the DACP bit, that tells the DAG member whether it’s allowed to mount local databases that are assigned as active on the server. When a DAG is running in DAC mode (which has to be enabled, and is meant for DAGs with three or more members), each time Active Manager starts up the bit is set to 0, meaning it isn’t allowed to mount databases. Because it’s in DAC mode, the server must try to communicate with all other members of the DAG that it knows about to get another DAG member to give it an answer as to whether it can mount local databases that are assigned as active to it. The answer comes in the form of the bit setting for other Active Managers in the DAG. If another server responds that its bit is set to 1, it means servers are allowed to mount databases, so the server starting up sets its bit to 1 and mounts its databases.

So, what this means is that when you recover from a failure in a datacenter, the returning DAG nodes all start with a DACP bit value of 0 and must try to communicate with the other nodes in the DAG that they are aware of.  If the network between the sites is still down, the only nodes they can reach are each other, all with a bit of 0, so none of them will mount databases.  Only once they can reach a node whose bit is set to 1 are they allowed to set their own bit to 1 and mount databases that are assigned as active to them. 
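The startup check described above can be sketched as a small Python model. It is a simplified illustration, not Active Manager's actual implementation: the `Member` class and `try_mount` function are hypothetical names, and the only rule modelled is the one from this post (a starting node needs an answer from a node whose bit is 1).

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Member:
    """Hypothetical stand-in for a DAG node and its Active Manager."""
    name: str
    dacp_bit: int = 0  # reset to 0 every time Active Manager starts

def try_mount(starting: Member, reachable: Iterable[Member]) -> bool:
    """Return True if the starting node is allowed to mount its databases.

    Simplified rule: permission is granted only when at least one
    reachable node reports a DACP bit of 1.
    """
    if any(peer.dacp_bit == 1 for peer in reachable):
        starting.dacp_bit = 1
        return True
    return False

# Datacenter B was activated during the outage, so its nodes hold bit 1.
b1, b2 = Member("B1", dacp_bit=1), Member("B2", dacp_bit=1)
# Datacenter A has just powered back up, so its nodes start at bit 0.
a1, a2 = Member("A1"), Member("A2")

# Network between the sites still down: A1 can only reach A2.
print(try_mount(a1, [a2]))          # False
# Network restored: A1 can now reach the nodes in Datacenter B.
print(try_mount(a1, [a2, b1, b2]))  # True
```

With the link down, every answer A1 gets back is a 0, so it stays dismounted and the split brain never forms.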

Make sense?  I think this is a pretty impressive solution that MS has come up with to prevent split brain in Exchange 2010.  The kicker?  DAC is disabled by default.  Keep in mind that in order to leverage DAC you need a DAG with at least three nodes spread across different datacenters.  I suppose you wouldn’t need this if they are all in the same datacenter and the nodes can communicate with each other. ;)  

If you are looking at deploying a DAG across multiple datacenters you will want to enable DAC.  To enable DAC, run the following command, replacing DAGID with the name of your DAG:

Set-DatabaseAvailabilityGroup -Identity DAGID -DatacenterActivationMode DagOnly

For more information on ‘Set-DatabaseAvailabilityGroup’, see Microsoft’s documentation for the cmdlet.
