



VMotion and Database Availability Groups (DAG)

August 5th, 2010 by Scott

Over the past few weeks I have been busy deploying Exchange 2010. Many of these deployments involved running Exchange 2010 in VMware ESX environments. In a previous post I mentioned that VMotion and Exchange 2010 DAGs were not supported together. However, based on some testing I have changed my opinion of this approach. Perhaps it is just a fluke, and I would be interested to hear what everyone out there is experiencing, but here goes.

When you are running a DAG on Exchange 2010 and need to VMotion a server, there seems to be an issue with migrating when the Exchange 2010 Mailbox server is powered off for the VMotion. I have seen this occur at two separate client sites. Power off the Mailbox server, migrate, DAG broken. Got it?

Now, what I am seeing is that if I leave the passive node online and replicating the passive databases with the active node, a live migrate seems to work just fine! I have tested this twice now (which is why I don’t know if it is a fluke or not), and both times migrating the passive node while it was powered on and replicating caused NO problems. I even went as far as rebooting the passive machine, expecting it to break, but nope, nothing!

So, I’m going to challenge the community out there to try this if you can.  Live migrate your passive VM Mailbox node and see what happens.  If you can, power it off and migrate and see what happens.  I seem to have the same occurrence when the VM is powered down and/or when the VM is powered on. 

Either way, from my testing it appears that if you migrate while the DAG member is online (keep in mind I didn’t have any active databases running on the node), it migrates successfully without any problems!

Let me know what your findings are, but this is great news!

Managing Split Brain in Exchange 2010 DAG with Datacenter Activation Coordination Mode

November 18th, 2009 by Scott

While in my Exchange 2010 Ignite class we came across a new feature of Exchange 2010 called Database Availability Groups (DAG). DAG is a great way to provide high availability and redundancy in an Exchange 2010 environment. DAGs are basically replacing the Exchange 2007 features known as LCR, CCR, SCR, and SCC.

One consideration when leveraging a DAG is placing mailbox servers in different datacenters and replicating the data over the wire. This can be accomplished with DAG and was accomplished in Exchange 2007 using a geo-distributed CCR setup. One concern, however, is a split brain occurrence. Say, for example, you have two datacenters in your organization. Datacenter A has two nodes of the DAG plus the File Share Witness (FSW), and Datacenter B has two DAG server nodes. If the primary datacenter (Datacenter A) should happen to lose power and the DAG is activated in Datacenter B, those two servers are now primary. However, if Datacenter A is restored while the network between the two sites is still down, there is potential for a split brain. This is because when Datacenter A comes back online it sees the FSW and has 3 of the 5 votes for quorum: two from its DAG nodes and one from the FSW. Datacenter B believes it is in charge and remains active. Now both datacenters believe they are authoritative for the DAG.
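To make the vote counting concrete, here is a minimal sketch of the scenario in Python. It assumes a simple majority rule (a partition holds quorum when it can see more than half of all votes); the function name has_quorum is made up for illustration and is not part of Exchange or Windows clustering.

```python
def has_quorum(votes_visible: int, total_votes: int) -> bool:
    """Assumed rule: a partition holds quorum with a strict majority of all votes."""
    return votes_visible > total_votes // 2


# Scenario from the post: 4 DAG nodes plus 1 File Share Witness = 5 votes.
TOTAL_VOTES = 5
datacenter_a_votes = 2 + 1  # two DAG nodes plus the FSW
datacenter_b_votes = 2      # two DAG nodes, no witness

print(has_quorum(datacenter_a_votes, TOTAL_VOTES))  # True
print(has_quorum(datacenter_b_votes, TOTAL_VOTES))  # False
```

Datacenter A reaches a majority entirely on its own once it powers back up, while Datacenter B is only running because it was activated during the outage. With the link between the sites still down, nothing tells either side to stand down.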

In order to remedy this problem, Exchange 2010 has a new feature called Datacenter Activation Coordination (DAC). DAC is used to control the activation behavior of DAG nodes that may be split between multiple datacenters. Basically, when there is an outage in one datacenter, other members of the DAG will come online in another datacenter. When the DAG nodes that were offline return to service, they will use a protocol called Datacenter Activation Coordination Protocol (DACP) before trying to mount their databases. DACP is used to determine the current state of the DAG and whether Active Manager should try to mount the databases or not.

Now you may be wondering, what is Active Manager? Well, Active Manager stores a bit in memory (either a 0 or a 1) that tells the DAG whether it’s allowed to mount local databases that are assigned as active on the server. When a DAG is running in DAC mode (which can be enabled on any DAG with three or more members), each time Active Manager starts up the bit is set to 0, meaning it isn’t allowed to mount databases. Because it’s in DAC mode, the server must try to communicate with all other members of the DAG that it knows of to get another DAG member to give it an answer as to whether it can mount local databases that are assigned as active to it. The answer comes in the form of the bit setting for other Active Managers in the DAG. If another server responds that its bit is set to 1, it means servers are allowed to mount databases, so the server starting up sets its bit to 1 and mounts its databases.

So, what this means is that when you recover from a failure in a datacenter, the returning DAG nodes start with a DACP bit value of 0 and must communicate with the other nodes in the DAG that they are aware of before mounting anything. Once a returning node gets an answer from a member whose bit is set to 1, it sets its own bit to 1 and mounts its databases. If the returning nodes can only reach each other, they all still have a bit value of 0, so none of them mount and the split brain is avoided.
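Here is a rough sketch of that bit check in Python. It is a simplified model and not Exchange's actual implementation. It assumes a bit of 1 means "allowed to mount", and that a starting member only flips its own bit once a reachable member reports 1. The DagMember class and try_mount function are hypothetical names used only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class DagMember:
    name: str
    dacp_bit: int = 0  # Active Manager always starts with the bit at 0


def try_mount(starting: DagMember, reachable: list[DagMember]) -> bool:
    """Flip the starting member's bit to 1 only if a reachable member already has 1."""
    if any(peer.dacp_bit == 1 for peer in reachable):
        starting.dacp_bit = 1
        return True
    return False


# Datacenter B was activated during the outage, so its members hold a bit of 1.
b1, b2 = DagMember("B1", 1), DagMember("B2", 1)
# Datacenter A has just powered back on, so its members start at 0.
a1, a2 = DagMember("A1"), DagMember("A2")

# Link between sites still down: A1 can only reach A2, whose bit is 0.
print(try_mount(a1, [a2]))          # False
# Link restored: A1 can now reach the members in Datacenter B.
print(try_mount(a1, [a2, b1, b2]))  # True
```

While the sites are cut off from each other, the Datacenter A members only ever see bits of 0, so their databases stay dismounted even though Datacenter A holds a quorum majority.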

Make sense?  I think this is a pretty impressive solution that MS has come up with to prevent the split brain in Exchange 2010.  The kicker?  DAC is disabled by default.  Keep in mind that in order to leverage DAC you need to have at least a 3 node DAG in different datacenters.  I suppose you wouldn’t need this if they are all in the same datacenter and the nodes can communicate with each other. ;)  

If you are looking at deploying a DAG across multiple datacenters you will want to enable DAC. To enable DAC, you can run the following command:

Set-DatabaseAvailabilityGroup -Identity DAGID -DatacenterActivationMode DagOnly

For more information on the ‘Set-DatabaseAvailabilityGroup’ cmdlet you can go here.

EDIT: This feature will be updated in Exchange 2010 SP1. For more information please read my article “Datacenter Activation Coordinator Changes in Exchange 2010 SP1!”