Mark E. Smith's Brain Dump
June 10
Mailbox Moves, Event ID 1100, and StalledDueToHA

This week a customer called me to say that their mailbox moves were not working. I asked the customer to send me the output of Get-MoveRequestStatistics for the affected user, and in it we saw this error:

Move for mailbox '/o=First Organization/ou=First Administrative Group/cn=Recipients/cn=johndoe' is stalled because DataMoveReplicationConstraint is not satisfied for the database 'Database' (agent MailboxDatabaseReplication). Failure Reason: Database 57d58164-48e7-4556-8407-5ef959e0c512 does not satisfy constraint SecondCopy. Some database copies are behind.

You’ll also see Event ID 1100, Source MSExchange Mailbox Replication.
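
If you want to pull the same information yourself, something along these lines should work from the Exchange Management Shell. Treat it as a rough sketch: the 'johndoe' identity is simply taken from the error above, and I'm assuming the event lands in the Application log on the server running the Mailbox Replication Service.

```powershell
# Show the move status and the stall/failure message for one mailbox.
# 'johndoe' is a placeholder taken from the error text above.
Get-MoveRequestStatistics -Identity 'johndoe' |
    Format-List DisplayName, Status, StatusDetail, PercentComplete, Message

# Look for recent Event ID 1100 entries from the MRS source.
# Assumption: run this on the server hosting the Mailbox Replication Service.
Get-EventLog -LogName Application -Source 'MSExchange Mailbox Replication' -Newest 50 |
    Where-Object { $_.EventID -eq 1100 } |
    Format-List TimeGenerated, EntryType, Message
```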

I then asked the customer to send me the output of Get-MailboxDatabaseCopyStatus *. It showed that the target database, which had three copies (two in Site A and one in Site B), had one failed copy in Site A and a copy queue length of 5230 logs on the copy in Site B.
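
To narrow that output down to just the target database, a query like the one below is usually enough. 'Database' is the placeholder name from the error message, so substitute your own.

```powershell
# List every copy of the target database with its health and queue lengths.
# 'Database' is a placeholder for the target database name.
Get-MailboxDatabaseCopyStatus -Identity 'Database' |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState -AutoSize
```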

Further inspection of the move request statistics shows that the move hasn't failed; rather, its status is StalledDueToHA. So what's Exchange doing here? Well, it's attempting to save your neck. You're getting close to a state where you're down to one healthy copy of the data, so it's preventing you from moving more mailboxes to the problematic database.
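
To see every move that is currently in this state, not just the one the user complained about, you could filter on the StatusDetail property. This is a sketch that assumes StatusDetail renders as the text 'StalledDueToHA', which is how it appeared in this case.

```powershell
# Find all move requests that are currently stalled on HA replication health.
# -like compares the string form of StatusDetail, which also works on
# the deserialized objects returned over remote PowerShell.
Get-MoveRequest -ResultSize Unlimited |
    Get-MoveRequestStatistics |
    Where-Object { $_.StatusDetail -like 'StalledDueToHA' } |
    Format-Table DisplayName, StatusDetail, TargetDatabase, PercentComplete -AutoSize
```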

How does this work? When the target database is a replicated database in a DAG, MRS regularly checks its replication health. The high availability infrastructure compares the current replication health against the throttling behavior configured for the target database (as specified by the DataMoveReplicationConstraint parameter). Depending on the result, MRS either continues with the move or waits. If the target database stays unhealthy for 30 minutes, MRS fails the move. In the specific case of StalledDueToHA, this means that the CopyQueueLength is > 10 or the ReplayQueueLength is > 50.
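
Here is one way to check a database against those numbers by hand. It is only an approximation of what MRS does: the thresholds are the ones quoted above, the variable names are mine, and the real check is made by the high availability infrastructure, not by a script like this.

```powershell
# Thresholds as quoted in this post (hypothetical variable names).
$copyQueueLimit   = 10
$replayQueueLimit = 50

# Which constraint is the target database configured with?
Get-MailboxDatabase -Identity 'Database' |
    Format-List Name, DataMoveReplicationConstraint

# Which copies are behind according to the thresholds above?
Get-MailboxDatabaseCopyStatus -Identity 'Database' |
    Where-Object {
        $_.CopyQueueLength -gt $copyQueueLimit -or
        $_.ReplayQueueLength -gt $replayQueueLimit
    } |
    Format-Table Name, Status, CopyQueueLength, ReplayQueueLength -AutoSize
```

If the second command returns nothing and no copy is in a failed state, the stall is probably being caused by something else.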

So, how do we resolve this? Well, first let's get that failed second copy of the database addressed. After a restart of the Replication Service on the affected server, some hurry up and wait, and a reseed of the content index catalog, we were back up to two healthy copies in Site A and a copy queue length of 2283 logs on the copy in Site B. We've now satisfied the constraint for MRS, as we have two healthy copies, but what else could we have done?
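
For reference, the repair steps looked roughly like this. The server name 'MBX2' is made up for illustration, and whether a catalog-only reseed is the right fix depends on why the copy failed in the first place, so check the copy status before and after.

```powershell
# Run on the server hosting the failed copy. 'MBX2' is a hypothetical server name.
Restart-Service -Name MSExchangeRepl

# Reseed only the content index catalog for that copy, not the database itself.
Update-MailboxDatabaseCopy -Identity 'Database\MBX2' -CatalogOnly

# Confirm the copy is healthy again and watch the queues drain.
Get-MailboxDatabaseCopyStatus -Identity 'Database\MBX2' |
    Format-List Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState
```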

Well, according to TechNet we have three options:

1. Remove the move request and then move the mailbox to a healthy target database.

2. Resolve the issue with the target database's replication and resume the move request.

3. Update the DataMoveReplicationConstraint parameter on the Set-MailboxDatabase cmdlet for the target database to reflect its current state (which I would NOT recommend).
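
The first two options can be sketched as follows. 'johndoe' and 'HealthyDatabase' are placeholders, and note that a move that is merely stalled should pick up again on its own once replication is healthy; Resume-MoveRequest is for a move that has already failed or been suspended.

```powershell
# Option 1: drop the stalled request and move the mailbox somewhere healthy.
# 'HealthyDatabase' is a hypothetical target database name.
Remove-MoveRequest -Identity 'johndoe' -Confirm:$false
New-MoveRequest -Identity 'johndoe' -TargetDatabase 'HealthyDatabase'

# Option 2: after fixing replication on the original target database,
# resume the request if it failed or was suspended in the meantime.
Resume-MoveRequest -Identity 'johndoe'
```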

But let's take a closer look at #3. As I said, I would not take this option because, well, you're putting your data (and possibly your job) at risk. But maybe your copies are on RAID rather than JBOD and you've got no other choice. In that case you can set the DataMoveReplicationConstraint parameter on the Set-MailboxDatabase cmdlet to None, which stops MRS from coordinating with Active Manager. This lifts the constraint and lets you continue the move and run that copy queue up. If you choose this option, I highly recommend setting DataMoveReplicationConstraint back to its previous value (the default is SecondCopy) after the mailbox move has completed, so that Exchange can keep trying to protect your data.
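
If you do go down this road, the sequence would look something like the sketch below. It assumes your previous value was the default SecondCopy; if the first command shows something else, restore that value instead.

```powershell
$db = 'Database'   # placeholder for the target database name

# Note the current constraint before touching it.
Get-MailboxDatabase -Identity $db | Format-List Name, DataMoveReplicationConstraint

# Lift the constraint so MRS stops waiting on replication health.
Set-MailboxDatabase -Identity $db -DataMoveReplicationConstraint None

# ... let the mailbox move complete ...

# Put the protection back (SecondCopy assumed here as the previous value).
Set-MailboxDatabase -Identity $db -DataMoveReplicationConstraint SecondCopy
```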

 
