Thursday, August 9, 2012

Recovery Strategy for Polyglot Persistence


Typically, all data for a given application is stored in a single DBMS. The usual data maintenance tasks then include taking data backups on a predefined schedule and testing disaster recovery procedures. A potential loss of data can be dealt with: the data is recovered either to a point in time right before the failure or, failing that, from the most recent successful backup copy. With these measures it is always possible to tell exactly what time period the data was lost for, and to ask users to re-do their work.

Data administration gets more complicated with a heterogeneous persistence layer. With multiple data stores there is an additional concern: data inconsistency across the stores.

Our application utilized three different data stores:
  •  SQL Server tables
  •  MongoDB collections
  •  Uploaded files (PDFs and the like) on the file system
Data was interconnected pair-wise: SQL Server and files, SQL Server and MongoDB.

The dependency between SQL Server and files was simple and easy to handle. The uploaded files were referenced from the SQL Server tables by path/name, i.e. the tables held pointers to the data files. Since files rarely needed to be deleted, hard deletes were eliminated and replaced with soft deletes. That meant that as long as the file backup was taken any time later than the SQL Server backup, the recovered data would be consistent.
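The soft-delete idea can be sketched in a few lines. This is illustrative only: the table and column names are assumptions, and an in-memory SQLite database stands in for SQL Server.

```python
import sqlite3

def soft_delete_file(conn, file_id):
    """Mark a file record deleted instead of removing the row or the file.

    The physical file stays on disk, so any file-system backup taken after
    the database backup still contains every file the tables point to.
    """
    conn.execute(
        "UPDATE uploaded_files SET is_deleted = 1 WHERE file_id = ?",
        (file_id,),
    )
    conn.commit()

# Demo with an in-memory database standing in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE uploaded_files ("
    "file_id INTEGER PRIMARY KEY, path TEXT, is_deleted INTEGER DEFAULT 0)"
)
conn.execute(
    "INSERT INTO uploaded_files (file_id, path) VALUES (1, '/files/report.pdf')"
)
soft_delete_file(conn, 1)
row = conn.execute(
    "SELECT path, is_deleted FROM uploaded_files WHERE file_id = 1"
).fetchone()
print(row)  # ('/files/report.pdf', 1) -- pointer kept, record merely flagged
```

The key point is that the pointer in the table never dangles: a restore of an older database backup can only reference files that still exist in any later file-system backup.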

The interdependency between SQL Server and MongoDB was more complicated. For a given transaction, some data might be stored in SQL Server while overlapping data went to MongoDB. This poses two problems for recovery:
  • When recovering to a point in time right before the failure, the most recent transactions might not have made it into the MongoDB journal yet were already written to the SQL Server log (or vice versa). The recovered stores might therefore be slightly inconsistent with each other.
  • When recovering from a backup snapshot, it actually requires restoring from two backups: the SQL Server backup and the MongoDB backup. The snapshots are run independently and might be a couple of minutes apart.
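After restoring two independently taken snapshots, one way to bound the damage is a reconciliation pass over the keys the stores share. A minimal sketch, with plain sets standing in for the two restored stores (all names here are assumptions):

```python
def find_orphans(sql_keys, mongo_keys):
    """Return keys present in one restored store but missing from the other.

    Orphans on either side mark the transactions that fell into the gap
    between the two independently taken snapshots.
    """
    sql_keys = set(sql_keys)
    mongo_keys = set(mongo_keys)
    return {
        "missing_in_mongo": sorted(sql_keys - mongo_keys),
        "missing_in_sql": sorted(mongo_keys - sql_keys),
    }

# The SQL Server snapshot finished a couple of minutes after the MongoDB
# one, so it contains a transaction the MongoDB snapshot missed.
sql_keys = {"tx1", "tx2", "tx3"}    # keys restored from the SQL Server backup
mongo_keys = {"tx1", "tx2"}         # keys restored from the MongoDB backup
report = find_orphans(sql_keys, mongo_keys)
print(report)  # {'missing_in_mongo': ['tx3'], 'missing_in_sql': []}
```

Such a report tells exactly which transactions the users need to re-enter, rather than guessing at the window of loss.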

MongoDB has good durability in the recommended configuration, i.e. with both journaling and replication enabled. In the initial configuration journaling was enabled, but replication was not. Without a replica, the backups had a rolling-snapshot problem: if data changes over the course of a backup, the resulting snapshot may be in an inconsistent state that does not correspond to how the data looked in MongoDB at any one moment. So the first priority was enabling replication. It added failover capability, significantly reducing the chances of a disaster. The backups then used the --oplog option for precise backup timing.
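The --oplog backup is easy to script. A sketch that assembles the mongodump invocation (the host and output path are assumptions; the command is built but not executed here):

```python
def build_mongodump_command(host, out_dir):
    """Assemble a mongodump invocation with --oplog.

    --oplog captures the oplog entries written while the dump runs, so a
    restore with --oplogReplay yields a snapshot consistent to a single
    point in time. It requires a replica set; journaling alone is not enough.
    """
    return [
        "mongodump",
        "--host", host,
        "--oplog",
        "--out", out_dir,
    ]

cmd = build_mongodump_command("rs0/db1.example.com:27017", "/backups/mongo/2012-08-09")
print(" ".join(cmd))
```

In practice this list would be handed to a scheduler or to subprocess.run; the important part is that --oplog only works against a replica set member, which is another reason enabling replication came first.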

Yet another trick was used on the SQL Server side. Its recovery procedures are more sophisticated: the database can be restored to any point in time. So whenever the two backups had slightly different times, the SQL Server restore point was tuned to match the MongoDB backup timing.
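Point-in-time restore in SQL Server is done with RESTORE LOG ... WITH STOPAT. A sketch that generates the statement for the timestamp of the MongoDB snapshot (the database name and backup path are assumptions):

```python
def restore_log_to_point(db, log_backup_path, stop_at):
    """Generate a T-SQL RESTORE LOG statement that stops recovery at the
    moment the MongoDB snapshot was taken, aligning the two stores."""
    return (
        f"RESTORE LOG [{db}] FROM DISK = N'{log_backup_path}' "
        f"WITH STOPAT = '{stop_at}', RECOVERY;"
    )

stmt = restore_log_to_point("AppDb", r"D:\backup\AppDb.trn", "2012-08-09T02:15:00")
print(stmt)
```

Feeding the MongoDB backup's timestamp into STOPAT is what lets the flexible side (SQL Server) bend to match the rigid side (the MongoDB snapshot), instead of the other way around.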
