Replication and Synchronization

Introduction

This document describes best practices for using replication and synchronization. Replication is the transfer of changes from the master to every connected slave or staging slave while all affected systems are running. Synchronization is used when a server is out of date and must be brought back in line with the master server.

Please read the Sophora Server documentation before this document, in particular the sections on configuration and on the different server modes. The documentation on Backup and Recovery might also be useful.

The slaves and the staging slaves are connected to the Sophora master server. You can connect any number of slaves and staging slaves.

Replication

Replication between the master and a slave uses a persistent JMS queue. Every connected slave has its own queue. Consequently, for every editorial or administrative change on the master server, a replication message is added to every replication queue. A replication message is deleted only after the corresponding slave has read it. After a downtime, a slave therefore only needs to be started: it reconnects to its queue automatically and begins processing the pending replication messages. Because every change on the master is also performed on the slaves, the slaves are an exact copy of the master *. The version history of the documents is replicated completely. In a running environment the queue is almost always empty, and a change on the master is processed on the slaves within seconds. The time until a slave has processed a new replication message increases as the queue grows. After a downtime of the slave, for example, the queue may contain several thousand replication messages. New replication messages are processed only after all pending replication messages have been processed by the slave.

* The permanent deletion of trashed documents takes place only on the server where it is performed and is not replicated to other servers! You have to repeat this operation manually on all slaves where necessary. Therefore, delete documents permanently only if you know what you are doing!
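
The queue behaviour described above can be illustrated with a small toy model. This is only a sketch of the principle, not Sophora or JMS code: the class Master, its methods and the slave names are invented for the illustration, and the queues live in memory instead of being persistent.

```python
from collections import deque


class Master:
    """Toy model of a master that keeps one replication queue per slave."""

    def __init__(self, slave_names):
        # One independent queue per connected slave (hypothetical names).
        self.queues = {name: deque() for name in slave_names}

    def change(self, message):
        # Every change on the master is appended to every slave's queue.
        for queue in self.queues.values():
            queue.append(message)

    def read_next(self, slave_name):
        # A message is removed only when its own slave reads it.
        queue = self.queues[slave_name]
        return queue.popleft() if queue else None


master = Master(["slaveA", "slaveB"])
master.change("document 1 saved")
master.change("role editor changed")

# slaveA is running and drains its queue; slaveB is down and reads nothing.
while master.read_next("slaveA") is not None:
    pass

print(len(master.queues["slaveA"]))  # 0
print(len(master.queues["slaveB"]))  # 2
```

When slaveB is started again in this model, it reads its two pending messages in the original order before it sees any newer change.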

Slave

There is no special configuration needed on the master server. 

Configuration of the Replication between Master and Slave

Let's assume a scenario with two slaves (slave A and slave B). The following settings are required to set up the replication.

Excerpt of the Configuration of the master

sophora.replication.mode=master
sophora.local.jmsbroker.host=master.hostname
sophora.local.jmsbroker.port=1197

Excerpt of the Configuration of Slave A

sophora.replication.mode=slave
sophora.remote.jmsbroker.host=master.hostname
sophora.remote.jmsbroker.port=1197

Excerpt of the Configuration of Slave B

sophora.replication.mode=slave
sophora.remote.jmsbroker.host=master.hostname
sophora.remote.jmsbroker.port=1197
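
As a quick sanity check, the excerpts above can be compared programmatically. The following is a minimal sketch that assumes, based only on the excerpts, that a slave's remote broker host and port must equal the master's local broker host and port. The helper functions parse_properties and slave_points_at_master are hypothetical and not part of Sophora.

```python
def parse_properties(text):
    """Parse simple key=value lines; blank lines and # comments are skipped."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def slave_points_at_master(master_props, slave_props):
    """Return True if the slave's remote broker matches the master's local broker."""
    return (
        master_props.get("sophora.replication.mode") == "master"
        and slave_props.get("sophora.replication.mode") == "slave"
        and slave_props.get("sophora.remote.jmsbroker.host")
        == master_props.get("sophora.local.jmsbroker.host")
        and slave_props.get("sophora.remote.jmsbroker.port")
        == master_props.get("sophora.local.jmsbroker.port")
    )


MASTER = """
sophora.replication.mode=master
sophora.local.jmsbroker.host=master.hostname
sophora.local.jmsbroker.port=1197
"""

SLAVE_A = """
sophora.replication.mode=slave
sophora.remote.jmsbroker.host=master.hostname
sophora.remote.jmsbroker.port=1197
"""

master_props = parse_properties(MASTER)
slave_props = parse_properties(SLAVE_A)
print(slave_points_at_master(master_props, slave_props))  # True
```

A slave whose port differs from the master's port would make the check return False, which is a cheap way to catch copy-and-paste mistakes before starting the servers.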

Deleting or purging a replication queue

The persistent replication queues are stored in a Derby database in the Sophora master installation. The database is located in the folder <sophora.home>/data/jms. To delete all replication queues, delete the whole directory. It is also possible to purge (the JMS term for deleting) only selected queues via JMX. This is done with the operation purge() of the MBean org.apache.activemq:BrokerName=embedded,Type=Queue,Destination=<QUEUE-NAME> on the master server.

Monitoring a replication queue

The most important indicator for the replication is the size of the queue, which can be used to estimate the replication delay. This information is available as the attribute size of the MBean org.apache.activemq:BrokerName=embedded,Type=Queue,Destination=<QUEUE-NAME> of the master server. Keep in mind that one replication message does not necessarily correspond to the replication of one document. A replication message may represent any change on the master, e.g. a new user, a changed role or just an altered document.
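
The following sketch shows how the MBean name for a given queue can be assembled and how a queue size might be turned into a rough delay estimate. The queue name example.queue and the throughput of 50 messages per second are made-up values for illustration; the real throughput depends on the installation and on the kind of changes in the queue, so the result is only a rough guide.

```python
def queue_mbean_name(queue_name):
    """Build the MBean name of a replication queue on the master's embedded broker."""
    return (
        "org.apache.activemq:BrokerName=embedded,Type=Queue,Destination="
        + queue_name
    )


def estimated_delay_seconds(queue_size, messages_per_second):
    """Rough estimate: pending messages divided by an assumed processing rate."""
    if messages_per_second <= 0:
        raise ValueError("messages_per_second must be positive")
    return queue_size / messages_per_second


# Hypothetical queue name and throughput, used only for this example.
print(queue_mbean_name("example.queue"))
print(estimated_delay_seconds(6000, 50.0))  # 120.0
```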

Replication in a failover scenario

In a failover scenario the master is, for whatever reason, in an unusable state. Typically one of the connected slaves then takes over the role of the master server. For this purpose the configuration of the slave must be changed to the configuration of a master (e.g. the server mode and the names of the replication queues).

If the master server fails, unprocessed replication messages in the replication queues are lost. Since replication messages are processed very quickly, this is an unusual situation. If the server does not start any more, it cannot start the JMS broker for the connected slaves either. As the version history is continued on the new master server, the replication queues must be deleted on the original master.

When a slave (on server A) takes over the role of the master (on server B), the "temporary master" (on server A) maintains the replication queues for additional slaves and for the temporarily unavailable original master (on server B). After the original master (on server B) is ready for use again, it is connected to the "temporary master" (on server A) as a slave and processes its queue, which was maintained in the meantime. When the queue is empty, the two servers are in sync again, and their roles may be switched back.

Replication between Master and Staging Slave

The master communicates with each connected staging slave via a JMS topic. Staging slaves connect to the master automatically.

Configuration of the Replication between Master and Staging Slave

There is no special configuration needed on the master server. 

Excerpt of the Configuration of the Staging Slave

sophora.replication.mode=stagingslave
sophora.remote.jmsbroker.host=master.hostname
sophora.remote.jmsbroker.port=1197

Synchronization

To start a synchronization, the property sophora.replication.restartDate must be configured on a slave or staging slave. The communication uses a non-persistent temporary queue. If one of the involved servers is restarted during the synchronization, the temporary queue is deleted and the synchronization must be started again from the beginning. The progress of the synchronization can be monitored in the log file of the slave or staging slave. When the message got syncFinished appears in the log file, the synchronization is complete.
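
Watching for the completion message can be automated with a few lines of script. The sketch below only relies on the text got syncFinished mentioned above. The function name sync_finished is hypothetical, and the sample lines are invented placeholders, not real Sophora log output.

```python
def sync_finished(log_lines):
    """Return True if any log line contains the completion marker."""
    return any("got syncFinished" in line for line in log_lines)


# Placeholder lines for illustration only; real log lines look different.
sample_running = ["placeholder line 1", "placeholder line 2"]
sample_done = sample_running + ["placeholder prefix got syncFinished"]

print(sync_finished(sample_running))  # False
print(sync_finished(sample_done))     # True
```

In practice the function could be fed with the lines of the log file of the slave or staging slave, for example by passing an open file object.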

The master server must determine all the changes for an incoming sync request. This step may take several minutes, and its progress cannot be monitored.

Synchronization between Master and Slave

If, for whatever reason, processing the replication queue is not sufficient, it is possible to synchronize a slave. A synchronization is only useful if the changes to be transferred are not already stored in the replication queue. As a synchronization transfers only the working version and the last live version of documents, the version history of synchronized documents on the slave may differ from that on the master. After the synchronization is completed, the slave starts to process its persistent queue. As the queue may contain older versions of a document, the latest version of a document that was transferred during the synchronization may be temporarily overwritten with an older version. Therefore either the persistent queue should be deleted before a synchronization, or the queue should be processed completely before the server is used in production again. Changes which occur during the synchronization are stored in the persistent queue and are processed after the synchronization as well.
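
The temporary overwrite described above can be made visible with a small simulation. This is a simplified model, not Sophora code: documents are reduced to a version number, the function visible_versions is hypothetical, and the document id doc-1 is made up.

```python
def visible_versions(master_state, pending_messages):
    """Return the slave's state after the sync and after each queued message."""
    repository = dict(master_state)   # state right after the synchronization
    history = [dict(repository)]
    for doc_id, version in pending_messages:
        # Queued messages are applied in order, oldest first.
        repository[doc_id] = version
        history.append(dict(repository))
    return history


master_state = {"doc-1": 3}
# Messages that were already waiting in the persistent queue before the sync.
pending = [("doc-1", 2), ("doc-1", 3)]

with_queue = [step["doc-1"] for step in visible_versions(master_state, pending)]
purged = [step["doc-1"] for step in visible_versions(master_state, [])]

print(with_queue)  # [3, 2, 3]
print(purged)      # [3]
```

With the old queue in place, the slave briefly shows version 2 again before it catches up. With a purged queue this intermediate state never occurs.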

Synchronization between Master and Staging Slave

The synchronization between a master and a staging slave works almost the same way as the synchronization between a master and a slave. As the staging slave does not have a persistent queue, it always sends a sync request to the master automatically when it starts. The publication date of the last published document on the staging slave is used as the restart date.
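
How the restart date follows from the published documents can be sketched as follows. The function staging_restart_date and the sample dates are hypothetical. Returning None for an empty repository is a modelling choice of this sketch that stands for the case in which no restart date can be derived.

```python
from datetime import datetime


def staging_restart_date(publication_dates):
    """Return the newest publication date, or None if nothing is published."""
    return max(publication_dates, default=None)


# Made-up publication dates of documents on a staging slave.
dates = [datetime(2024, 1, 5, 9, 0), datetime(2024, 3, 1, 14, 30)]

print(staging_restart_date(dates))  # 2024-03-01 14:30:00
print(staging_restart_date([]))     # None
```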

Initial Synchronization

When a slave or staging slave with an empty repository is started for the first time, it sends a full sync request to the master server. The communication uses a non-persistent temporary queue. If one of the involved servers is restarted during the synchronization, the temporary queue is deleted and the synchronization must be started again from the beginning. To do so, the repository of the slave or staging slave must be deleted first.

Specific to Slaves

If the replication queue of the connected slave already exists, it is purged before the synchronization. As the slave requested a full synchronization, it does not need to process older messages in the queue after the synchronization. The master server then sends all administrative content and all documents to the slave. As only the working version and the last live version of a document are transferred in a synchronization, the resulting slave is not a full copy of the master server: the two servers do not share the version history of documents. If you would like to create an exact copy, the slave should be built from a backup of the master server. The restart date must then be set to the time of the backup.

Update Issues

For both replication and synchronization, the master, the slaves and the staging slaves must run the same version, although their minor release numbers may differ. For example, a master in version 1.35.2 and a slave in version 1.35.1 work together, but a master in version 1.35.2 and a slave in version 1.34.14 do not. Therefore the persistent replication queues must be empty before a version jump. For updates, the replication between the master and staging slaves may be interrupted. The mechanism to interrupt the replication is described here.
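
The version rule can be expressed as a small check. The sketch below assumes, based only on the two examples above, that version numbers consist of three numeric parts and that the first two parts must match while the last part may differ. The function names are hypothetical.

```python
def release_line(version):
    """Return the first two parts of a three-part version such as 1.35.2."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError("expected a version with three parts: " + version)
    first, second, _last = (int(part) for part in parts)
    return (first, second)


def can_replicate(master_version, slave_version):
    """True if both servers share the first two parts of the version number."""
    return release_line(master_version) == release_line(slave_version)


print(can_replicate("1.35.2", "1.35.1"))   # True
print(can_replicate("1.35.2", "1.34.14"))  # False
```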