Add-on High Availability Cluster: Documentation
The add-on ensures that content creation can continue without interruption during planned updates or in the unlikely event that Sophora’s master server crashes.
Table of Contents
- General Changes
- Getting started
- Clients connecting to a cluster node
- Switching the master
- ReadAnywhere cluster configuration
- Cluster for Sophora Delivery Web Application
General Changes
The Sophora server can be operated as a cluster. A cluster consists of several Sophora servers connected to each other. Each cluster node operates in one of two modes: master or slave. The master replicates all of its data to the slave nodes of the cluster. Clients (e.g. staging slaves and DeskClients) connect only to the master node. The master role can easily be moved to another node of the cluster; the previous master then switches to slave mode. The move is transparent to connected clients: they automatically reconnect to the new master. Each client keeps a list of known cluster nodes and, on startup, tries them in turn until it finds the master.
No persistent queues
In earlier versions, a persistent JMS queue was created for every slave connected to a master. While a slave was down, all changes on the master were stored in this queue, and when the slave was restarted, the queued changes were transmitted to it.
Since version 1.38, there are no persistent queues. Slaves can dynamically connect to and disconnect from the master. The configuration properties sophora.replication.master.queueNames and sophora.replication.slave.queueName are obsolete.
When a slave reconnects to a master, it synchronizes its content in a similar way to staging slaves. Based on the last modification date of the documents in its repository, the slave sends a sync request to the master. The master then sends all changed documents and structure nodes to the slave. For documents, all missing versions are transmitted. All other data (node type configurations, users, roles, etc.) is sent to the slave as well.
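The catch-up step can be pictured as a query by modification date. The following is a minimal sketch, not Sophora's implementation: the `Doc` record, the `SyncSketch` class, and its method names are hypothetical, and the real exchange also covers structure nodes, document versions, and configuration data such as users and roles.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified document: only an id and a last-modification time.
record Doc(String id, Instant lastModified) {}

// Hypothetical sketch of a date-based sync; not the actual Sophora protocol.
final class SyncSketch {
    // Slave side: the newest modification date it holds becomes the sync point.
    static Instant syncPoint(Collection<Doc> slaveDocs) {
        return slaveDocs.stream()
                .map(Doc::lastModified)
                .max(Comparator.naturalOrder())
                .orElse(Instant.EPOCH); // no documents yet: treat everything as changed
    }

    // Master side: return every document modified after the slave's sync point.
    static List<Doc> changedSince(Collection<Doc> masterDocs, Instant since) {
        List<Doc> changed = new ArrayList<>();
        for (Doc d : masterDocs) {
            if (d.lastModified().isAfter(since)) {
                changed.add(d);
            }
        }
        return changed;
    }
}
```

The slave only needs to send a single timestamp, which is why no per-slave queue has to be kept on the master.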
Getting started
To start a Sophora cluster, at least two Sophora servers in cluster mode are needed. Each cluster node runs its own JMS broker, so the configuration differs from that of a normal Sophora server:
Cluster Node 1
sophora.home=...
sophora.rmi.servicePort=1198
sophora.rmi.registryPort=1199
sophora.replication.mode=cluster
sophora.local.jmsbroker.host=hostname1
sophora.local.jmsbroker.port=1197
sophora.remote.jmsbroker.host=hostname2
sophora.remote.jmsbroker.port=1297
sophora.replication.slaveHostname=hostname1
Cluster Node 2
sophora.home=...
sophora.rmi.servicePort=1298
sophora.rmi.registryPort=1299
sophora.replication.mode=cluster
sophora.local.jmsbroker.host=hostname2
sophora.local.jmsbroker.port=1297
sophora.remote.jmsbroker.host=hostname1
sophora.remote.jmsbroker.port=1197
sophora.replication.slaveHostname=hostname2
The remote JMS broker configuration (sophora.remote.jmsbroker.*) must point to one of the other cluster nodes. It is used to find the master initially; the remote server does not need to be the master at all times.
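In the two-node example above, the configurations mirror each other: each node's remote broker is the other node's local broker. A minimal sketch that checks this for a pair of sophora.properties texts with `java.util.Properties`; the `ClusterConfigCheck` class and its methods are hypothetical, and the check only looks at the keys shown above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper; not part of Sophora.
final class ClusterConfigCheck {
    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    // True if node a runs in cluster mode and its remote JMS broker
    // (host and port) is node b's local JMS broker.
    static boolean pointsTo(Properties a, Properties b) {
        return "cluster".equals(a.getProperty("sophora.replication.mode"))
                && a.getProperty("sophora.remote.jmsbroker.host", "")
                        .equals(b.getProperty("sophora.local.jmsbroker.host"))
                && a.getProperty("sophora.remote.jmsbroker.port", "")
                        .equals(b.getProperty("sophora.local.jmsbroker.port"));
    }
}
```

With more than two nodes, the remote broker only has to reach one other node, so a check like this would test against any node rather than a fixed partner.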
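To start a server in a specific cluster mode, pass the -clusterMode option to the start script, for example:
./sophora.sh -start -clusterMode master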
Valid values for the system property are:
- open (The server which starts first becomes the master server)
When a cluster with two or more nodes is running, the list of cluster nodes is exchanged and stored in the repository of every connected Sophora server. After one of the Sophora servers is restarted, it uses this list to find the master: the servers in the list are contacted and checked one after another, and if a server is not running or is not in master mode, the next one is tried.
The list is accessible via JMX. Entries for staging slaves are removed when the respective staging slave is stopped; other nodes (master and slaves) are stored permanently. When a Sophora server leaves the cluster for good, it can be removed from the list via JMX. If this is done on the current master, the change is transmitted to all currently connected Sophora servers.
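The retention rules for the node list fit into a small data structure. This is a hypothetical sketch (`KnownNodes` and its methods are illustrative names): the real list is stored in the repository and managed via JMX, neither of which is modelled here.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the known-nodes list.
final class KnownNodes {
    enum Kind { CLUSTER_NODE, STAGING_SLAVE }

    // Insertion order is kept so that nodes are tried in a stable order.
    private final Map<String, Kind> nodes = new LinkedHashMap<>();

    void register(String serverId, Kind kind) {
        nodes.put(serverId, kind);
    }

    // Called when a server stops: only staging slave entries are dropped,
    // master and slave entries stay in the list.
    void onStopped(String serverId) {
        if (nodes.get(serverId) == Kind.STAGING_SLAVE) {
            nodes.remove(serverId);
        }
    }

    // Explicit removal (done via JMX in Sophora) for a server that left the cluster.
    void removePermanently(String serverId) {
        nodes.remove(serverId);
    }

    List<String> ids() {
        return List.copyOf(nodes.keySet());
    }
}
```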
When a cluster node is started, it looks for the other cluster nodes and the current master as follows:
- The configured remote JMS broker is asked if it is running as master.
- If the configured remote JMS broker is not running as master or the JMS broker is not reachable, all other known JMS brokers (stored in the repository) are tried.
- If none of the other servers is running as master, the currently started server becomes the master. Otherwise the server starts as a slave and synchronizes its content with the master.
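The start-up decision above reduces to a short loop. In this sketch the `Broker` interface and its `isMaster()` probe are hypothetical stand-ins for the JMS-based checks; an unreachable broker is modelled as one whose probe throws.

```java
import java.util.List;

// Hypothetical probe: answers whether the broker's server runs as master,
// and throws if the broker cannot be reached.
@FunctionalInterface
interface Broker {
    boolean isMaster() throws Exception;
}

// Hypothetical sketch of the start-up logic; not Sophora's implementation.
final class StartupDiscovery {
    // Returns the master to join as a slave, or null if this server should become master.
    static Broker findMaster(Broker configuredRemote, List<Broker> knownBrokers) {
        if (probe(configuredRemote)) {
            return configuredRemote;
        }
        for (Broker b : knownBrokers) {
            if (b != configuredRemote && probe(b)) { // skip the one already tried
                return b;
            }
        }
        return null;
    }

    private static boolean probe(Broker b) {
        try {
            return b.isMaster();
        } catch (Exception unreachable) {
            return false; // an unreachable broker is treated like a non-master
        }
    }
}
```

A null result corresponds to the last step: no master was found, so the starting server takes over the master role.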
Every server has a unique server id, which is stored in the file <sophora.home>/data/server.properties. When copying a home directory to create a new Sophora slave, make sure not to copy this file; otherwise both servers would share the same id. The current server id can be checked via JMX.
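When a new slave's home directory is created from an existing one, the copy has to skip the server id file. A minimal sketch using `java.nio.file`; the `HomeCopy` class is hypothetical, and the sketch ignores symbolic links and file permissions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper for cloning a Sophora home directory.
final class HomeCopy {
    // Where src should go in the new home, or empty for data/server.properties.
    static Optional<Path> targetFor(Path sourceHome, Path targetHome, Path src) {
        Path idFile = sourceHome.resolve("data").resolve("server.properties");
        if (src.equals(idFile)) {
            return Optional.empty();
        }
        return Optional.of(targetHome.resolve(sourceHome.relativize(src)));
    }

    // Copies sourceHome into an empty or not yet existing targetHome, leaving out
    // the server id file so the new server does not reuse the original id.
    static void copyWithoutServerId(Path sourceHome, Path targetHome) {
        try (Stream<Path> paths = Files.walk(sourceHome)) { // parents come before children
            Iterator<Path> it = paths.iterator();
            while (it.hasNext()) {
                Path src = it.next();
                Optional<Path> dst = targetFor(sourceHome, targetHome, src);
                if (dst.isEmpty()) {
                    continue;
                }
                if (Files.isDirectory(src)) {
                    Files.createDirectories(dst.get());
                } else {
                    Files.copy(src, dst.get());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```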
Clients connecting to a cluster node
Clients can only connect to the current master (or to a staging slave).
Users with the system permission 'administration' can connect to a slave by adding a special extension to the connection URL: with '?force=true', a client (rich client or another client such as the Importer) can connect to any Sophora server regardless of its current state (master or slave).
A client connected with the force flag never switches automatically to a different Sophora server.
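How a client might treat the force flag can be sketched as follows. The `ForceFlag` class and its methods are hypothetical; the URL format is borrowed from the delivery configuration (`rmi://localhost:1199/ContentManager`), and treating `&` as the parameter separator is an assumption.

```java
import java.net.URI;

// Hypothetical sketch of client-side handling of the force flag.
final class ForceFlag {
    // True if the connection URL carries force=true in its query string.
    static boolean isForced(String connectionUrl) {
        String query = URI.create(connectionUrl).getQuery();
        if (query == null) {
            return false;
        }
        for (String param : query.split("&")) { // assumed separator
            if (param.equals("force=true")) {
                return true;
            }
        }
        return false;
    }

    // Called when the target turned out to be a slave: a forced client stays,
    // every other client follows the redirect to the master.
    static String serverToUse(String connectionUrl, String masterUrl) {
        return isForced(connectionUrl) ? connectionUrl : masterUrl;
    }
}
```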
When a client tries to connect to a slave, this is detected and the client is automatically redirected to the master. If the slave is running and connected to the cluster, it sends the list of all Sophora servers to the client, which uses it to find the current master.
If the slave the client tries to reach is not running, a different approach is used. Every time a client successfully connects to a Sophora cluster, the list of all cluster nodes is sent to the client and stored persistently in the file system. Later, when the client tries to reconnect and the configured server is not reachable, it consults this list to find the current master.
Note that the directory in which this server information is stored as an XML file must be configured in all Sophora clients (Indexer, Importer, Linkchecker) via the property sophora.client.dataDir; if it is not configured, this feature is deactivated. The DeskClient stores this file in its workspace without any further configuration.
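The persisted server list can be illustrated with `java.util.Properties`, which reads and writes a simple XML format. This is a hypothetical sketch: the `ServerListStore` class, the file name `cluster-nodes.xml`, and the `node.N` key layout are assumptions, not the format Sophora clients actually use. Only the role of sophora.client.dataDir comes from the text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical store for the known cluster nodes in the client's data directory.
final class ServerListStore {
    private static final String FILE_NAME = "cluster-nodes.xml"; // assumed name

    // Saves the node URLs; does nothing when no data directory is configured.
    static void save(String dataDir, List<String> nodeUrls) {
        if (dataDir == null) {
            return; // feature deactivated
        }
        Properties p = new Properties();
        for (int i = 0; i < nodeUrls.size(); i++) {
            p.setProperty("node." + i, nodeUrls.get(i));
        }
        try (OutputStream out = Files.newOutputStream(Path.of(dataDir, FILE_NAME))) {
            p.storeToXML(out, "known cluster nodes");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads the node URLs in their saved order; empty if nothing is stored.
    static List<String> load(String dataDir) {
        List<String> urls = new ArrayList<>();
        if (dataDir == null) {
            return urls;
        }
        Path file = Path.of(dataDir, FILE_NAME);
        if (!Files.exists(file)) {
            return urls;
        }
        Properties p = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            p.loadFromXML(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (int i = 0; p.getProperty("node." + i) != null; i++) {
            urls.add(p.getProperty("node." + i));
        }
        return urls;
    }
}
```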
Switching the master
In a cluster, the master role can be moved to another Sophora server. Currently, this switch can only be triggered manually; there is no automatic switch. Even when the current master is stopped, no new master is elected automatically.
The switch can be started via JMX on any cluster node. A SwitchMasterRequest is sent to all Sophora servers and a ClientReconnectRequest is sent to all clients which are currently connected to the old master.
The new master is available immediately. All other cluster nodes (including the old master) connect to the new master and synchronize their content.
All clients log out of the old master and log in to the new master automatically; no further action is needed. While a client is connecting to the new master, all of its operations are blocked (at the ISophoraContent interface); once it is connected, the blocked operations continue.
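The blocking behaviour during a master switch resembles a gate that closes while the client reconnects. A minimal sketch with a read-write lock; the `ReconnectGate` class is hypothetical and is not the actual ISophoraContent implementation.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical gate that holds back client operations during a reconnect.
final class ReconnectGate {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile String currentServer;

    ReconnectGate(String initialServer) {
        this.currentServer = initialServer;
    }

    // Regular operations share the read lock: they run concurrently,
    // but block while a reconnect holds the write lock.
    <T> T call(Supplier<T> operation) {
        lock.readLock().lock();
        try {
            return operation.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    // The switch takes the write lock: it waits for running operations to finish
    // and keeps new ones waiting until the login to the new master is done.
    void reconnect(String newMaster, Runnable login) {
        lock.writeLock().lock();
        try {
            login.run();
            currentServer = newMaster;
        } finally {
            lock.writeLock().unlock();
        }
    }

    String currentServer() {
        return currentServer;
    }
}
```

A read-write lock lets normal operations proceed in parallel while still giving the reconnect exclusive access.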
ReadAnywhere cluster configuration
A Sophora cluster can operate in ReadAnywhere mode. In this mode, read-only requests from clients are distributed across the slaves of the cluster. This reduces the number of requests sent to the master, which lowers its load and improves its performance. Note that the master does not act as a proxy for such requests; the clients communicate directly with the slaves.
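The text does not say how read requests are spread across the slaves. Purely for illustration, the sketch below assumes a client-side round-robin policy: write requests always go to the master, read-only requests rotate over the slaves, and the master is used as a fallback when no slave is available (also an assumption). The `ReadAnywhereRouter` class is hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client-side router; the round-robin policy is an assumption.
final class ReadAnywhereRouter {
    private final String master;
    private final List<String> slaves;
    private final AtomicInteger next = new AtomicInteger();

    ReadAnywhereRouter(String master, List<String> slaves) {
        this.master = master;
        this.slaves = List.copyOf(slaves);
    }

    // Writes go to the master; reads rotate over the slaves.
    String route(boolean readOnly) {
        if (!readOnly || slaves.isEmpty()) {
            return master;
        }
        int i = Math.floorMod(next.getAndIncrement(), slaves.size()); // safe on overflow
        return slaves.get(i);
    }
}
```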
ReadAnywhere mode can be enabled globally in a configuration document. Alternatively, a connecting client can enable it for itself with an additional connection URL parameter.
The connection parameter is useful for testing the ReadAnywhere cluster, while the parameter in the configuration document enables it for all connected clients. The full benefit of the ReadAnywhere cluster is only realized when all connecting clients use it.
The configuration parameter in the configuration document is the following one:
Adding the connection parameter ?readanywhere=true enables the ReadAnywhere functionality for that one client only. Take the following connection URL as an example:
Exclusion of Servers from a ReadAnywhere cluster
A Sophora cluster may contain servers that should not take part in the ReadAnywhere cluster, for example servers with weaker hardware or servers used only for backup purposes. Such a server can be excluded from the ReadAnywhere cluster with the following configuration parameter in its sophora.properties file:
Excluded servers act as normal slaves, but no read requests are routed to them. The default value of this parameter is true. The parameter is also available via the JMX MBean ContentManager, which offers the operation toggleReadAnywhereStatus() to toggle its value.
Cluster for Sophora Delivery Web Application
In a typical setup, every Sophora Delivery Web Application is connected to its own Sophora staging slave, which serves data only to this delivery. The connection is configured in the sophora.properties of the web application and can only be changed with a restart:
sophora.serviceUrl=rmi://localhost:1199/ContentManager
sophora.delivery.client.username=webadmin
sophora.delivery.client.password=admin
The Sophora high availability cluster makes the connection between staging slaves and web applications more flexible. It adds two new features:
- It is possible to manually change the connection to the staging slave.
- The web application can automatically change the connection, when the current staging slave is no longer available.
Manually changing the connection to another staging slave
The JMX interface of the web application offers a new section "com.subshell.sophora:Type=Server" which lists all available staging slaves. When the master of the Sophora system runs in cluster mode and the web application is currently connected to a staging slave, every JMX entry offers a connectDeliveryToThisServer operation.
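The operation can be invoked from any JMX client. A minimal sketch with the standard `javax.management` API; the `DeliverySwitch` class is hypothetical, the object-name pattern `com.subshell.sophora:Type=Server,*` assumes that each staging slave entry adds further keys to the `Type=Server` name, the operation is assumed to take no arguments, and the JMX service URL passed to `listRemote` is a placeholder.

```java
import java.io.IOException;
import java.util.Set;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical JMX helper for the web application's staging slave entries.
final class DeliverySwitch {
    // Assumed pattern: one MBean per staging slave below Type=Server.
    static Set<ObjectName> stagingSlaves(MBeanServerConnection conn)
            throws IOException, JMException {
        return conn.queryNames(new ObjectName("com.subshell.sophora:Type=Server,*"), null);
    }

    // Switches the delivery to the given entry (operation assumed to take no arguments).
    static void connectTo(MBeanServerConnection conn, ObjectName slave)
            throws IOException, JMException {
        conn.invoke(slave, "connectDeliveryToThisServer", null, null);
    }

    // Lists the entries of a remote web application; jmxUrl is a placeholder.
    static Set<ObjectName> listRemote(String jmxUrl) throws IOException, JMException {
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(jmxUrl))) {
            return stagingSlaves(connector.getMBeanServerConnection());
        }
    }
}
```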
Automatically switching to another staging slave
The automatic switching of a Sophora web application to a different staging slave requires two new properties in the sophora.properties:
The first property activates the feature. The second lists the service URLs of alternative staging slaves. These connections are in addition to the primary connection (sophora.serviceUrl), which still exists. All connections use the same user name and password.
When the current staging slave is not available, the alternative staging slaves are tested in the configured order. The first available one is chosen and a new connection to it is established.
After an automatic switch (in contrast to a manual switch), the web application tries to switch back to the preferred staging slave configured by sophora.serviceUrl. The preferred staging slave is probed every 60 seconds, and as soon as it is available, the web application switches back.
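The selection rule is a scan over the configured order. In this sketch the `StagingFailover` class is hypothetical, and the `isAvailable` predicate stands in for however the web application actually probes a staging slave.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical failover helper for a delivery web application.
final class StagingFailover {
    // Tries the alternative service URLs in their configured order and returns
    // the first available one, or empty if none responds.
    static Optional<String> pickAlternative(List<String> alternatives,
                                            Predicate<String> isAvailable) {
        for (String url : alternatives) {
            if (isAvailable.test(url)) {
                return Optional.of(url);
            }
        }
        return Optional.empty();
    }
}
```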
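The switch-back can be modelled with a `ScheduledExecutorService` that probes the preferred staging slave at a fixed rate. A sketch under stated assumptions: the `SwitchBack` class is hypothetical, the probe and the reconnect are passed in as callbacks, and probing is started only after an automatic switch, since a manual switch does not trigger switching back.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical switch-back scheduler for a delivery web application.
final class SwitchBack {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> probeTask;

    // Starts probing the preferred slave (sophora.serviceUrl) after an automatic
    // switch. The documented interval is 60 seconds, i.e. periodMillis = 60_000.
    synchronized void afterAutomaticSwitch(String preferredUrl, Predicate<String> isAvailable,
                                           Consumer<String> reconnect, long periodMillis) {
        if (probeTask != null) {
            return; // already probing
        }
        probeTask = scheduler.scheduleAtFixedRate(() -> {
            if (isAvailable.test(preferredUrl)) {
                reconnect.accept(preferredUrl);
                stopProbing(); // back on the preferred slave: no more probes
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    synchronized void stopProbing() {
        if (probeTask != null) {
            probeTask.cancel(false);
            probeTask = null;
        }
    }

    void shutdown() {
        scheduler.shutdownNow();
    }
}
```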