Migration Guide

Learn how to migrate from an embedded Solr to SolrCloud using Sophora's Indexing Service.

Sophora version 4 can run the classic embedded Solr server alongside the Sophora Indexing Service, which works in conjunction with a central SolrCloud. Sophora version 4 can therefore be used to migrate to SolrCloud.

Starting with Sophora version 5, only the Sophora Indexing Service is supported.

Migration Steps

The Sophora Indexing Service can be used temporarily in parallel to the embedded Solr for migration purposes.

Migration of user-defined index configurations

The user-defined index configurations for the embedded Solr are automatically created and initialized in the Sophora Indexing Service. However, please keep the following restrictions in mind:

A collection with the suffix -live will be created for every user-defined collection. These will only contain published content.
Collections with names ending in -live are ignored by the Sophora Indexing Service.
Collections with the published content only flag (sophora:publishedDocumentsOnly) are also ignored.
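
The rules above can be sketched as two small helpers. This is only an illustration and not Sophora's implementation: the class name, method names and the example collection name "teasers" are hypothetical. Only the -live suffix and the published content only flag are taken from this guide.

```java
/** Hypothetical helper illustrating the collection rules listed above. */
final class CollectionRules {
    static final String LIVE_SUFFIX = "-live";

    /** Name of the live collection that is created for a user-defined collection. */
    static String liveCollectionName(String collectionName) {
        return collectionName + LIVE_SUFFIX;
    }

    /**
     * True if an index configuration is ignored by the Sophora Indexing Service:
     * its name ends in -live or it carries the published content only flag.
     */
    static boolean isIgnored(String collectionName, boolean publishedDocumentsOnly) {
        return collectionName.endsWith(LIVE_SUFFIX) || publishedDocumentsOnly;
    }

    public static void main(String[] args) {
        System.out.println(liveCollectionName("teasers"));    // teasers-live
        System.out.println(isIgnored("teasers-live", false)); // true
        System.out.println(isIgnored("teasers", true));       // true
        System.out.println(isIgnored("teasers", false));      // false
    }
}
```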

To prevent issues later in the migration, you should prepare your index configurations:

Migration Guide for custom index configurations
Published content only	Name ends in -live	Migration step
yes	yes	Clone the index configuration. Remove the published content only flag and the -live suffix in one of the index configurations. All your custom Solr queries should be redirected to the -live collection.
yes	no	Clone the index configuration. Remove the published content only flag in one of the index configurations and add -live as a suffix to the name of the other one. All your custom Solr queries should be redirected to the -live collection.
no	yes	Clone the index configuration. Remove the -live suffix for one of them. All your custom Solr queries should be redirected to the base collection.
no	no	Nothing to do.
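
The four cases in the table can also be read as a small decision function. The following is a minimal sketch for checking your own configurations against the table. The class, enum and method names are hypothetical and not part of any Sophora API.

```java
/** Hypothetical lookup mirroring the migration table above. */
final class IndexConfigMigration {
    /** The collection that custom Solr queries should use after the preparation. */
    enum QueryTarget { LIVE_COLLECTION, BASE_COLLECTION, UNCHANGED }

    /** True if the index configuration has to be cloned (every row except no/no). */
    static boolean needsClone(boolean publishedContentOnly, boolean nameEndsInLive) {
        return publishedContentOnly || nameEndsInLive;
    }

    /** Maps the two table columns to the query target named in the migration step. */
    static QueryTarget queryTarget(boolean publishedContentOnly, boolean nameEndsInLive) {
        if (publishedContentOnly) {
            return QueryTarget.LIVE_COLLECTION; // rows yes/yes and yes/no
        }
        if (nameEndsInLive) {
            return QueryTarget.BASE_COLLECTION; // row no/yes
        }
        return QueryTarget.UNCHANGED;           // row no/no: nothing to do
    }

    public static void main(String[] args) {
        System.out.println(queryTarget(true, false)); // LIVE_COLLECTION
        System.out.println(queryTarget(false, true)); // BASE_COLLECTION
        System.out.println(needsClone(false, false)); // false
    }
}
```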

To rename a collection after cloning it, you have to set the form field changing script "readonly-index-configuration-name-script" offline.

Removed Features in Index Configuration

Reindex Search Query

This feature was used to trigger a partial reindex in the embedded Solr. The Sophora Indexing Service (SISI) does not take this field into account when reindexing. For a partial reindex of documents, use the SISI REST API.

Remove After Days

This feature is not supported in SISI. The SolrCloud contains all active and valid documents for each collection at any time. If a temporal restriction is needed, use a filter in the Solr query.
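
One possible way to express such a temporal restriction is a filter query (fq) that uses Solr's date math. The sketch below only builds the filter string. The class name, method name and the field name "modificationDate_dt" are assumptions for illustration, so use the date field that your own index configuration actually provides.

```java
/** Hypothetical helper that builds a Solr filter query for a temporal restriction. */
final class TemporalFilter {
    /**
     * Returns a filter that keeps documents whose date field lies within the
     * last given number of days. NOW/DAY rounds down to the start of the day.
     */
    static String lastDays(String dateField, int days) {
        if (days <= 0) {
            throw new IllegalArgumentException("days must be positive");
        }
        return dateField + ":[NOW/DAY-" + days + "DAYS TO NOW]";
    }

    public static void main(String[] args) {
        // The field name is an assumption; replace it with a real date field.
        System.out.println(lastDays("modificationDate_dt", 30));
        // modificationDate_dt:[NOW/DAY-30DAYS TO NOW]
    }
}
```

The resulting string can be passed as the fq parameter of a Solr query, so the restriction is applied at query time instead of removing documents from the index.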

Establish a Deployment and Operation Process

First, establish a deployment and operation process for the Sophora Indexing Service and the SolrCloud in all of your environments. The installation and configuration of the Sophora Indexing Service is described here.

Afterwards, the Sophora Indexing Service should be tested in a development environment. When the Sophora Indexing Service is started for the first time, it builds all indexes in the SolrCloud.

Use of the SolrCloud in all Tools

Once the SolrCloud is synchronized, it can be configured within the Sophora Servers, your web applications and tools.

It is sufficient to change the connection parameter to the SolrCloud, since the data in SolrCloud has exactly the same structure as the data in the embedded Solr servers.

The best way to connect to the SolrCloud is to use SolrJ. Please make sure that you use the CloudSolrClient, which can be instantiated with the URL of ZooKeeper.

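// ${zookeeper-host} is a placeholder for the ZooKeeper address; the builder expects a list of hosts.
CloudSolrClient client = new CloudSolrClient.Builder(List.of("${zookeeper-host}"), Optional.empty()).build();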

For web applications, special care must be taken. Web applications should always use the appropriate collection type. Please refer to the section "Recommendation: Using Solr's Rule-Based Authorization Plugin to Control Direct Access to Solr" of the Installation & Configuration guide for more information.

Webapps using the Delivery-Framework should set the new properties sophora.delivery.solrcloud.enabled and sophora.delivery.solrcloud.zkHosts. See also the delivery configuration for further details.

Please refer to the section "SolrCloud" of the documentation of the Sophora Server to find out how to configure the central SolrCloud.

Scripting

The Sophora indexing scripts are mostly backward compatible. Because they were granted access to the IContentManager interface, the set of available methods was enormous. The Sophora Indexing Service supports most, but not all, of these methods. Whenever a script calls an unsupported method, the indexing service logs an exception of type NotAllowedInScript with the message "This method is not allowed in an indexing script."
We don't expect this exception to show up in real-world scenarios, but if you see this message, you must adjust your script.
Additionally, the usage of the IContentManager is deprecated; the ISophoraClient should be used instead. However, because the embedded Solr indexer does not support access via ISophoraClient, switching from the content manager to the client is only possible after indexing in the Sophora Server has been stopped and the embedded Solr Server has been shut off.
Please refer to the corresponding section in the documentation about indexing scripts.

Common Pitfalls of the Script Migration

There is one particular case where the new indexing service behaves differently compared to the embedded Solr indexing.
Suppose you have a published document that references another, non-published document. With the embedded Solr running on a staging server, retrieving this reference in a script results in an external reference being returned, because the referenced document does not exist on the staging server.
The indexing service, however, connects to the primary Sophora server, where this document exists. Thus, retrieving this reference in a script (that runs for a live collection) returns a valid reference. Only when the script tries to load the referenced document (e.g., via getDocumentByUuid) is null returned, because the indexing service internally modifies those calls to return only published content.
Consequently, just checking for UUID_REFERENCE in an indexing script is no longer sufficient. You also have to check whether the loaded document is null:

IValue value = childNode.getProperty("sophora:reference").getValue();
if (value.getType().equals(PropertyType.UUID_REFERENCE)) {
    INode referencedDocument = sophoraClient.getDocumentByUuid(value.getUUID());
    if (referencedDocument != null) { // Additional null-check to ensure there is a published version of the document.
        // Do something with the referenced document here.
    }
}

This is fully backwards-compatible with the embedded Solr indexer, i.e., you can already apply this script modification before migrating to the new indexing service.

Migration of the Offline Collection

The migration of the offline collection is described here.

Turn off the Embedded Solr Server

Once the Sophora Indexing Service and the SolrCloud have been successfully tested, the internal Solr indexer can be disabled.

To do so, shut down the Sophora Server and set its configuration options sophora.solr.indexer.enabled and sophora.solr.embedded.enabled to false. After restarting the server, the embedded Solr indexer is disabled.
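
Before restarting, it can be useful to verify that both options are really set to false. The following sketch checks a loaded configuration for the two option names mentioned above. The class and method names are hypothetical, and it assumes the server configuration is available in Java properties format. A missing option is treated as "not disabled", because this guide does not state the default values.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical pre-restart check for the two configuration options named above. */
final class EmbeddedSolrCheck {
    /** True only if both options are explicitly set to false. */
    static boolean embeddedSolrDisabled(Properties config) {
        return isFalse(config, "sophora.solr.indexer.enabled")
                && isFalse(config, "sophora.solr.embedded.enabled");
    }

    private static boolean isFalse(Properties config, String key) {
        String value = config.getProperty(key);
        // A missing key is not treated as disabled (defaults are not stated in this guide).
        return value != null && "false".equalsIgnoreCase(value.trim());
    }

    public static void main(String[] args) throws IOException {
        Properties config = new Properties();
        // Inline example content; in practice, load your server's configuration file.
        config.load(new StringReader(
                "sophora.solr.indexer.enabled=false\n"
              + "sophora.solr.embedded.enabled=false\n"));
        System.out.println(embeddedSolrDisabled(config)); // true
    }
}
```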

Once the embedded Solr is turned off, the deprecated start script sophora.sh should be replaced by the new start script sophoraServerControl.sh. In particular, the option -start_and_wait of the deprecated start script sophora.sh won't work properly. For further information see here.

If the migration was successful and you do not want to enable the embedded Solr again, you may delete the persistent state of the embedded Solr engine (directories solr, data/solr, and data/solr-indexing-queues). The embedded Solr may still be re-enabled after deleting its data, but a full reindex has to be performed in that case.

The content of this page is licensed under the CC BY 4.0 License. Code samples are licensed under the MIT License.