Import XML

Referencing Existing Documents

How to link documents that already exist in the repository using Sophora's import XML.

Referencing existing documents by external ID

When importing documents you can also connect them with documents that already exist in the repository.

If you want to update an existing document, specify its external ID in the externalID attribute of the main <document> element. Note that the nodeType attribute is not required when updating existing documents.

In the following example an existing document with the external ID "image4711" will be updated by the import:

<?xml version="1.0" encoding="UTF-8"?>
<document externalID="image4711"
          xmlns="http://www.sophoracms.com/import/4.2"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <properties>
    [...]
  </properties>
  <childNodes>
    [...]
  </childNodes>
  <resourceList />
  <fields>
    [...]
  </fields>
  <instructions>
    [...]
  </instructions>
</document>

If you wish to reference an existing document from the repository as a childnode, provide its external ID in the corresponding reference property of the childnode. To include an existing image (external ID "image4711") in a newly created image gallery, the XML should contain the following snippet:

<?xml version="1.0" encoding="UTF-8"?>
<document xmlns="http://www.sophoracms.com/import/4.2"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <properties>
    [...]
  </properties>
  <childNodes>
    <childNode nodeType="sophora-content-nt:imageref" name="sophora-content:image">
      <properties>
        <property name="sophora:reference">
          <value>image4711</value>
        </property>
      </properties>
      <childNodes/>
      <resourceList />
    </childNode>
  </childNodes>
  <resourceList/>
  <fields>
    [...]
  </fields>
  <instructions>
    [...]
  </instructions>
</document>

Conditional Import: Importing a document only if it exists in the repository already

Sometimes you may want to import a document only if it already exists in the repository. Otherwise, the import of the document (and of all dependent content in its resource list) should be skipped.

To achieve this, use the optional attribute importOnlyIfDocumentExists (default value: false), which is a direct attribute of the <document> element.

In the following example, the documents with the external IDs story4711 and story4711-image are imported only if a document with the external ID story4711 already exists in the repository. The document with the external ID story4811, however, is imported in any case, because it is not enclosed in the resource list of document story4711 but sits on the same XML level (directly underneath the <documents> element) as document story4711.

<?xml version="1.0" encoding="UTF-8"?>
<documents xmlns="http://www.sophoracms.com/import/4.2"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <document externalID="story4711"
            importOnlyIfDocumentExists="true">
    <properties>
      [...]
    </properties>
    <childNodes>
      [...]
    </childNodes>
    <resourceList>
      <document nodeType="sophora-content-nt:imageobject"
                externalID="story4711-image">
        [...]
      </document>
    </resourceList>
    <fields>
      [...]
    </fields>
    <instructions>
      [...]
    </instructions>
  </document>
  <document nodeType="sophora-content-nt:story"
            externalID="story4811">
    [...]
  </document>
</documents>

Referencing Existing Documents by XPath Expression

In some situations you may need a more flexible mechanism for referencing existing documents than identifying them by external ID. For this purpose you can declare arbitrary XPath expressions, each of which must be given a unique 'idString'. The importer resolves the XPath expressions and replaces every occurrence of an 'idString' with the external ID(s) of the corresponding XPath expression result(s).

An XPath expression is declared in the special element <documentIdentificationExpression>, which is a direct child of the element <documentIdentificationExpressions>. The latter is the optional first child element of the <documents> element.

The first example shows the update of a document with the Sophora ID 'broadcastimage142':

<?xml version="1.0" encoding="UTF-8"?>
<documents xmlns="http://www.sophoracms.com/import/4.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <documentIdentificationExpressions>
    <documentIdentificationExpression idString="$externalId1$">element(*, sophora-content-nt:imageobject)[@sophora:id = 'broadcastimage142']</documentIdentificationExpression>
  </documentIdentificationExpressions>
  <document externalID="$externalId1$">
    <properties>
      [...]
    </properties>
    <childNodes>
      [...]
    </childNodes>
    <resourceList />
    <fields>
      [...]
    </fields>
    <instructions>
      [...]
    </instructions>
  </document>
</documents>

If you wish to reference an existing document from the repository as a childnode or as a reference property, provide the 'idString' of the XPath expression element in the corresponding reference property. To include an existing image (with the Sophora ID 'broadcastimage142') in a newly created image gallery, the XML should contain the following snippet:

<?xml version="1.0" encoding="UTF-8"?>
<documents xmlns="http://www.sophoracms.com/import/4.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <documentIdentificationExpressions>
    <documentIdentificationExpression idString="$externalId1$">element(*, sophora-content-nt:imageobject)[@sophora:id = 'broadcastimage142']</documentIdentificationExpression>
  </documentIdentificationExpressions>
  <document>
    <properties>
      [...]
    </properties>
    <childNodes>
      <childNode nodeType="sophora-content-nt:imageref" name="sophora-content:image">
        <properties>
          <property name="sophora:reference">
            <value>$externalId1$</value>
          </property>
        </properties>
        <childNodes/>
        <resourceList />
      </childNode>
    </childNodes>
    <resourceList/>
    <fields>
      [...]
    </fields>
    <instructions>
      [...]
    </instructions>
  </document>
</documents>
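
The reference property case mentioned above follows the same pattern as the childnode case. The following is only a minimal sketch: the property name 'sophora-content:teaserImage' is a hypothetical placeholder for whatever reference property your node type configuration defines, and the story document with the external ID 'story4811' is used purely for illustration. As in the other examples, [...] marks elided content.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<documents xmlns="http://www.sophoracms.com/import/4.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <documentIdentificationExpressions>
    <documentIdentificationExpression idString="$externalId1$">element(*, sophora-content-nt:imageobject)[@sophora:id = 'broadcastimage142']</documentIdentificationExpression>
  </documentIdentificationExpressions>
  <document nodeType="sophora-content-nt:story" externalID="story4811">
    <properties>
      <!-- Hypothetical reference property; replace the name with one from your own configuration. -->
      <property name="sophora-content:teaserImage">
        <!-- The importer replaces the placeholder with the external ID of the query result. -->
        <value>$externalId1$</value>
      </property>
    </properties>
    <childNodes />
    <resourceList />
    <fields>
      [...]
    </fields>
    <instructions>
      [...]
    </instructions>
  </document>
</documents>
```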

The last example is more complex: every story document (i.e. every node of the type 'sophora-content-nt:story') with the headline 'Test' (see the first <documentIdentificationExpression> element) is updated by setting its headline to the value 'A real headline'. Additionally, a 'teasersInTeaser' childnode is set on each of these story documents, pointing to the story with the Sophora ID 'teaserinteaser100' (see the second <documentIdentificationExpression> element):

<?xml version="1.0" encoding="UTF-8"?>
<documents xmlns="http://www.sophoracms.com/import/4.2">
  <documentIdentificationExpressions>
    <documentIdentificationExpression idString="$externalId1$" maxNumberOfResults="unbounded">element(*, sophora-content-nt:story)[@sophora-content:headline = 'Test']</documentIdentificationExpression>
    <documentIdentificationExpression idString="$externalId2$">element(*, sophora-content-nt:story)[@sophora:id = 'teaserinteaser100']</documentIdentificationExpression>
  </documentIdentificationExpressions>
  <document externalID="$externalId1$">
    <properties>
      <property name="sophora-content:headline">
        <value>A real headline</value>
      </property>
    </properties>
    <childNodes>
      <childNode nodeType="sophora-content-nt:teaserRef" name="sophora-content:teasersInTeaser">
        <properties>
          <property name="sophora:reference">
            <value>$externalId2$</value>
          </property>
        </properties>
        <childNodes />
        <resourceList />
      </childNode>
    </childNodes>
    <resourceList />
    <fields>
      [...]
    </fields>
    <instructions>
      [...]
    </instructions>
  </document>
</documents>

The following table explains the element <documentIdentificationExpression> in detail. The description of each attribute states whether it is mandatory or optional:

Content | Description
(text content of the element) | The text content of the element <documentIdentificationExpression> defines the JCR XPath query expression that is used to search the repository. The returned nodes must be Sophora documents. A very useful tool for testing JCR XPath query expressions against a repository is Toromiro.
idString (attribute) | The mandatory attribute 'idString' defines the string that is used as a placeholder in the XML document and is replaced with the external ID(s) of the XPath expression result(s).
minNumberOfResults (attribute) | The optional attribute 'minNumberOfResults' (default: 1) specifies the minimum number of results expected from the XPath expression. If fewer results are found, the import aborts with an error message.
maxNumberOfResults (attribute) | The optional attribute 'maxNumberOfResults' (default: 1) specifies the maximum number of results expected from the XPath expression. If more results are found, the import aborts with an error message. If set to the value 'unbounded', no restriction applies.
numberOfResultsToProcess (attribute) | The optional attribute 'numberOfResultsToProcess' (default: 'unbounded') specifies how many of the found results are used to replace the 'idString' in the XML document. If this attribute is not set, all results are used. To find out how many documents an XPath expression affects before starting the update, set 'numberOfResultsToProcess' to '0': the log file of the importer then shows how many search results the expression returns, without any documents being affected.
createIfNoDocumentFound (attribute) | The optional attribute 'createIfNoDocumentFound' (default: 'false') specifies whether a document should be newly created if the XPath expression returns no result. Note: setting this attribute to 'true' only makes sense if the attribute 'minNumberOfResults' has the value '0'.
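
The dry run described for 'numberOfResultsToProcess' could look like the following sketch, which reuses the headline query from the earlier example. It combines only attributes listed above. The values chosen for 'minNumberOfResults' and 'maxNumberOfResults' are an assumption about what a typical check might want: any number of matches is accepted, including none, so that the import does not abort on the result count.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<documents xmlns="http://www.sophoracms.com/import/4.2">
  <documentIdentificationExpressions>
    <!-- Dry run: accept any number of matches, but use none of them to replace the idString. -->
    <documentIdentificationExpression idString="$externalId1$"
                                      minNumberOfResults="0"
                                      maxNumberOfResults="unbounded"
                                      numberOfResultsToProcess="0">element(*, sophora-content-nt:story)[@sophora-content:headline = 'Test']</documentIdentificationExpression>
  </documentIdentificationExpressions>
  <document externalID="$externalId1$">
    <properties>
      <property name="sophora-content:headline">
        <value>A real headline</value>
      </property>
    </properties>
    <childNodes />
    <resourceList />
    <fields>
      [...]
    </fields>
    <instructions>
      [...]
    </instructions>
  </document>
</documents>
```

Once the number of results reported in the importer's log file matches your expectation, removing 'numberOfResultsToProcess' (or setting it to 'unbounded') should let the same file perform the actual update.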

Last modified on 10/16/20

The content of this page is licensed under the CC BY 4.0 License. Code samples are licensed under the MIT License.
