Importer 5

Importer: Import Via Webservice (SOAP)

The import process: How to import documents with the importer webservice.

Import via webservice

The main difference between an import via webservice and an import via watchfolder is how the files are passed to the Importer. Once the files have been handed over, the process is mostly identical, and the webservice offers the same functionality as the watchfolder mechanism. Unlike the watchfolder mechanism, the webservice returns an XML description of the import status instead of just writing messages to the log. It is also possible to send parameters along with the file to import; these can be accessed by preprocessor scripts or XSL transformations.

First, the webservice has to be activated and configured in the base configuration file application.yml. Switching the webservice on is not recommended if you don't need it.

If the Importer operates behind a proxy, the configuration option httpProxyHost and related options should be set in the configuration file. This is necessary if the import XML references remote files using URLs. It is also possible to refer to files in the local file system from the Sophora XML given to the webservice. Access is restricted to a folder specified by the folders.fileAccessBase instance configuration property.

The webservice can be disabled for specific instances in the instance configuration. In this case, there will be an error if someone tries to import documents via webservice using the corresponding instance.

To secure the webservice, you can enable basic authentication using the configuration option importer.webService.authenticationRequired. The usernames and passwords must be configured using the option importer.webService.logins.
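With basic authentication enabled, a client has to send an HTTP Authorization header with every request. The following Java sketch shows how such a header value is built according to the HTTP basic authentication scheme. The class name and the credentials are placeholders for illustration and are not defaults of the Importer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicAuthHeader {

    /** Builds the value of an HTTP "Authorization" header for basic authentication. */
    static String forCredentials(String username, String password) {
        String credentials = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // "importuser" and "secret" are placeholder credentials (assumption).
        System.out.println("Authorization: " + forCredentials("importuser", "secret"));
    }
}
```

Since basic authentication only encodes the credentials and does not encrypt them, such requests are usually sent over https.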

Once the webservice is configured and running you should be able to access the WSDL description at the URL /importService.wsdl using your browser. This description defines the following methods that can be invoked remotely to import documents:

  1. importXml
  2. importXmlWithBinaries
  3. importXmlByReference
  4. importXmlByReferenceWithBinaries
  5. importXmlToInstanceWithKey
  6. importXmlWithBinariesToInstanceWithKey
  7. importXmlByReferenceToInstanceWithKey
  8. importXmlByReferenceWithBinariesToInstanceWithKey

Binary files that are referenced in the Sophora XML can be imported as base64-encoded data using the *WithBinaries methods. If no binary files have to be imported, use the methods importXml, importXmlByReference or the corresponding *ToInstanceWithKey variants.
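As a minimal sketch of the base64 step, the following Java code encodes the content of a file so that it could be placed in a binaryData element. The class name and the command line handling are assumptions made for this illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

final class BinaryDataEncoder {

    /** Encodes raw bytes as a base64 string without line breaks. */
    static String encode(byte[] content) {
        return Base64.getEncoder().encodeToString(content);
    }

    /** Reads a file completely and returns its base64 representation. */
    static String encodeFile(Path file) throws IOException {
        return encode(Files.readAllBytes(file));
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            // The path is supplied by the caller, e.g. a JPEG file to import.
            System.out.println(encodeFile(Paths.get(args[0])));
        }
    }
}
```

A JPEG file starts with the bytes FF D8 FF, which is why base64-encoded JPEG data begins with "/9j/".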

The parameter xslParams is optional in all webservice methods. With this parameter you can define one or more key-value pairs, which are passed to the preprocessor script and the XSL transformation.

Every webservice method exists in two versions. The first one uses the default instance specified by the option importer.webService.defaultInstance. For the *ToInstanceWithKey methods, the instance is selected using the instanceKey parameter, which refers to the instance key from the instance configuration. The *ToInstance methods without WithKey are deprecated and will be removed in a future major release.

Importing by reference

A URI referencing the file to be imported can be passed using the *ByReference methods. It is possible to refer to remote and local files using 'http:', 'https:', or 'file:' as valid protocols.

When using the protocol 'file:', it is only possible to access files which are located recursively under the base directory for file access. This directory is configured for each importer instance using the configuration option folders.fileAccessBase. For security reasons it is not allowed to reference files in a higher folder hierarchy than this base directory.
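The restriction described above can be illustrated with a small check. The following Java sketch is not the Importer's actual implementation; it only shows one way to test whether a reference uses an allowed protocol and, for 'file:' references, whether the resolved path stays below a base directory. The class name, the base directory /cms/import and the assumption that relative paths are resolved against the base directory are illustrative.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

final class ReferenceCheck {

    /**
     * Returns true if the reference uses http: or https:, or if it is a file:
     * reference that stays inside the given base directory.
     */
    static boolean isAllowed(String reference, Path fileAccessBase) {
        URI uri = URI.create(reference);
        String scheme = uri.getScheme();
        if (scheme == null) {
            return false;
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "http":
            case "https":
                return true;
            case "file":
                // "file:relative/path" is an opaque URI, "file:///absolute/path" is hierarchical.
                String rawPath = uri.isOpaque() ? uri.getSchemeSpecificPart() : uri.getPath();
                Path root = fileAccessBase.toAbsolutePath().normalize();
                // Relative paths are resolved against the base; normalize() removes ".." segments.
                Path resolved = root.resolve(rawPath).normalize();
                return resolved.startsWith(root);
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        Path base = Paths.get("/cms/import"); // hypothetical base directory
        System.out.println(isAllowed("file:importdata/sophoradocument100_document.xml", base));
        System.out.println(isAllowed("file:../outside.xml", base));
    }
}
```

The sketch compares normalized paths only and does not follow symbolic links.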

The *ByReference methods can optionally specify a character encoding for the referenced file. If no encoding is specified, UTF-8 is assumed.

Example URIs:

  • Referencing a local file via a relative path: file:importdata/sophoradocument100_document.xml
  • Referencing a remote file: http://www.example.com/importxml/sophoradocument100_document.xml

See section Examples for referencing binary files via the webservice for more examples.

Example

The following listing shows a possible SOAP message for importing an image document with binary data and additionally two XSL parameters.

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ws="http://ws.importer.sophora.subshell.com/">
  <soapenv:Header />
  <soapenv:Body>
    <ws:importXmlWithBinaries>
      <!-- Xml to be imported (wrapped in a cdata block). -->
      <documentXml><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
        <document xmlns="http://www.sophoracms.com/import/2.0" nodeType="sophora-core:story" externalID="story_00004711">
          [...]
        </document>]]>
      </documentXml>
      <binaryFile>
        <!-- The filename of a binary file (as it appears in the binary property in the sophora xml). -->
        <filename>trendcityparis100_binary_1.jpg</filename>
        <!-- Base64 encoded binary data of this file. -->
        <binaryData>/9j/4...8n//2Q==</binaryData>
      </binaryFile>
      <!-- Zero or more XSL parameters possible. -->
      <xslParam>
        <!-- The key of the first parameter. -->
        <key>structureNode</key>
        <!-- The value of the first parameter. -->
        <value>/demosite/sports/handball</value>
      </xslParam>
      <xslParam>
        <!-- The key of the second parameter. -->
        <key>idStem</key>
        <!-- The value of the second parameter. -->
        <value>handball</value>
      </xslParam>
    </ws:importXmlWithBinaries>
  </soapenv:Body>
</soapenv:Envelope>
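A message like the one above can also be assembled programmatically. The following Java sketch builds an envelope for the importXml method with optional XSL parameters. It assumes that importXml accepts the same documentXml and xslParam elements as shown for importXmlWithBinaries; the class and method names are illustrative and not part of the Importer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ImportXmlEnvelope {

    private static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";
    private static final String WS_NS = "http://ws.importer.sophora.subshell.com/";

    /** Wraps text in a CDATA section and splits any "]]>" so the section is not closed early. */
    static String cdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    /** Escapes the characters that must not appear literally in XML element content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds a SOAP envelope for importXml with zero or more XSL parameters. */
    static String build(String documentXml, Map<String, String> xslParams) {
        StringBuilder sb = new StringBuilder();
        sb.append("<soapenv:Envelope xmlns:soapenv=\"").append(SOAP_NS)
          .append("\" xmlns:ws=\"").append(WS_NS).append("\">\n");
        sb.append("  <soapenv:Header />\n  <soapenv:Body>\n    <ws:importXml>\n");
        sb.append("      <documentXml>").append(cdata(documentXml)).append("</documentXml>\n");
        for (Map.Entry<String, String> param : xslParams.entrySet()) {
            sb.append("      <xslParam><key>").append(escape(param.getKey()))
              .append("</key><value>").append(escape(param.getValue()))
              .append("</value></xslParam>\n");
        }
        sb.append("    </ws:importXml>\n  </soapenv:Body>\n</soapenv:Envelope>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("structureNode", "/demosite/sports/handball");
        params.put("idStem", "handball");
        // The document XML is a shortened placeholder.
        System.out.println(build("<document>[...]</document>", params));
    }
}
```

Wrapping the document in a CDATA block keeps the Sophora XML readable inside the envelope; the only sequence that needs special treatment is "]]>", which would otherwise end the block.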

Webservice Response

The webservice will return a UTF-8 encoded xml description of the import's status. A webservice response might look like the following example:

<?xml version="1.0" encoding="UTF-8"?>
<importInformation xmlns="http://www.sophoracms.com/importinformation"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://www.sophoracms.com/importinformation http://www.sophoracms.com/importinformation/sophora-importinformation-1.2.0.xsd"
                   successful="true" importDate="2010-07-29T15:11:54.568+02:00" duration="0.742">
  <originalFileName>ws_1280409113825.xml</originalFileName>
  <importFile>/cms/import/broadcasts/successful/ws_1280409113825_2010-07-29_15-11-54-566.xml</importFile>
  <cleanedImportFile />
  <transformedImportFile />
  <modifiedSophoraDocumentFile />
  <errorText />
  <processedBinaryFiles>
    <file>/cms/import/broadcasts/successful/filmseinemutterundich102_binary_1.jpg</file>
  </processedBinaryFiles>
  <documents>
    <documentInformation newlyCreated="false" successfullySaved="true">
      <sophoraId>filmseinemutterundich102</sophoraId>
      <externalId>broadcast-5567523</externalId>
      <uuid>e8031441-240e-420b-bb5c-b5f817be6a51</uuid>
      <resourceListDocuments>
        <documentInformation newlyCreated="false" successfullySaved="true">
          <sophoraId>seinemutterundich102</sophoraId>
          <externalId>image-4324543</externalId>
          <uuid>f01ac61d-ad5a-4cbe-b3fc-41d4005434be</uuid>
          <resourceListDocuments />
        </documentInformation>
      </resourceListDocuments>
    </documentInformation>
  </documents>
</importInformation>

The direct children and the attributes of the element <importInformation> describe the overall import process: Was it successful (attribute "successful")? When did it happen (attribute "importDate")? How long did it take (attribute "duration")? Which files were transformed and created (elements "originalFileName", "importFile", "cleanedImportFile" and "transformedImportFile")? Which warnings and errors occurred (element "errorText")? Which binary files were processed (element "processedBinaryFiles")?

In the above example, a Sophora XML with two <document> elements was imported: an image document was wrapped inside a broadcast document (in the element <resourceList> of the broadcast document). The response XML therefore contains two <documentInformation> elements, with the outer element describing the broadcast import and the inner element describing the image import. For every handled document you get the following information: Was the document newly created, or was an existing document updated (attribute "newlyCreated")? Was the document successfully saved (attribute "successfullySaved")? What are the different IDs of the processed document (elements "sophoraId", "externalId" and "uuid")?
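A client can evaluate such a response with any XML parser. The following Java sketch uses the DOM API of the JDK to read the "successful" attribute and to list all <documentInformation> elements, including nested ones. The class and method names are assumptions made for this illustration; the element names and the namespace are taken from the example response.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ImportResponseReader {

    static final String NS = "http://www.sophoracms.com/importinformation";

    /** Parses the response string and returns the <importInformation> root element. */
    static Element parse(String responseXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // The response needs no DOCTYPE, so reject one as a precaution.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(responseXml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Response is not well-formed XML", e);
        }
    }

    /** Reads the "successful" attribute of <importInformation>. */
    static boolean isSuccessful(Element importInformation) {
        return Boolean.parseBoolean(importInformation.getAttribute("successful"));
    }

    /** Returns one line per <documentInformation>, outer documents before nested ones. */
    static List<String> summarize(Element importInformation) {
        List<String> lines = new ArrayList<>();
        NodeList infos = importInformation.getElementsByTagNameNS(NS, "documentInformation");
        for (int i = 0; i < infos.getLength(); i++) {
            Element info = (Element) infos.item(i);
            lines.add(childText(info, "sophoraId")
                    + " newlyCreated=" + info.getAttribute("newlyCreated")
                    + " successfullySaved=" + info.getAttribute("successfullySaved"));
        }
        return lines;
    }

    /** Returns the trimmed text of the first direct child element with the given local name. */
    private static String childText(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && localName.equals(n.getLocalName())) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }
}
```

Because warnings may be reported even for a successful import, a client would typically also inspect the <errorText> element.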

Information about the import process (element <importInformation>):

| Element | Description |
| --- | --- |
| successful (attribute) | Indicates whether the import process was successful as a whole, meaning that all content that should be imported has been imported. Note: Warnings may still have occurred during the import process; they are collected in the element <errorText>. |
| importDate (attribute) | Shows the time of the import in the ISO 8601 format, for instance "2010-07-29T15:11:54.568+02:00". |
| duration (attribute) | Shows the duration of the import process in seconds, for instance "0.242". |
| importDeferred (attribute) | If the import XML contained the forceLock element with a timeout, the importer might defer the import if the respective document is locked. In this case, the SOAP call returns before the import finishes, and the response contains this attribute set to true. See the XML reference (section 'Asking the user to release a document lock') for more information. |
| originalFileName (element) | The original name of the XML file which was imported. Note: This element is empty if a low-level error has occurred. |
| importFile (element) | The complete path of the moved and renamed XML file. This file is located either in the successful or in the failure folder of the importer instance. Note: This element is empty if a low-level error has occurred. |
| cleanedImportFile (element) | The complete path of the cleaned XML file (if 'sophora.importer.transformation.repairXml' is configured). This file is located either in the successful or in the failure folder of the importer instance. |
| transformedImportFile (element) | The complete path of the transformed XML file (if an XSL transformation was made). This file is located either in the successful or in the failure folder of the importer instance. |
| modifiedSophoraDocumentFile (element) | The complete path of the modified Sophora XML file (if XPath identifier expressions were used in the Sophora XML). This file is located either in the successful or in the failure folder of the importer instance. |
| errorText (element) | A collection of the errors and warnings which occurred during the import process of the XML document. |
| processedBinaryFiles (element) | The binary files which were handled during the import process of the XML document. Every file is wrapped in a <file> element. |
| documents (element) | Information about the Sophora documents which were handled during the import process of the XML document (see next table). |
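
The values of the attributes "importDate" and "duration" can be converted into typed values on the client side. The following Java sketch uses the java.time API for this; the class and method names are illustrative assumptions.

```java
import java.math.BigDecimal;
import java.time.Duration;
import java.time.OffsetDateTime;

final class ImportTiming {

    /** Parses an ISO 8601 timestamp with offset, e.g. "2010-07-29T15:11:54.568+02:00". */
    static OffsetDateTime parseImportDate(String value) {
        return OffsetDateTime.parse(value);
    }

    /** Converts a duration given in decimal seconds, e.g. "0.742", into a Duration. */
    static Duration parseDuration(String seconds) {
        // BigDecimal avoids rounding errors of binary floating point arithmetic.
        long nanos = new BigDecimal(seconds).movePointRight(9).longValueExact();
        return Duration.ofNanos(nanos);
    }

    public static void main(String[] args) {
        System.out.println(parseImportDate("2010-07-29T15:11:54.568+02:00").toInstant());
        System.out.println(parseDuration("0.742").toMillis() + " ms");
    }
}
```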

Information about a particular document import (element <documentInformation>):

| Element | Description |
| --- | --- |
| newlyCreated (attribute) | "newlyCreated" shows if the Sophora document was newly created (value "true") or if an existing document was updated (value "false"). Note: If the import of the document was not successful, i.e. the attribute "successfullySaved" is "false", this attribute has no informative value. |
| successfullySaved (attribute) | The attribute "successfullySaved" indicates whether the document was successfully saved in the repository. |
| sophoraId (element) | The Sophora ID of the processed <document> element. Note: If the import of the <document> element was not successful (i.e. the attribute "successfullySaved" is "false"), this element may be empty or only set with the provided ID stem of the document to be imported. |
| externalId (element) | The external ID of the processed <document> element. Note: If the import of the <document> element was not successful (i.e. the attribute "successfullySaved" is "false"), this element may be empty. |
| uuid (element) | The UUID of the processed <document> element. Note: If the import of the <document> element was not successful (i.e. the attribute "successfullySaved" is "false"), this element may be empty. |
| resourceListDocuments (element) | Information about the <document> elements which are placed in the element <resourceList> of the current <document> element (in the import XML). |

Triggering the import

As mentioned before, there are hardly any differences between importing via webservice and via watchfolder. After the webservice has received the import data and, optionally, the binary data, it creates one input file and the binary files in the temporary folder of the importer instance. These files are then passed on to the Importer and processed as usual. From this point on, it makes no difference whether the files were put into a watchfolder or transferred via webservice, except that imports via webservice can pass parameters to the transformation.

Examples for Referencing Binaries via the Webservice

There are multiple ways to reference binary data when importing documents via the webservice of the Sophora Importer. This page supplements the webservice section of the Sophora Importer documentation. It requires basic knowledge of webservices and the Sophora Importer. Please refer to the Sophora Importer documentation, especially the part about the webservice, before you continue reading.

Referencing binary files via the webservice

In general there are two different ways to reference binary files when importing via the webservice interface.

  1. Binary files may be referenced within the Sophora XML.
  2. Binary files may be referenced within the SOAP body as a binary file list, specified by a parameter.

These two possibilities are described in the following sections.

Referencing binary files within the Sophora XML

By using one of the following methods, binary files are referenced within the Sophora XML:

  • importXml
  • importXmlToInstanceWithKey
  • importXmlByReference
  • importXmlByReferenceToInstanceWithKey

Generally, binary files may be referenced in all ways described in the documentation about binary properties in the Sophora Importer documentation. One slight difference is that, when referencing local files, relative paths must be relative to the temporary directory of the importer instance in use.

In addition to referencing local files, it is also possible to reference binary data via the http:// or the file:// protocol. When using the file:// protocol, the referenced file must be available on the remote host on which the webservice is running. The file must be located in the fileAccessBase directory or one of its subdirectories.

Examples

In the following you can find different examples of referencing binary files. The first examples also include excerpts of the basic xml skeleton. The other examples are reduced to the binary property.

Referencing a binary file via its relative path

Please note that the path is relative to the temporary directory of the corresponding importer instance.

The complete example is available here.

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:ws="http://ws.importer.sophora.subshell.com/">
    <soapenv:Header/>
    <soapenv:Body>
        <ws:importXml>
            <documentXml>
                <![CDATA[
<documents xmlns="http://www.sophoracms.com/import/2.4">
  <document nodeType="sophora-extension-nt:image" externalID="d506ac91-3cdd-41df-b70d-82be7edc6b0d">
    <properties>
      ...
    </properties>
    <childNodes>
      <childNode nodeType="sophora-extension-nt:imagedata" name="sophora-extension:imagedata">
        <properties>
          <property name="sophora-extension:binarydata" mimetype="image/jpeg">
            <value>../images/image.jpeg</value>
          </property>
          ...
        </properties>
        <childNodes />
        <resourceList />
      </childNode>
      ...
    </childNodes>
    <resourceList />
    <fields>
      ...
    </fields>
    <instructions>
      ...
    </instructions>
  </document>
</documents>
...
                ]]>
            </documentXml>
        </ws:importXml>
    </soapenv:Body>
</soapenv:Envelope>

Referencing a binary file via its absolute path

The complete example is available here.

<property name="sophora-extension:binarydata" mimetype="image/jpeg">
    <value>file:///cms/project/data/images/image.jpeg</value>
</property>

Including the binary values base64 encoded

The complete example is available here.

<property name="sophora-extension:binarydata" mimetype="image/jpeg">
    <value>data:;base64,/9j/4AAQSkZJRgABAQEASABIAAD/2wBDAAYEBQYFBAYGBQYHBwYIChAKCgk...</value>
</property>
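
A value of this form can be generated from raw bytes. The following Java sketch builds the "data:;base64," value used above and decodes it again; the class and method names are assumptions made for this illustration.

```java
import java.util.Base64;

final class DataUriValue {

    private static final String PREFIX = "data:;base64,";

    /** Builds the value of a binary property from raw bytes. */
    static String encode(byte[] content) {
        return PREFIX + Base64.getEncoder().encodeToString(content);
    }

    /** Extracts the raw bytes from a value created by encode(). */
    static byte[] decode(String value) {
        if (!value.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not a data:;base64, value");
        }
        return Base64.getDecoder().decode(value.substring(PREFIX.length()));
    }
}
```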

Referencing a binary file via the http protocol

The complete example is available here.

<property name="sophora-extension:binarydata" mimetype="image/jpeg">
    <value>http://www.example.com/image.jpg</value>
</property>

It is also possible to use the secure protocol https rather than http.

Referencing binary files as a binary file list

By using one of the following methods binary files are referenced as a list of binary files:

  • importXmlWithBinaries
  • importXmlWithBinariesToInstanceWithKey
  • importXmlByReferenceWithBinaries
  • importXmlByReferenceWithBinariesToInstanceWithKey

These methods define additional parameters to include base64-encoded binary values. These parameters are defined within the SOAP body next to the Sophora XML. The binary files are referenced by their name (image.jpeg in the following example).

The complete example is available here.

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:ws="http://ws.importer.sophora.subshell.com/">
    <soapenv:Header/>
    <soapenv:Body>
        <ws:importXmlWithBinaries>
            <documentXml>
                <![CDATA[
<documents xmlns="http://www.sophoracms.com/import/2.4">
  <document nodeType="sophora-extension-nt:image" externalID="d506ac91-3cdd-41df-b70d-82be7edc6b0d">
    <properties>
      ...
    </properties>
    <childNodes>
      <childNode nodeType="sophora-extension-nt:imagedata" name="sophora-extension:imagedata">
        <properties>
          <property name="sophora-extension:binarydata" mimetype="image/jpeg">
            <value>image.jpeg</value>
          </property>
          ...
        </properties>
        <childNodes />
        <resourceList />
      </childNode>
      ...
    </childNodes>
    <resourceList />
    <fields>
      ...
    </fields>
    <instructions>
      ...
    </instructions>
  </document>
</documents>
...
                ]]>
            </documentXml>
            <binaryFile>
                <binaryData>iVBORw0KGgoAAAANSUhEUgAAAyAAAA ...</binaryData>
                <filename>image.jpeg</filename>
            </binaryFile>
        </ws:importXmlWithBinaries>
    </soapenv:Body>
</soapenv:Envelope>

Please note that the binaryData element in the SOAP body expects base64-encoded data. It is not possible to reference binary data via a URL or a path using this webservice parameter. However, it is still possible to additionally reference binary data within the Sophora XML, as described in the section "Referencing binary files within the Sophora XML".

Details for using a Java-Client

In case you want to use a Java client (e.g. based on java-ws or Apache Axis 2) to connect to the webservice interface, the web service description (WSDL) is available at the URL /importService.wsdl of your importer installation. Two typical interface methods look like these:

String importXml(String documentXml);
String importXmlWithBinaries(String documentXml, List<BinaryFileBean> binaryFile);

When using the first method the binary values are referenced in the Sophora XML as described above.

When using the second method (or any other interface method with the suffix "WithBinaries") the binary values are referenced as filenames in the Sophora XML and passed as a list of BinaryFileBean objects. Make sure that you specify as many BinaryFileBeans in the list of binaryFiles as you have defined in the Sophora XML.

Consider the following example with two filenames (image1.jpg and image2.jpg) in the Sophora XML.

String sophoraXml = ...

// First binary file; the filename must match a filename used in the Sophora XML.
BinaryFileBean image = new BinaryFileBean();
image.setFilename("image1.jpg");
image.setBinaryData(binaryData);

// Second binary file, configured on its own bean.
BinaryFileBean image2 = new BinaryFileBean();
image2.setFilename("image2.jpg");
image2.setBinaryData(binaryData);

// Requires java.util.List and java.util.ArrayList.
List<BinaryFileBean> binaryFile = new ArrayList<BinaryFileBean>();
binaryFile.add(image);
binaryFile.add(image2);

importerServer.importXmlWithBinaries(sophoraXml, binaryFile);
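
Before sending the request, a client may want to verify that a binary file is supplied for every filename used in the Sophora XML. The following Java sketch compares two collections of filenames; how the filenames are extracted from the Sophora XML is left open, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

final class BinaryFileCheck {

    /** Returns the filenames referenced in the Sophora XML for which no binary file was supplied. */
    static Set<String> missing(Collection<String> referenced, Collection<String> supplied) {
        Set<String> result = new TreeSet<>(referenced);
        result.removeAll(supplied);
        return result;
    }

    public static void main(String[] args) {
        // Filenames as they would appear in the Sophora XML and in the list of binary files.
        Collection<String> referenced = Arrays.asList("image1.jpg", "image2.jpg");
        Collection<String> supplied = Arrays.asList("image1.jpg");
        System.out.println("Missing binary files: " + missing(referenced, supplied));
    }
}
```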

Last modified on 10/16/20

The content of this page is licensed under the CC BY 4.0 License. Code samples are licensed under the MIT License.
