Linkchecker Guide

The Linkchecker verifies the availability of the hyperlinks in link documents.

Enable the Linkchecker

To enable the Linkchecker, set the following property in the sophora.properties file of the Sophora server:

sophora.linkchecker.enabled=true

Configuring the Linkchecker

To change the configuration, edit the sophora.properties file which is located in the config subdirectory of the Sophora server installation.

Configuration Parameters of the sophora.properties file

Parameter: sophora.linkchecker.username
Description: User name for the Linkchecker job.
Default Value: (none)

Parameter: sophora.linkchecker.urlChecker.testUrl
Description: URL used for checking internet connectivity.
Default Value: http://www.web.de

Parameter: sophora.linkchecker.checkerJob.cronExpression.unavailable
Description: Cron expression that schedules the job which checks all currently broken links.
Default Value: 0 0 5 * * ?

Parameter: sophora.linkchecker.checkerJob.cronExpression.available
Description: Cron expression that schedules the job which checks all currently working links.
Default Value: 0 0 3 * * ?

Parameter: sophora.linkchecker.documentProposal.sectionNames.
Default Value: (none)
Description: The names of the proposal sections to which broken links are added. Sections can be mapped to different structure paths. The names of the structure nodes in a path have to be separated by '.'. A mapping includes all sub structure nodes, e.g.

sophora.linkchecker.documentProposal.sectionNames.default=broken-links

sophora.linkchecker.documentProposal.sectionNames.demosite.home=broken-links-home

sophora.linkchecker.documentProposal.sectionNames.demosite.trendcities=broken-links-trendcities

If a link becomes unavailable, it is added to the closest proposal section. The default proposal section can be set by sophora.linkchecker.documentProposal.sectionNames.default.

Parameter: sophora.linkchecker.documentProposal.expireTime
Description: Expiration time in days for created document proposals.
Default Value: 0

Parameter: sophora.linkchecker.urlChecker.setOffline
Description: Set to true to set broken links offline.
Default Value: true

Parameter: sophora.linkchecker.urlChecker.httpStatusCodeWhitelist
Description: All links with a status code greater than or equal to 400 are marked as unavailable unless the code is declared in this whitelist. Multiple codes are separated by commas.
Default Value: 500,400,503

Parameter: sophora.linkchecker.urlChecker.connectionTimeout
Description: Connection timeout in milliseconds.
Default Value: 20000

Parameter: sophora.linkchecker.urlChecker.treatTimeoutsAsUnavailable
Description: Defines whether links that have exceeded the configured connection timeout should be treated as unreachable (true or false).
Default Value: true
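
The whitelist rule for sophora.linkchecker.urlChecker.httpStatusCodeWhitelist can be read as a simple check. The following Python lines are only a sketch of that rule, not the Linkchecker's actual code; the function names parse_whitelist and is_unavailable are hypothetical, and the whitelist string is the default value listed above.

```python
def parse_whitelist(value):
    """Parse a comma-separated list of HTTP status codes, e.g. "500,400,503"."""
    return {int(code) for code in value.split(",") if code.strip()}


def is_unavailable(status_code, whitelist):
    """A link counts as unavailable if its status is >= 400 and not whitelisted."""
    return status_code >= 400 and status_code not in whitelist


# Default value of sophora.linkchecker.urlChecker.httpStatusCodeWhitelist
whitelist = parse_whitelist("500,400,503")

for code in (200, 404, 503):
    print(code, "unavailable" if is_unavailable(code, whitelist) else "ok")
```

With the default whitelist, a 404 response marks the link as unavailable, while a 503 response does not.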

Configuration Details

Adding to a Proposal Section

If sophora.linkchecker.documentProposal.sectionNames is set, broken links are added to the proposal section that is mapped to the structure path closest to the link document's structure node. The section name must be unique and point to an existing proposal section (the Linkchecker will not create it). If a broken link is found to be working again, it is removed from the proposal section.

If the parameter sophora.linkchecker.urlChecker.setOffline is set to true, broken links are set offline. This is done in addition to adding them to a proposal section. When a broken link is found to be working again, the proposal is removed, but the document is left offline.
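
The lookup of the "closest" proposal section can be pictured as a longest-prefix match on the dotted structure path. The following Python sketch assumes that the Linkchecker walks from the document's structure node up through its ancestors and falls back to the default section; this resolution order is an assumption derived from the description above, and the function names as well as the structure paths demosite.trendcities.berlin and demosite.sports are hypothetical.

```python
PREFIX = "sophora.linkchecker.documentProposal.sectionNames."


def section_mapping(properties):
    """Strip the common prefix, e.g. {'demosite.home': 'broken-links-home', ...}."""
    return {
        key[len(PREFIX):]: value
        for key, value in properties.items()
        if key.startswith(PREFIX)
    }


def closest_section(structure_path, mapping):
    """Try the full dotted path, then each ancestor, then the 'default' entry."""
    nodes = structure_path.split(".")
    while nodes:
        candidate = ".".join(nodes)
        if candidate in mapping:
            return mapping[candidate]
        nodes.pop()  # move one level up in the structure tree
    return mapping.get("default")


properties = {
    PREFIX + "default": "broken-links",
    PREFIX + "demosite.home": "broken-links-home",
    PREFIX + "demosite.trendcities": "broken-links-trendcities",
}
mapping = section_mapping(properties)

print(closest_section("demosite.trendcities.berlin", mapping))  # broken-links-trendcities
print(closest_section("demosite.sports", mapping))              # broken-links
```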

Using a URL for Testing Internet Connectivity

When a broken link is identified, it is not marked as "unavailable" immediately. Instead, the configured test URL is checked first to verify whether there is an internet connection at all. If the test URL is available, the previously checked link is marked as "unavailable" and, if sophora.linkchecker.urlChecker.setOffline is true, its document is set offline. If no connection exists, the Linkchecker skips the current link and proceeds with the remaining links to be checked. As long as no connection is established, links are skipped.
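
The decision flow described above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the Linkchecker's implementation: the status code whitelist and the treatTimeoutsAsUnavailable setting are ignored, "available" is assumed to mean a status code below 400, and the names http_status and classify are hypothetical.

```python
import urllib.error
import urllib.request


def http_status(url, timeout_seconds=20.0):
    """Return the HTTP status code, or None if no response arrived at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # timeout, DNS failure, refused connection, ...


def classify(link_url, test_url, fetch_status=http_status):
    """Return 'available', 'unavailable' or 'skipped' for a single link."""
    status = fetch_status(link_url)
    if status is not None and status < 400:
        return "available"
    # The link looks broken: first confirm that the internet is reachable at all.
    test_status = fetch_status(test_url)
    if test_status is None or test_status >= 400:
        return "skipped"  # no connectivity, so the result cannot be trusted
    return "unavailable"


# Usage with a fake status lookup, so that no network access is needed.
fake_statuses = {"http://example.org/ok": 200, "http://www.web.de": 200}
print(classify("http://example.org/gone", "http://www.web.de", fake_statuses.get))  # unavailable
```

Passing the status lookup as a parameter keeps the decision logic separate from the network access, which makes the flow easy to try out offline.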

Time Scheduling with Cron Configurations

With the properties sophora.linkchecker.checkerJob.cronExpression.unavailable and sophora.linkchecker.checkerJob.cronExpression.available you can define when and how often the Linkchecker inspects available and unavailable links.

These properties take a cron expression as their value. A cron expression in Quartz notation contains six mandatory fields for seconds, minutes, hours, day of month, month and day of week, optionally followed by a seventh field for the year. This enables you to configure the Linkchecker's activity in terms of time intervals.

An example:
The cron expression "0 0/20 22-05 * * ?" would start the Linkchecker every 20 minutes during the hours from 10 p.m. through 5 a.m. every day, so the last run of the night starts at 5:40 a.m.

The cron expression given in the sophora.properties file is passed to the CronTrigger class from the org.quartz package without any further check. Make sure that it matches the notation pattern given in the class's documentation at http://quartznet.sourceforge.net/apidoc/topic285.html.
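
To see which start times the example expression "0 0/20 22-05 * * ?" produces, its minutes and hours fields can be expanded by hand. The following Python sketch does this for a small subset of the notation ("*", single values, ranges including overflowing ones such as 22-05, and start/step). It is not the Quartz parser, and the function name expand_field is hypothetical.

```python
def expand_field(field, low, high):
    """Expand one numeric cron field into the list of values it matches."""
    if field == "*":
        return list(range(low, high + 1))
    if "/" in field:
        start, step = field.split("/")
        first = low if start == "*" else int(start)
        return list(range(first, high + 1, int(step)))
    if "-" in field:
        start, end = (int(part) for part in field.split("-"))
        if start <= end:
            return list(range(start, end + 1))
        # Overflowing range such as 22-05 wraps around the maximum value.
        return list(range(start, high + 1)) + list(range(low, end + 1))
    return [int(field)]


seconds, minutes, hours, *_ = "0 0/20 22-05 * * ?".split()
minute_values = expand_field(minutes, 0, 59)  # [0, 20, 40]
hour_values = expand_field(hours, 0, 23)      # [22, 23, 0, 1, 2, 3, 4, 5]

start_times = [f"{h:02d}:{m:02d}" for h in hour_values for m in minute_values]
print(len(start_times), start_times[0], start_times[-1])  # 24 22:00 05:40
```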

Example configuration

# Check whether the internet is reachable.
sophora.linkchecker.urlChecker.testUrl=http://www.web.de

# Number of documents checked per run.
sophora.linkchecker.checkerJob.documentAmount=200
# Waiting time after a run in milliseconds.
sophora.linkchecker.checkerJob.sleepTime=120000

# Cron expression (seconds minutes hours day-of-month month day-of-week) for checking links that are no longer reachable.
# More information about the expression can be found at http://quartz.sourceforge.net/javadoc/org/quartz/CronTrigger.html
# Example: 0 0/2 10-18 * * ? runs every two minutes every day from 10:00 to 18:58.
sophora.linkchecker.checkerJob.cronExpression.unavailable=0 0 0-8 * * ?
# Cron expression (seconds minutes hours day-of-month month day-of-week) for checking reachable links.
sophora.linkchecker.checkerJob.cronExpression.available=0 0 0-8 * * ?

# Expiration time of the created document proposals in days.
sophora.linkchecker.documentProposal.expireTime=0

# Set unavailable links offline.
sophora.linkchecker.urlChecker.setOffline=true

# HTTP status code whitelist. All links with a status code of 400 or higher are marked as unreachable unless
# the code is declared here. The values are separated by commas.
sophora.linkchecker.urlChecker.httpStatusCodeWhitelist=503

# Connection timeout in milliseconds.
sophora.linkchecker.urlChecker.connectionTimeout=10000

# Determines whether links that exceeded the configured time to respond should be treated as unreachable.
sophora.linkchecker.urlChecker.treatTimeoutsAsUnavailable=true

# The default proposal section to which unreachable link documents are added.
sophora.linkchecker.documentProposal.sectionNames.default=broken-links

# The proposal section to which link documents are added that are unreachable and located below the structure path /demosite/trendcities.
sophora.linkchecker.documentProposal.sectionNames.demosite.trendcities=broken-trendcities-links
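
Because the cron expressions are passed on without any further check, it can help to run a quick sanity check over the Linkchecker properties before restarting the server. The following Python sketch is one possible way to do that and is not part of Sophora. It only understands plain key=value lines (not the full Java properties syntax), and the function names parse_properties and check_linkchecker_config are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines; '#' comments and blank lines are ignored."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result


def check_linkchecker_config(props):
    """Return a list of problems; an empty list means nothing suspicious was found."""
    problems = []
    for name in ("unavailable", "available"):
        key = f"sophora.linkchecker.checkerJob.cronExpression.{name}"
        if key in props and len(props[key].split()) not in (6, 7):
            problems.append(f"{key}: expected 6 or 7 cron fields")
    key = "sophora.linkchecker.urlChecker.httpStatusCodeWhitelist"
    if key in props and not all(c.strip().isdigit() for c in props[key].split(",")):
        problems.append(f"{key}: expected comma-separated numbers")
    key = "sophora.linkchecker.urlChecker.connectionTimeout"
    if key in props and not props[key].isdigit():
        problems.append(f"{key}: expected milliseconds as a whole number")
    return problems


sample = """
# Set unavailable links offline.
sophora.linkchecker.urlChecker.setOffline=true
sophora.linkchecker.checkerJob.cronExpression.unavailable=0 0 0-8 * * ?
sophora.linkchecker.urlChecker.httpStatusCodeWhitelist=503
sophora.linkchecker.urlChecker.connectionTimeout=10000
"""
print(check_linkchecker_config(parse_properties(sample)))  # []
```

The check only counts cron fields and does not validate their contents, so an expression that passes here can still be rejected by Quartz.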

Triggering the Linkchecker-Jobs via JMX

The CronManager JMX interface com.subshell.sophora.cron/CronManager provides two operations for triggering the Linkchecker: triggerUnavailableLinkchecker and triggerAvailableLinkchecker.

Configuration parameters from within the DeskClient

Some parameters for the Linkchecker are set in the configuration part of the administrator section.

If the node type of the link document or the URL property differs from the default values ("sophora-extension-nt:link" and "sophora-extension:url"), this needs to be set in the configuration using the following entries:

Key: textlinkNodeType
Description: Definition of a link document's node type.
Example: sophora-extension-nt:link

Key: textlinkUrlProperty
Description: Property that contains the actual URL of the textlinkNodeType.
Example: sophora-extension:url

To enable a convenient search for broken link documents within the DeskClient, add the following entry:

Key: search.modifiers
Value: Broken Links;§[@jcr:primaryType='' and @sophora-extension:available='false']

If the property search.modifiers already exists, append the value to its list.