Linkchecker Guide

The Linkchecker verifies the availability of the hyperlinks in link documents.


Configuration

To enable the Linkchecker, set the following property in the sophora.properties file of the Sophora server:

sophora.linkchecker.enabled=true

Furthermore, the following configuration parameters can be set in the sophora.properties file:

sophora.linkchecker.username
    User name for the Linkchecker job.
    Default: none

sophora.linkchecker.urlChecker.testUrl
    URL used to check internet connectivity.
    Default: http://www.web.de

sophora.linkchecker.checkerJob.cronExpression.unavailable
    Cron expression that schedules the job checking all links that are currently broken.
    Default: 0 0 5 * * ?

sophora.linkchecker.checkerJob.cronExpression.available
    Cron expression that schedules the job checking all links that are currently working.
    Default: 0 0 3 * * ?

sophora.linkchecker.documentProposal.sectionNames.<structure path>
    Name of the proposal section to which proposals for broken links are added. Proposal sections can be mapped to different structure paths; the names of the structure nodes in a path are separated by periods ("."). A mapping also covers all structure nodes below the given path, e.g.:

        sophora.linkchecker.documentProposal.sectionNames.default=Broken Links
        sophora.linkchecker.documentProposal.sectionNames.demosite.home=Broken Links (Home)
        sophora.linkchecker.documentProposal.sectionNames.demosite.trendcities=Broken Links (Trendcities)

    If a link becomes unavailable, it is added to the proposal section mapped to the closest structure path. The default proposal section is set with sophora.linkchecker.documentProposal.sectionNames.default.

    This parameter can also be set via the DeskClient in the configuration document. Changes to this setting in the configuration document take effect on the next run of the Linkchecker and do not require a restart of the Sophora server.
    Default: none

sophora.linkchecker.documentProposal.expireTime
    Expiration time, in days, of the created document proposals.
    Default: 0

sophora.linkchecker.urlChecker.setOffline
    Set to true to set broken links offline.
    Default: true

sophora.linkchecker.urlChecker.httpStatusCodeWhitelist
    Comma-separated list of HTTP status codes. All links with a status code greater than or equal to 400 are marked as unavailable, unless their status code is listed here.
    Default: 500,400,503

sophora.linkchecker.urlChecker.connectionTimeout
    Connection timeout in milliseconds.
    Default: 20000

sophora.linkchecker.urlChecker.treatTimeoutsAsUnavailable
    Defines whether links that exceed the configured connection timeout are treated as unavailable (true or false).
    Default: true
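To show how these values and their defaults fit together, here is a minimal Python sketch that reads Linkchecker settings from the text of a sophora.properties file and falls back to the default values listed above. It is not the Sophora server's own loader: the simplified key=value parser, the function names, and the assumption that a missing sophora.linkchecker.enabled means "disabled" are illustrative only.

```python
# Minimal sketch, not Sophora's own configuration loader.
# Assumptions: simple "key=value" lines (no escapes or line continuations),
# and a missing sophora.linkchecker.enabled means the Linkchecker is disabled.

PREFIX = "sophora.linkchecker."

# Default values as listed in the parameter description.
DEFAULTS = {
    PREFIX + "enabled": "false",  # assumption: disabled unless set to true
    PREFIX + "urlChecker.testUrl": "http://www.web.de",
    PREFIX + "checkerJob.cronExpression.unavailable": "0 0 5 * * ?",
    PREFIX + "checkerJob.cronExpression.available": "0 0 3 * * ?",
    PREFIX + "documentProposal.expireTime": "0",
    PREFIX + "urlChecker.setOffline": "true",
    PREFIX + "urlChecker.httpStatusCodeWhitelist": "500,400,503",
    PREFIX + "urlChecker.connectionTimeout": "20000",
    PREFIX + "urlChecker.treatTimeoutsAsUnavailable": "true",
}


def parse_properties(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def load_linkchecker_settings(text: str) -> dict:
    """Overlay the parsed properties on the defaults and convert value types."""
    merged = {**DEFAULTS, **parse_properties(text)}

    def flag(name: str) -> bool:
        return merged[PREFIX + name].strip().lower() == "true"

    whitelist_text = merged[PREFIX + "urlChecker.httpStatusCodeWhitelist"]
    return {
        "enabled": flag("enabled"),
        "test_url": merged[PREFIX + "urlChecker.testUrl"],
        "cron_unavailable": merged[PREFIX + "checkerJob.cronExpression.unavailable"],
        "cron_available": merged[PREFIX + "checkerJob.cronExpression.available"],
        "expire_days": int(merged[PREFIX + "documentProposal.expireTime"]),
        "set_offline": flag("urlChecker.setOffline"),
        # Comma-separated status codes become a set of integers.
        "status_whitelist": {int(c) for c in whitelist_text.split(",") if c.strip()},
        "timeout_ms": int(merged[PREFIX + "urlChecker.connectionTimeout"]),
        "timeouts_unavailable": flag("urlChecker.treatTimeoutsAsUnavailable"),
    }
```

Overriding only the whitelist keeps every other default, so the resulting timeout is still 20000 ms.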

Configuration Details

Adding to a Proposal Section

If sophora.linkchecker.documentProposal.sectionNames is set, broken links are added to the proposal section mapped to the structure path closest to the link document's structure node. The section name must be unique and must refer to an existing proposal section; the Linkchecker does not create it. If a broken link is found to be working again, it is removed from the proposal section.
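The "closest structure path" rule can be pictured as a longest-prefix lookup. The following Python sketch, with a hypothetical function proposal_section_for, walks from a document's structure node up towards the root and returns the first mapped proposal section, falling back to the default entry. Converting slash-separated paths into the dotted keys and letting the node itself count as a match are assumptions about how the mapping is resolved.

```python
from typing import Dict, Optional


def proposal_section_for(structure_path: str, section_names: Dict[str, str]) -> Optional[str]:
    """Return the proposal section for a document's structure node (sketch).

    structure_path: slash-separated path, e.g. "/demosite/trendcities/berlin".
    section_names: the suffixes of sophora.linkchecker.documentProposal.sectionNames.*
        mapped to proposal section names, e.g. {"default": "Broken Links"}.
    """
    nodes = [node for node in structure_path.split("/") if node]
    # Start with the full path and drop one trailing node at a time, so the
    # deepest configured ancestor (or the node itself) wins.
    for length in range(len(nodes), 0, -1):
        key = ".".join(nodes[:length])
        if key in section_names:
            return section_names[key]
    # No structure path matched: fall back to the default section, if any.
    return section_names.get("default")


sections = {
    "default": "Broken Links",
    "demosite.home": "Broken Links (Home)",
    "demosite.trendcities": "Broken Links (Trendcities)",
}
print(proposal_section_for("/demosite/trendcities/berlin", sections))  # Broken Links (Trendcities)
print(proposal_section_for("/demosite/sports", sections))  # Broken Links
```

With the example mapping, a document below /demosite/trendcities lands in "Broken Links (Trendcities)", while a document in an unmapped branch falls back to the default section.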

If the parameter sophora.linkchecker.urlChecker.setOffline is set to true, broken links are set offline. This is done in addition to adding them to a proposal section. When a broken link is found to be working again, the proposal is removed, but the document is left offline.
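The state changes described in the two paragraphs above can be summarized as a small Python sketch. The LinkDocument class and apply_check_result function are hypothetical and model only three flags; the sketch assumes a proposal section is configured for the document.

```python
from dataclasses import dataclass


@dataclass
class LinkDocument:
    """Hypothetical, simplified state of a link document."""
    online: bool = True
    available: bool = True
    in_proposal_section: bool = False


def apply_check_result(doc: LinkDocument, link_works: bool, set_offline: bool) -> None:
    """Update a link document after a check (sketch; assumes a proposal section is configured)."""
    if link_works:
        doc.available = True
        doc.in_proposal_section = False  # the proposal is removed again
        # doc.online is left unchanged: the Linkchecker does not set the document online again.
    else:
        doc.available = False
        doc.in_proposal_section = True  # proposed in the mapped proposal section
        if set_offline:  # sophora.linkchecker.urlChecker.setOffline=true
            doc.online = False


doc = LinkDocument()
apply_check_result(doc, link_works=False, set_offline=True)
apply_check_result(doc, link_works=True, set_offline=True)
print(doc)  # LinkDocument(online=False, available=True, in_proposal_section=False)
```

The asymmetry is deliberate in this model: recovery removes the proposal but leaves the offline state for an editor to resolve.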

Using a URL to Test Internet Connectivity

When a broken link is found, it is not marked as "unavailable" immediately. Instead, the Linkchecker first checks the configured test URL to verify that an internet connection exists at all. If the test URL is reachable, the link is marked as "unavailable" and, if sophora.linkchecker.urlChecker.setOffline is true, its document is set offline. If there is no connection, the Linkchecker skips the link and continues with the remaining links; links are skipped for as long as no connection is available.
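The decision sequence above can be sketched in Python as follows. The fetch callable, its convention of returning a status code or raising TimeoutError or ConnectionError, and the result labels are hypothetical. The sketch also folds in the status code whitelist and the treatTimeoutsAsUnavailable option from the parameter list, and it assumes that a test URL answering with 400 or higher counts as unreachable.

```python
from typing import Callable, Dict, Iterable, Set

# Hypothetical fetch function: returns the HTTP status code of a URL, raises
# TimeoutError if the connection timeout is exceeded and ConnectionError if
# the host cannot be reached at all.
Fetch = Callable[[str], int]


def looks_broken(url: str, fetch: Fetch, whitelist: Set[int], timeouts_unavailable: bool) -> bool:
    """Apply the status code whitelist and the timeout rule to a single URL."""
    try:
        status = fetch(url)
    except TimeoutError:
        return timeouts_unavailable  # urlChecker.treatTimeoutsAsUnavailable
    except ConnectionError:
        return True
    return status >= 400 and status not in whitelist


def check_links(urls: Iterable[str], fetch: Fetch, test_url: str, whitelist: Set[int],
                timeouts_unavailable: bool = True) -> Dict[str, str]:
    """Classify each URL as "ok", "unavailable" or "skipped"."""
    results = {}
    for url in urls:
        if not looks_broken(url, fetch, whitelist, timeouts_unavailable):
            results[url] = "ok"  # not marked as unavailable
        elif looks_broken(test_url, fetch, set(), True):
            # The test URL is unreachable too: assume there is no internet
            # connection and leave this link unchanged for now.
            results[url] = "skipped"
        else:
            results[url] = "unavailable"
    return results
```

Keeping fetch injectable makes the connectivity logic testable without network access; a real implementation would plug in an HTTP client honoring urlChecker.connectionTimeout.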

Time Scheduling with Cron Configurations

With the parameters sophora.linkchecker.checkerJob.cronExpression.unavailable and sophora.linkchecker.checkerJob.cronExpression.available you can define when and how often the Linkchecker inspects available and unavailable links.

Both parameters take a Quartz cron expression as their value. A cron expression consists of six required fields (seconds, minutes, hours, day of month, month, day of week) and an optional seventh field (year). This lets you restrict the Linkchecker's activity to specific times and intervals.
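A quick way to see which value belongs to which field is to split the expression. The following Python sketch, with a hypothetical split_cron function, names the fields of a Quartz cron expression and checks two rules: the field count, and that "?" appears in exactly one of the two day fields, as Quartz requires. It is only a rough sanity check, not a replacement for Quartz's own parser.

```python
from typing import Dict

# Field names of a Quartz cron expression, in order; the year is optional.
QUARTZ_FIELDS = ("seconds", "minutes", "hours", "day of month", "month", "day of week", "year")


def split_cron(expression: str) -> Dict[str, str]:
    """Split a cron expression into named fields and run two basic sanity checks."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}: {expression!r}")
    fields = dict(zip(QUARTZ_FIELDS, parts))
    # Quartz expects "?" in exactly one of the two day fields.
    if (fields["day of month"] == "?") == (fields["day of week"] == "?"):
        raise ValueError("use '?' in exactly one of 'day of month' and 'day of week'")
    return fields


print(split_cron("0 0 5 * * ?"))
```

For the default expression of the unavailable-links job, this shows that "5" is the hour and "?" sits in the day-of-week field.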

An example:
The cron expression "0 0/20 22-05 * * ?" starts the Linkchecker every day, every 20 minutes during the hours 22 through 5, that is, from 10:00 p.m. until 5:40 a.m.
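To check when an expression like this actually fires, the following Python sketch expands the seconds, minutes and hours fields into the times of day they match. The functions expand_field and daily_firing_times are hypothetical helpers that handle only "*", single values, ranges, steps and comma lists, and they ignore the day, month and year fields. The sketch assumes, as the example above implies, that a range whose start is greater than its end (such as 22-05) wraps around past midnight.

```python
from typing import List


def expand_field(field: str, low: int, high: int) -> List[int]:
    """Expand one cron field (e.g. "0/20", "22-05", "*", "1,15") into sorted values.

    Assumption: a range whose start is greater than its end (such as 22-05)
    wraps around past the maximum.
    """
    span = high - low + 1
    values = set()
    for part in field.split(","):
        base, slash, step_text = part.partition("/")
        step = int(step_text) if slash else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            end = high if slash else start  # "a/b" means "from a to the maximum, every b"
        if end < start:
            end += span  # wrap-around range
        for value in range(start, end + 1, step):
            values.add(low + (value - low) % span)
    return sorted(values)


def daily_firing_times(expression: str) -> List[str]:
    """List the times of day at which the expression fires (day fields are ignored)."""
    seconds, minutes, hours = expression.split()[:3]
    return [
        f"{h:02d}:{m:02d}:{s:02d}"
        for h in expand_field(hours, 0, 23)
        for m in expand_field(minutes, 0, 59)
        for s in expand_field(seconds, 0, 59)
    ]


times = daily_firing_times("0 0/20 22-05 * * ?")
print(len(times), times[:3], times[-3:])
# 24 ['00:00:00', '00:20:00', '00:40:00'] ['23:00:00', '23:20:00', '23:40:00']
```

The list is sorted by clock time, so the night's first run (22:00:00) appears near the end; the last run before the pause is 05:40:00.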

The cron expression given in the sophora.properties file is passed to the CronTrigger class from the org.quartz package without any further validation. Make sure that it matches the notation described in the class's documentation at http://quartznet.sourceforge.net/apidoc/topic285.html.

Example configuration

# URL used to check internet connectivity
sophora.linkchecker.urlChecker.testUrl=http://www.web.de

# Number of documents checked per run.
sophora.linkchecker.checkerJob.documentAmount=200
# Wait time after a run, in milliseconds.
sophora.linkchecker.checkerJob.sleepTime=120000

# Cron expression (seconds minutes hours day-of-month month day-of-week) for checking links that are no longer available.
# More information on the expression format: http://quartz.sourceforge.net/javadoc/org/quartz/CronTrigger.html
# Example: 0 0/2 10-18 * * ? runs every day, every two minutes during the hours 10 to 18 (10:00 to 18:58).
sophora.linkchecker.checkerJob.cronExpression.unavailable=0 0 0-8 * * ?
# Cron expression (seconds minutes hours day-of-month month day-of-week) for checking available links.
sophora.linkchecker.checkerJob.cronExpression.available=0 0 0-8 * * ?

# Expiration time of the created proposals, in days
sophora.linkchecker.documentProposal.expireTime=0

# Set unavailable links offline.
sophora.linkchecker.urlChecker.setOffline=true

# HTTP status code whitelist. All links with a status code of 400 or higher are marked as unavailable unless
# their status code is declared here. Separate multiple values with commas.
sophora.linkchecker.urlChecker.httpStatusCodeWhitelist=503

# Connection timeout in milliseconds.
sophora.linkchecker.urlChecker.connectionTimeout=10000

# Defines whether links that exceed the configured timeout when responding are treated as unavailable.
sophora.linkchecker.urlChecker.treatTimeoutsAsUnavailable=true

# Default proposal section to which unavailable link documents are added
sophora.linkchecker.documentProposal.sectionNames.default=Broken Links

# Proposal section to which unavailable link documents located below the structure path /demosite/trendcities are added
sophora.linkchecker.documentProposal.sectionNames.demosite.trendcities=Broken Links (Trendcities)

Note: Documents that are locked by a user will not be checked for broken links. The Linkchecker will try to check these documents again on the next run.
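The example configuration also contains documentAmount and sleepTime, which are not described elsewhere in this guide. Assuming that a job processes documents in batches of documentAmount, pauses for sleepTime milliseconds between batches, and defers locked documents as the note says, the behavior could look like the following Python sketch. The batching model and all names (run_checker_job, is_locked, check) are hypothetical, not the Linkchecker's actual implementation.

```python
import time
from typing import Callable, List, Sequence


def run_checker_job(
    document_ids: Sequence[str],
    is_locked: Callable[[str], bool],
    check: Callable[[str], None],
    document_amount: int = 200,      # assumed meaning of checkerJob.documentAmount
    sleep_time_ms: int = 120_000,    # assumed meaning of checkerJob.sleepTime
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Check documents in batches; return the locked ones to retry on the next run."""
    deferred = []
    for start in range(0, len(document_ids), document_amount):
        for doc_id in document_ids[start:start + document_amount]:
            if is_locked(doc_id):
                deferred.append(doc_id)  # locked by a user: not checked this time
            else:
                check(doc_id)
        if start + document_amount < len(document_ids):
            sleep(sleep_time_ms / 1000)  # pause between batches, not after the last one
    return deferred
```

Injecting sleep keeps the sketch testable; with the real default it would simply call time.sleep.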

Additional configuration parameters

The following Linkchecker parameters are set in the configuration area of the DeskClient's administrator section.

If the node type of the link documents or their URL property differs from the default values ("sophora-extension-nt:link" and "sophora-extension:url"), set them in the configuration with the following entries:

textlinkNodeType
    Definition of a link document's node type.
    Example: sophora-extension-nt:link

textlinkUrlProperty
    Property of the textlinkNodeType that contains the actual URL.
    Example: sophora-extension:url

To enable a convenient search for broken link documents within the DeskClient, add the following entry:

search.modifiers
    Value: Broken Links;§[@jcr:primaryType='' and @sophora-extension:available='false']

If the property search.modifiers already exists, append the value to its list.

Triggering the Linkchecker jobs via JMX

The JMX interface com.subshell.sophora.cron/CronManager provides two operations for triggering the Linkchecker manually: triggerUnavailableLinkchecker and triggerAvailableLinkchecker. A manually triggered run happens in addition to the regular runs and does not affect the schedule defined by the cron expressions.