wwwchecklinks

In this document:
  • Name
  • Synopsis
  • Description
  • Options
  • Examples
  • Windows and Output
  • Version and Limitations
  • Availability
  • Author
    Name

    wwwchecklinks - check web pages for broken links

    Synopsis

    wwwchecklinks [ -imagelinks yes|no ] [ -checkalllinks yes|no ] rooturl1 ... rooturln [ -prune url1 ... urln ]

    Description

    wwwchecklinks is a program that looks for broken links in web page hierarchies. The roots of the hierarchy to be checked are given as one or more URLs on the command line. The result is displayed in an X window, which lets you browse the result even while the search is in progress. The result can also be saved to two files: a summary file (called CheckLinks.Summary) and a complete cross-reference listing for the checked documents (called CheckLinks.Report).
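    The search described above is essentially a graph traversal. The Python sketch below is an illustration only, not the program's actual implementation. It walks breadth-first from the root URLs over a hypothetical in-memory WEB dictionary that stands in for real HTTP fetches, records which documents refer to each URL, and treats any URL missing from the dictionary as broken. The function name check_links and the example.org URLs are made up for the example.

```python
from collections import deque

# Hypothetical stand-in for the web: document URL -> URLs it links to.
# Any URL that is not a key here is treated as a broken link.
WEB = {
    "http://example.org/a/": ["http://example.org/a/b.html", "http://example.org/missing"],
    "http://example.org/a/b.html": ["http://example.org/a/"],
}


def check_links(roots, web):
    """Breadth-first walk from the root URLs.

    Returns (broken, references), where references maps each linked URL
    to the set of documents that link to it.
    """
    references = {}
    broken = set()
    visited = set(roots)
    queue = deque(roots)
    while queue:
        doc = queue.popleft()
        for link in web.get(doc, []):
            references.setdefault(link, set()).add(doc)
            if link not in web:
                broken.add(link)
            # Only descend into documents below one of the roots.
            elif link not in visited and any(link.startswith(r) for r in roots):
                visited.add(link)
                queue.append(link)
    return broken, references


broken, refs = check_links(["http://example.org/a/"], WEB)
print(sorted(broken))  # ['http://example.org/missing']
```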

    Options

    [ -imagelinks yes|no ]
    yes means that links to inline images are checked; no means that they are not. The default is yes.

    [ -checkalllinks yes|no ]
    yes means that all links are checked. no means that only links to documents on the same server as one of the root documents are checked. The default is no.
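    A minimal sketch of how the two options above might combine to decide whether a single link is checked. The function should_check is hypothetical, and two details are assumptions rather than documented behaviour: that -imagelinks no takes precedence over everything else, and that "the same server" means the same host and port (the netloc part of the URL).

```python
from urllib.parse import urlsplit


def should_check(link, is_image, roots, image_links=True, check_all_links=False):
    """Return True if a link should be checked.

    image_links mirrors -imagelinks (default yes) and check_all_links
    mirrors -checkalllinks (default no).
    """
    if is_image and not image_links:
        return False  # assumption: -imagelinks no overrides everything else
    if check_all_links:
        return True
    # "Same server" is assumed to mean same host and port (the URL's netloc).
    root_servers = {urlsplit(root).netloc for root in roots}
    return urlsplit(link).netloc in root_servers


roots = ["http://www.cs.chalmers.se/~hallgren/"]
print(should_check("http://www.cs.chalmers.se/Fudgets/", False, roots))  # True
print(should_check("http://www.chalmers.se/", False, roots))  # False
```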

    [ -prune url1 ... urln ]
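    Normally all reachable documents below the root documents are checked. Using this option you can prune selected subhierarchies: the program does not descend into documents whose URLs start with one of the given URLs, although links to those documents are still checked.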

    Examples

    Some example usages of wwwchecklinks:
    wwwchecklinks http://www.cs.chalmers.se/~hallgren/
    The program will check that all links from my home page to other documents on the same server work. It will also follow links that lead to my other documents (i.e., documents with URLs that start with http://www.cs.chalmers.se/~hallgren/) and check them too.

    wwwchecklinks -checkalllinks yes http://www.cs.chalmers.se/~hallgren/
    As in the previous example, but the program will check all links, not just links to documents on the same server (i.e., www.cs.chalmers.se). This will probably take some time, since my bookmarks.html file contains over 400 links to various servers around the world.

    wwwchecklinks http://www.cs.chalmers.se/~hallgren/ -prune http://www.cs.chalmers.se/~hallgren/naptv
    In my www directory, I have two subdirectories, naptvb94 and naptvb95, with course-related information. If I only want to check my personal pages, I prune those away. The program still checks the links from my home page to documents in naptvb94 and naptvb95, but it doesn't descend into the naptv directories and check the documents there.
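    The examples above suggest that both the root URLs and the -prune URLs act as plain string prefixes: pruning .../naptv removes both naptvb94 and naptvb95. Under that assumption, the rule for deciding whether a document is itself scanned for further links could be sketched as follows (should_descend is a hypothetical name). Pruning only stops the descent; links into a pruned subhierarchy are still checked.

```python
def should_descend(url, roots, prunes=()):
    """Return True if the document at url should itself be scanned for links.

    Assumes plain string-prefix matching for both roots and -prune URLs.
    Links to pruned documents are still checked; only the descent stops.
    """
    under_root = any(url.startswith(root) for root in roots)
    pruned = any(url.startswith(prefix) for prefix in prunes)
    return under_root and not pruned


roots = ["http://www.cs.chalmers.se/~hallgren/"]
prunes = ["http://www.cs.chalmers.se/~hallgren/naptv"]
print(should_descend("http://www.cs.chalmers.se/~hallgren/videoband.html", roots, prunes))  # True
print(should_descend("http://www.cs.chalmers.se/~hallgren/naptvb95/", roots, prunes))  # False
```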

    Windows and Output

    When you start the program, it begins looking for broken links and opens a window that looks something like this:

    [Window dump of wwwchecklinks while running]

    The top part of the window shows a summary of the result, which is updated only when you press the Update button. You can press Update at any time to see how the search is progressing. You can also press the Save button at any time to save the information collected so far. (The files will be called CheckLinks.Summary and CheckLinks.Report.)

    The bottom part of the window consists of three boxes showing the progress of the search. From top to bottom, they show the document currently being checked, the server connection status, and the link currently being checked.

    When the search is complete (and you have pressed the Update button) the window will look something like this:

    [Window dump of wwwchecklinks after pressing the Update button]

    The summary window shows one line for each URL encountered during the search. The lines have the following general format:

    reference_count -> information URL
    where reference_count is the number of references to this URL, information is some brief information about the URL or the document it refers to, and URL is the URL in question.
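    A sketch of reading and writing lines in this format, for example when post-processing a saved CheckLinks.Summary file. The names SummaryLine, parse_summary_line and format_summary_line are hypothetical. The parser assumes the URL is the last whitespace-separated token on the line, so it would mis-split a URL that itself contains a space (such as the gopher URL ending in "/w hallgren" in the listing further down).

```python
import re
from typing import NamedTuple


class SummaryLine(NamedTuple):
    reference_count: int
    information: str
    url: str


# reference_count, then "->", then the information field, and finally the
# URL as the last whitespace-separated token (assumes no spaces in the URL).
LINE_RE = re.compile(r"^\s*(\d+)\s+->\s+(.*?)\s+(\S+)\s*$")


def parse_summary_line(line):
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a summary line: {line!r}")
    count, info, url = match.groups()
    return SummaryLine(int(count), info, url)


def format_summary_line(entry):
    return f"{entry.reference_count} -> {entry.information} {entry.url}"


entry = parse_summary_line("3 -> text/html 7 1 24 http://www.cs.chalmers.se/~hallgren/")
print(entry.reference_count, entry.information)  # 3 text/html 7 1 24
```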

    The URLs encountered during the search are displayed in the following order:

    1. Broken Links. In this case, the information field indicates what kind of error occurred when trying to fetch the document. Common errors are:

      The broken links are ordered by the error number and the number of references to them.

    2. Unchecked Links. The information field simply says Not checked.

    3. Working links to Checked Documents. The information field indicates the MIME type (e.g. text/html) of the document and the number of unchecked, broken and working links in the document. The documents are ordered by the number of broken links.

    4. Working links to Unchecked Documents. The information field contains the MIME type of the document and ? ? ? (indicating that the number of working/broken links is not known).
    The list shown in the summary window is saved in CheckLinks.Summary when you press the Save button.
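    The ordering rules above can be expressed as a sort key. This Python sketch is hypothetical: the sort direction within each group is not stated above, so it assumes broken links are sorted by ascending error number and then by descending reference count, and checked documents by descending number of broken links. The Entry class is made up for the example, and the error number 404 in the usage lines is only illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Group ranks, in the order the summary lists them.
BROKEN, UNCHECKED, CHECKED_DOC, UNCHECKED_DOC = range(4)


@dataclass
class Entry:
    url: str
    category: int
    reference_count: int
    error_number: Optional[int] = None  # set only for BROKEN entries
    broken_links: Optional[int] = None  # set only for CHECKED_DOC entries


def summary_key(e):
    if e.category == BROKEN:
        # Assumption: ascending error number, then most-referenced first.
        return (e.category, e.error_number, -e.reference_count)
    if e.category == CHECKED_DOC:
        # Assumption: documents with the most broken links first.
        return (e.category, -e.broken_links)
    # Other groups keep their input order (sorted() is stable).
    return (e.category,)


entries = [
    Entry("http://example.org/doc.html", CHECKED_DOC, 3, broken_links=1),
    Entry("ftp://example.org/pub", UNCHECKED, 1),
    Entry("http://example.org/gone", BROKEN, 1, error_number=404),  # illustrative
]
for e in sorted(entries, key=summary_key):
    print(e.url)
```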

    Clicking on a line in the summary window opens a window containing more detailed information on that link/document. For example, clicking on the line

    3 -> text/html 7 1 24 http://www.cs.chalmers.se/~hallgren/
    (which by the way says that there are three references to my home page among the documents checked and that my home page contains 7 unchecked links, one broken link and 24 working links) in the above window produces the following information:
    Document http://www.cs.chalmers.se/~hallgren/
    Type: text/html
    
    References to this document from:
      http://www.cs.chalmers.se/~hallgren/lic-abstract.html
      http://www.cs.chalmers.se/~hallgren/videoband.html
      http://www.cs.chalmers.se/~hallgren/klockan.cgi
    
    BAD links
      http://www.cs.chalmers.se/Fudgets/
    
    Unchecked links
      http://lips.cs.chalmers.se:8888/trams
      gopher://sunic.sunet.se:43/0thomas-h.pp.se
      gopher://cs.chalmers.se:79/0/w hallgren
      http://slip-02.cs.chalmers.se/
      ftp://ftp.cs.chalmers.se/pub/users/hallgren
      http://www.chalmers.se/
    
    Good links
      http://www.cs.chalmers.se/~hallgren/count.cgi
      http://www.cs.chalmers.se/~hallgren/klockan.cgi
      http://www.cs.chalmers.se/~hallgren/wget.cgi
      http://www.cs.chalmers.se/~hallgren/ibtelpre.html
    
    (+ the remaining 20 good links)
    This information (for all documents) is saved in CheckLinks.Report when you press the Save button.
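    The "References to this document from" section is the reverse of each document's list of links. A minimal sketch of building that reverse index and printing one entry in the layout shown above; the function names invert_links and report_entry, and the way links are passed in as separate bad, unchecked and good lists, are assumptions rather than the program's own code.

```python
def invert_links(outgoing):
    """Map each linked URL to the sorted list of documents that link to it.

    outgoing maps a checked document's URL to the list of URLs it links to.
    """
    incoming = {}
    for doc, links in outgoing.items():
        for link in links:
            incoming.setdefault(link, set()).add(doc)
    return {url: sorted(docs) for url, docs in incoming.items()}


def report_entry(url, mime, incoming, bad, unchecked, good):
    """Format one document's entry in the style of CheckLinks.Report."""
    lines = [f"Document {url}", f"Type: {mime}", "", "References to this document from:"]
    lines += [f"  {doc}" for doc in incoming.get(url, [])]
    for title, links in (("BAD links", bad), ("Unchecked links", unchecked), ("Good links", good)):
        lines += ["", title] + [f"  {link}" for link in links]
    return "\n".join(lines)


outgoing = {
    "http://example.org/": ["http://example.org/a.html", "http://example.org/b.html"],
    "http://example.org/a.html": ["http://example.org/"],
}
incoming = invert_links(outgoing)
print(report_entry("http://example.org/", "text/html", incoming,
                   bad=["http://example.org/b.html"], unchecked=[],
                   good=["http://example.org/a.html"]))
```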

    Version and Limitations

    This is version 1.0. Please note the following limitations:

    Availability

    The program is installed on our local Sun4 computers in {cs,math,md,mdstud}.chalmers.se.

    Author

    Send any questions or comments to the author: Thomas Hallgren.