How to Automatically Insert Hyperlinks in Documents for Publication to Intranets

Article contributed by John McGhie

This article explains how to automatically insert hyperlinks in documents for publication to Intranets, using Word 2000 and above.

There are two methods you can use, depending on the nature of your document:

- The AutoText process.
- The Concordance process.

Use the process suitable to your source material.

- Neither method suits documents that will contain fewer than a few hundred hyperlinks: the overhead in setting them up is excessive.
- The Concordance method requires that you know how to edit macros. The editing is very simple: read these instructions to decide whether you can do it.
- The Concordance method is better suited to reference manuals, programming manuals and the like, which contain sections or appendices of fields, parameters or commands that are referred to frequently throughout the text.
- The Concordance method suits material where most hyperlinks occur more than 20 times.
- The Concordance method is better suited to situations where the hyperlink destinations will vary from issue to issue, since you can change the hyperlink path for all destinations in a single place.
- Corporate documentation, manuals, and lengthy reports are better suited to the AutoText method.
- The AutoText method is the only one that will work if the document has already been divided into web pages: you can use it by opening and editing the HTML files directly in Word.

Preparation

Read all the instructions before you begin: everything depends on everything else! Both methods work better if the source is a single file.

1. You need to know the destination URLs and hyperlink path for the eventual website. I strongly suggest that you name each HTML page after the heading that begins it: Word will do this for you automatically.

2. You need to know the directory structure of the destination web. This article is built around a real-life example: a book named Open Interface Specification is published to a sub-web named OI_Spec/Issue_9, which contains two subdirectories, Field_Pages and Text.

OI_Spec
   Issue_9
          Field_Pages
          Text

I haven’t shown the web root. The manual divides into three pools of pages. The main page and all the various TOC files for the navigation structure are in the sub-web root. The narrative pages are in the Text subdirectory, and the Field pages are in the Field_Pages folder.

In Field_Pages there is one page for each of the fields. Each of the Text pages contains a hyperlink to one or more of the fields. In total the manual contains around 4,500 hyperlinks and about 800 text pages.

So we can deduce that the hyperlink path for each of the field pages is http://Documentation/OI_Spec/Issue_9/Field_Pages, and for the text it's http://Documentation/OI_Spec/Issue_9/Text.

3. You must know exactly what each page name is, which means you should already have published the manual to the website.

If you want to use the Concordance method, use an automated method of publication so that the page names are machine-generated, because you will need to re-publish the entire document when the hyperlinks are in place. If the page names change you get 4,500 broken links {grin}.

Publish now and come back here when you have finished.

4. It's important to fix your fields so that their values do not change during the write to HTML; this applies particularly to the values of any numbering in use. The easiest way to do that is to save the document as a web page at this point, then use the Export to Compact HTML function.

5. In most cases you would now save the document in its final resting place and set the Hyperlink Base property in File>Properties>Summary. This causes any hyperlinks Word writes after this to be inserted as relative paths, which makes moving and maintaining the document easier afterwards.
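The Hyperlink Base setting in step 5, and the URLs deduced in step 2, can both be expressed in VBA. The sketch below assumes the example sub-web from this article as the base URL; SetHyperlinkBase and PageUrl are hypothetical names, while BuiltInDocumentProperties and wdPropertyHyperlinkBase are part of Word's object model.

```vba
' Sketch only: sets the Hyperlink Base document property from VBA.
' The URL is this article's example sub-web (an assumption): use your own.
Sub SetHyperlinkBase()
    ' Same property as File > Properties > Summary > Hyperlink base
    ActiveDocument.BuiltInDocumentProperties(wdPropertyHyperlinkBase).Value = _
        "http://Documentation/OI_Spec/Issue_9/"
End Sub

' Hypothetical helper: the absolute URL of a published page, built from
' the base, the sub-folder and the page name (for example Field_Pages
' and userID.htm), with exactly one slash between the parts.
Function PageUrl(ByVal siteBase As String, ByVal folder As String, _
                 ByVal pageName As String) As String
    If Right$(siteBase, 1) <> "/" Then siteBase = siteBase & "/"
    If Right$(folder, 1) <> "/" Then folder = folder & "/"
    PageUrl = siteBase & folder & pageName
End Function
```

Keeping the base in one place mirrors the reason for setting Hyperlink Base in the first place: if the site moves, only that value changes.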
  

AutoText Method

The AutoText method is the simpler of the two to understand. This makes it the method of choice for shortish documents.

You scan through the document (by eye) looking for things you want to hyperlink. When you find one:

1. Insert a hyperlink in the ordinary way: press Ctrl + K and either browse to the page or select it from the list. It's a very good idea to select only the name of the object: instead of "See manual insertion later in this document", you would highlight only "manual insertion".

2. Select the entire hyperlink you just inserted.

3. Press Alt + F3 to open the Create AutoText dialog box.

4. Leave the name exactly as it appears in the text and click OK.
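If you would rather run steps 2 to 4 as a macro, here is a minimal sketch, assuming the selection covers exactly one hyperlink. AddHyperlinkAutoText is a hypothetical name; AutoTextEntries.Add and Hyperlink.TextToDisplay are standard Word object-model members. This sketch stores the entry in the Normal template.

```vba
' Hypothetical macro: stores the selected hyperlink as an AutoText entry
' in the Normal template, named after the hyperlink's display text.
Sub AddHyperlinkAutoText()
    Dim entryName As String

    ' Insist on exactly one hyperlink so the entry is unambiguous
    If Selection.Hyperlinks.Count <> 1 Then
        MsgBox "Select exactly one hyperlink first."
        Exit Sub
    End If

    ' Step 4: the name is the text exactly as it appears in the document
    entryName = Trim$(Selection.Hyperlinks(1).TextToDisplay)

    ' Saving the whole range keeps the HYPERLINK field and its formatting
    NormalTemplate.AutoTextEntries.Add Name:=entryName, Range:=Selection.Range
End Sub
```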

Now you continue scanning through the document:

- Each time you come to an instance of a hyperlink item you have already inserted, press F3 to insert the hyperlink again.
- Each time you come to an item you have not inserted before, perform steps 1 to 4 to add it to your AutoText collection.
- If you're not sure, press F3 anyway. If you have not already added this hyperlink, nothing will happen.

This method relies on the fact that in Word 97 and above, Word stores AutoText entries with complete formatting and all properties. When you hit F3, Word replaces the document text with the text of the hyperlink, complete with the hyperlink field properties.
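The F3 behaviour described above can also be driven from VBA, which may help if you later want to automate part of the scan. A minimal sketch: ExpandTermAsHyperlink is a hypothetical helper that looks up an AutoText entry named after the text in a range and, if one exists, inserts it in place of that text with RichText:=True, so the hyperlink field comes with it.

```vba
' Hypothetical helper: replaces the term in rng with the AutoText entry
' of the same name, keeping the hyperlink field. Returns False (and does
' nothing, as F3 does) when no such entry exists.
Function ExpandTermAsHyperlink(ByVal rng As Range) As Boolean
    Dim entry As AutoTextEntry

    ' Look the entry up by name; a missing name raises an error
    On Error Resume Next
    Set entry = NormalTemplate.AutoTextEntries(Trim$(rng.Text))
    On Error GoTo 0
    If entry Is Nothing Then Exit Function

    ' RichText:=True inserts the stored formatting and field, not plain text
    entry.Insert Where:=rng, RichText:=True
    ExpandTermAsHyperlink = True
End Function
```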

Concordance Method

This is a complex method suitable for applying a very large number of hyperlinks to a large document.

For this method to work properly, the terms to be hyperlinked must be either unique within the document or very rare. The method does not produce a useful result if you include terms that occur throughout the document but which should not be hyperlinked.

How it Works
This method uses a concordance file to add an index tag to every instance of a term you want to hyperlink.

It then uses a macro to read the text from each index tag, select the text before it, delete the index tag, and add a hyperlink that uses the tag's text as its address.

For the experts, it may be worth noting that:

- You cannot automatically tag a Word document with any sort of field other than an XE field.
- An XE field is one of the "cold" field types: in VBA terms it does not have a result. This means two things:
  - You cannot directly access the result of the field; you have to extract the text from its field code.
  - You cannot change the field type after it's inserted; you have to laboriously store the field text, then write a new field of type HYPERLINK with that text as the URL.
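Because an XE field has no result, the MakeHyperlinks macro below reads the URL out of the field code by trimming a fixed number of characters, which assumes the code looks exactly like XE "path". As a slightly more forgiving alternative (a hypothetical helper, not part of the author's macros), the sketch below returns whatever lies between the first and last double quotes.

```vba
' Hypothetical helper: extracts the entry text from an XE field code
' such as  XE "../Field_Pages/userID.htm"  by taking everything between
' the first and last double quote. Returns "" if there are no quotes.
' Note: a code with other quoted switches (e.g. \t "...") would need more care.
Function UrlFromXECode(ByVal code As String) As String
    Dim firstQ As Long, lastQ As Long
    firstQ = InStr(code, Chr$(34))
    lastQ = InStrRev(code, Chr$(34))
    If firstQ > 0 And lastQ > firstQ Then
        UrlFromXECode = Mid$(code, firstQ + 1, lastQ - firstQ - 1)
    End If
End Function
```

In MakeHyperlinks it could be called as url = UrlFromXECode(afield.Code.Text).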

Preparation
To use the concordance method, the document must be a single file, and the destination file names must be known in advance.

1. Use the Master Document method to publish the document to a website. Ensure that you name the pages after the strings you expect to find in the text: if you use the Master Document method, Word does this automatically.

2. Install the macros below in your Normal template.

3. Construct a concordance file of the terms to be tagged and the hyperlink destinations.

Macros
Install the following macros in your Normal template.

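Sub MakeConcordance()
' Prefixes every entry in column 2 of the first table in the active
' document with the path in hBase and appends the .htm extension.
Const hBase As String = "../Text/"
Const htm As String = ".htm"
Dim aCell As Cell
Dim aString As String

For Each aCell In ActiveDocument.Tables(1).Columns(2).Cells
    ' Cell text ends with a paragraph mark and the end-of-cell marker
    ' (two characters), so strip those before building the entry
    aString = hBase & Trim(Left$(aCell.Range.Text, (Len(aCell.Range.Text) - 2))) & htm
    aCell.Range.Text = aString
Next aCell

End Sub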


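Sub MakeHyperlinks()
' Replaces each XE field whose entry starts with a known path prefix
' with a hyperlink on the word(s) immediately before it.
' The loop runs backwards because deleting an XE field and adding a
' hyperlink both change the Fields collection; a forward For Each loop
' can skip fields when the collection changes underneath it.
Dim afield As Field
Dim url As String
Dim isHyper As Integer
Dim i As Long

For i = ActiveDocument.Fields.Count To 1 Step -1
    Set afield = ActiveDocument.Fields(i)
    If afield.Type = wdFieldIndexEntry Then
        isHyper = 0
        afield.Select
        Selection.Collapse
        ' The code reads  XE "../Field_Pages/userID.htm"  so drop the
        ' leading space, XE, space and quote (5 characters) and the
        ' trailing quote and space (2 characters)
        url = Right$(afield.Code.Text, Len(afield.Code.Text) - 5)
        url = Left$(url, Len(url) - 2)

        ' isHyper = number of words to go back for each path
        If Left$(url, 4) = "../F" Then
            isHyper = 1
        End If

        If Left$(url, 4) = "../T" Then
            isHyper = 2
        End If

        If isHyper <> 0 Then
             ' Extend the selection backwards over the term in the text
             Selection.MoveStart unit:=wdCharacter, Count:=-3
             Selection.MoveStart unit:=wdWord, Count:=-isHyper
             afield.Delete
             ActiveDocument.Hyperlinks.Add Anchor:=Selection.Range, _
               Address:=url
        End If
    End If
Next i

End Sub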


Concordance File
A concordance file is simply a document that contains nothing other than a two-column table.

In the first column, you place each term to be searched for. You must have only one entry per cell.

In the same row in the second column, you place the text of the index entry you want created.
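If you would rather generate a concordance file than type it, a minimal sketch follows. BuildConcordanceDoc is a hypothetical name and its three rows are copied from the example concordance in this article; it returns a new, unsaved document, which you then save wherever you keep your concordance file.

```vba
' Hypothetical function: creates a concordance file, i.e. a document
' holding nothing but a two-column table (search string, XE entry text).
Function BuildConcordanceDoc() As Document
    Dim terms As Variant, targets As Variant
    Dim doc As Document, tbl As Table
    Dim i As Long, nRows As Long

    ' Sample rows taken from this article's concordance example
    terms = Array("userID", "userName", "100 enterOrder")
    targets = Array("../Field_Pages/userID.htm", _
                    "../Field_Pages/userName.htm", _
                    "../Text/100 enterOrder.htm")
    nRows = UBound(terms) - LBound(terms) + 1

    Set doc = Documents.Add
    Set tbl = doc.Tables.Add(Range:=doc.Range, NumRows:=nRows, NumColumns:=2)
    For i = 0 To nRows - 1
        ' Column 1: the string to search for; column 2: the XE entry text
        tbl.Cell(i + 1, 1).Range.Text = terms(LBound(terms) + i)
        tbl.Cell(i + 1, 2).Range.Text = targets(LBound(targets) + i)
    Next i

    Set BuildConcordanceDoc = doc
End Function
```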

The macros above are designed to operate with specific character strings in the concordance file. Here is a section of the concordance file they are designed to work with:

transactionOriginID    ../Field_Pages/transactionOriginID.htm
TSNumber               ../Field_Pages/TSNumber.htm
undisclosedQuantity    ../Field_Pages/undisclosedQuantity.htm
userID                 ../Field_Pages/userID.htm
userName               ../Field_Pages/userName.htm
userType               ../Field_Pages/userType.htm
yield                  ../Field_Pages/yield.htm
100 enterOrder         ../Text/100 enterOrder.htm
101 amendOrder         ../Text/101 amendOrder.htm
102 tickOrder          ../Text/102 tickOrder.htm
103 cancelOrder        ../Text/103 cancelOrder.htm
104 enterTrade         ../Text/104 enterTrade.htm
105 cancelTrade        ../Text/105 cancelTrade.htm
106 setLiability       ../Text/106 setLiability.htm

In the left column are the strings that appear in the text of the manual. Note that the word yield is a potential problem: it is a common enough English word to be matched in places where it is not a field name. In the manual I use this technique for, I have established that no problem exists.

Notice that I called these entries strings. The Index generator, which does the first half of the work, performs a character-for-character match on these strings. Each character must be exactly correct, but the case of the characters does not matter.

In the right-hand column is the tag we want to insert each time an entry in the left column is found. Note that we are going to repurpose an index entry as a hyperlink, so you need to be aware that the strings in the right-hand column are not going to end up as index entries, which is why they do not follow the format of an index entry.

The entry is in two parts: the path and the page name. For example, in ../Field_Pages/userType.htm:

../Field_Pages/ is the path
userType.htm is the page name

The entire string forms the relative path from the published page to the destination page in the Field_Pages folder.
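To make the two parts concrete, here are two hypothetical helpers (not used by the author's macros) that split an entry at its last slash, as a sketch:

```vba
' Hypothetical helper: the path part, up to and including the last slash
Function EntryPath(ByVal entry As String) As String
    EntryPath = Left$(entry, InStrRev(entry, "/"))
End Function

' Hypothetical helper: the page name, everything after the last slash
' (the whole string if there is no slash)
Function EntryPage(ByVal entry As String) As String
    EntryPage = Mid$(entry, InStrRev(entry, "/") + 1)
End Function
```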

You can have as many paths as you wish, provided you add a section in the macro for each one. In this case there are two: Field_Pages and Text. The macro makes its selection based on the first four characters in the string: either ../F or ../T in this case. In my case, the book also contains legitimate index tags. Since this macro deletes the tags it processes, it is important that these tags are never the same as a legitimate index tag. I retained the two dots of the relative path because no legitimate index tags begin with two dots!
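One way to keep the per-path sections together is a single Select Case, so that adding a path means adding one Case line. This is a sketch of a hypothetical helper, not the author's code; the prefixes and word counts are this article's examples, and 0 means "a legitimate index entry: leave it alone". In MakeHyperlinks it could stand in for the two If blocks as isHyper = WordsForPath(url).

```vba
' Hypothetical helper: how many words to move back for a given XE entry,
' or 0 when the entry is a legitimate index entry that must be left alone.
' Add one Case line per path; the values here are this article's examples.
Function WordsForPath(ByVal url As String) As Integer
    Select Case Left$(url, 4)
        Case "../F"
            WordsForPath = 1    ' Field_Pages: single-word page names
        Case "../T"
            WordsForPath = 2    ' Text: two-word page names
        Case Else
            WordsForPath = 0    ' not one of our hyperlink tags
    End Select
End Function
```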

Go ahead and build your concordance file now. An easy way to obtain the page names is to take a directory listing of the folder on the web server where they are stored. Open a command window, then:

1. Drill down to the folder that stores your field pages or your text.

2. Run the command: Dir /n *.htm > list.txt

   This leaves a text file called list.txt in the directory, containing all the page names. The /n parameter places all the file names on the extreme right, where they are easy to get at.

3. Open the list.txt file in Word.

4. Hold down the Alt key and drag diagonally down to select a vertical block of text up to, but not including, the file names.

5. Press Delete to get rid of the material you do not want. Also delete the header and summary lines that Dir writes above and below the list, and use Edit>Replace to remove the .htm extension from the names: the MakeConcordance macro adds it back in the second column, and the first column must match the text of the manual.

6. Save the file as a Word document.

7. Select Table>Convert>Text to Table to convert the list to a single-column table.

8. Copy the resulting column and paste it beside the existing one to produce two columns.

9. Edit the line

   Const hBase As String = "../Text/"

   in the MakeConcordance macro to specify the string you want to use for your path. Ensure the fourth character is unique among your chosen paths.

10. Run the MakeConcordance macro. It runs through the table and adds the path to the front of the file names in the second column.

11. To process additional subdirectories of your website, repeat steps 1 through 10 for each directory.

12. Paste the separate concordance tables into a single table when you have finished, and save the whole thing where you can find it again.
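If you have a local copy of the published folders (for example on a mapped drive), VBA's own Dir$ function can stand in for steps 1 to 10 for one folder. This is a sketch under that assumption: ConcordanceFromFolder is a hypothetical name, the folder path is yours to supply, and hBase plays the same role as in MakeConcordance. It strips .htm from column 1 so that the terms match the text, and keeps the full page name in column 2.

```vba
' Hypothetical function: builds a two-column concordance table from the
' .htm files in one local folder, returning a new unsaved document.
Function ConcordanceFromFolder(ByVal folder As String, _
                               ByVal hBase As String) As Document
    Dim doc As Document, tbl As Table
    Dim fileName As String, r As Long

    Set doc = Documents.Add
    Set tbl = doc.Tables.Add(Range:=doc.Range, NumRows:=1, NumColumns:=2)

    fileName = Dir$(folder & "\*.htm")
    Do While Len(fileName) > 0
        ' *.htm may also match .html files through short names, so recheck
        If LCase$(Right$(fileName, 4)) = ".htm" Then
            r = r + 1
            If r > tbl.Rows.Count Then tbl.Rows.Add
            ' Column 1: page name without extension, as it appears in the text
            tbl.Cell(r, 1).Range.Text = Left$(fileName, Len(fileName) - 4)
            ' Column 2: path prefix plus the full page name
            tbl.Cell(r, 2).Range.Text = hBase & fileName
        End If
        fileName = Dir$()   ' next matching file
    Loop

    Set ConcordanceFromFolder = doc
End Function
```

Run it once per folder with the matching hBase, then combine the tables as in step 12.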

The reason I told you to name your pages after their names in the text of the manual was so that you would not have to edit the first column of the table. If you could not do that, you must now go through and place the actual string you expect to find in the text in the left column against each entry.

Tagging the Document
Make a copy of your original document. You will need a few practice runs to get this right.

Follow the instructions in Word Help to tag the document automatically with index tags.

See the two Help topics "Create a concordance file" and "Automatically mark index entries by using a concordance file".
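The tagging step can also be run from VBA, which makes repeated practice runs quicker. A minimal sketch: TagAndCount is a hypothetical name, the concordance path is yours to supply, and Indexes.AutoMarkEntries is the object-model counterpart of the AutoMark button. The count of XE fields whose entry starts with "../" gives a quick check before you start eyeballing.

```vba
' Hypothetical function: runs AutoMark on the active document with the
' given concordance file, then counts the XE fields whose entry text
' starts with "../" (the hyperlink tags, as opposed to real index entries).
Function TagAndCount(ByVal concordancePath As String) As Long
    Dim f As Field, tagged As Long

    ' Equivalent of the AutoMark button in the Index and Tables dialog box
    ActiveDocument.Indexes.AutoMarkEntries ConcordanceFileName:=concordancePath

    For Each f In ActiveDocument.Fields
        If f.Type = wdFieldIndexEntry Then
            If InStr(f.Code.Text, """../") > 0 Then tagged = tagged + 1
        End If
    Next f

    TagAndCount = tagged
End Function
```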

Now eyeball the document to see what happened. You may need to click the Show/Hide toolbar button to reveal the Index tags. Do a quick scan to ensure that you got the correct tags on the correct entries, and that not too many undeserving pieces of text were awarded tags.

Refine your concordance file and repeat this process until you are satisfied that the correct material is being tagged.

Don’t expect perfection: you will get a few misses and a few spurious hits. Live with it. You can take out the extras later: they’re easier to see when they become hyperlinks.

Create the Hyperlinks
Now edit the MakeHyperlinks macro to work with your chosen paths.

The construct

If Left$(url, 4) = "../F" Then
    isHyper = 1
End If

does two things: it selects an action based on the fourth character of your path name, and it sets the isHyper variable to the number of words to go back.

After the macro deletes the index tag, it needs to extend the selection backwards to select the term in the text before creating the hyperlink. In my case, the page names in the Field_Pages folder are all single words, so it needs to go back only one word, while all the names in the Text folder are two words, so isHyper becomes 2.
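Since each page is named after its term in the text, the number of words to go back could in principle be derived from the page name instead of being hard-coded per folder. A sketch of a hypothetical helper under that assumption; you would still need the prefix check so that genuine index entries are skipped. Note that Word's wdWord unit may count punctuation as separate words, so names containing hyphens or dots would need a different count.

```vba
' Hypothetical helper: number of space-separated words in the page name
' of a concordance entry, e.g. "../Text/100 enterOrder.htm" gives 2.
Function WordsInPageName(ByVal url As String) As Integer
    Dim page As String
    page = Mid$(url, InStrRev(url, "/") + 1)                 ' drop the path
    If LCase$(Right$(page, 4)) = ".htm" Then page = Left$(page, Len(page) - 4)
    page = Trim$(page)
    If Len(page) = 0 Then Exit Function                      ' returns 0
    WordsInPageName = UBound(Split(page, " ")) + 1
End Function
```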

Run the macro on your tagged copy of the document and have a look to see what gets highlighted.

Adjust accordingly and have another try.

Republish the Document
Now re-publish your document to your website, using exactly the same method as you used last time so that the page names are not changed.

Voila! Thousands of hyperlinks, automatically inserted, to cross-reference everything in your web with everything else.

Future Issues
You now choose whether to make the tagged document your official document or not.

- If you decide to make it your official document, with the hyperlinks saved in the official source, you will never have to perform this procedure again.
- However, if the website moves, or as text gets added, changed, or deleted, you will have quite a maintenance effort to keep up with all the hyperlinks.

I prefer to hang onto the concordance file and simply re-run this process for each new issue. It gives you greater flexibility and lower maintenance.

Stripping the HTML
Our Intraweb is also accessible by people working from home on a spluttering dial-up line, so we like to get the pages lean, mean and hungry.

Having published the manual, we save it as a Cascading Style Sheet, then use FrontPage to attach the CSS to every page in the sub web.

We then use HTML Filter 2, from the Microsoft website, to strip the style sheet Word stores in each web page and much of the XML formatting that we do not need. This reduces the size of each page by about 70 per cent.

Here is the batch script we use:

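REM Switch to the drive mapped to the sub-web, then to the issue folder
F:
cd "F:\OI_spec\Issue_91"
REM Delete every .xml and .mso file in this folder and its subfolders
FOR /R %%i IN (*.xml) DO del "%%i"
FOR /R %%i IN (*.mso) DO del "%%i"
REM Run HTML Filter 2 over every .htm page in this folder and its subfolders
FOR /R %%i IN (*.htm) DO filter -abflmstv "%%i"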

The subweb is, of course, in a folder I map to the F drive.

I am afraid I have not yet figured out how to do this in Word 2002. HTML Filter 2 is not available for Word 2002: you have to use VBA to manipulate the output filter, and I don’t think the same abilities are available.