Indexing

How I perform indexing

This document is an attempt to describe how I do the indexing for the books that I write. I think the index of a technical book is critical, as I find myself always referring to the index of books that I read, and I get frustrated when I cannot find terms in the index.

Creating the index is partly mechanical (adding flags to the troff files to specify which terms to index), followed by a lot of proofing and tweaking. The first step is to get the software (shell scripts and awk scripts) described in the paper Tools for Printing Indexes by Jon L. Bentley and Brian W. Kernighan, Electronic Publishing, Vol. 1, No. 1, pp. 3-17, 1988, as this is the software that I use.

Here are the steps that I take to find terms to index, using UNIX Network Programming, Volume 1, Second Edition (UNPv1) as the example.

I first go through all the chapters and look at the section headings. For example, Section 19.2 begins
```
    .n2 "Multicast Addresses"
    the contents of the section
```
so I add a %begin line after the heading, and a matching %end line at the end of the section:
```
    .n2 "Multicast Addresses"
    .       ix %begin multicast address
    the contents of the section
    .       ix %end multicast address
```
.ix is my troff macro to generate an index entry, and I always put a tab between the period in column 1 and the macro name, to make it easier to spot the index macros in the file.
After the numbered section headings (.n2 macros) I go through all the unnumbered section headings (.SH macros) and do the same thing.
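
The mechanical part of this step can be sketched in Python. This is only an illustration under stated assumptions, not the tooling described above: the function name add_section_ranges is hypothetical, a section is assumed to run until the next heading of the same macro (or the end of the file), and the lowercased heading text is used as a placeholder term. In practice the term is chosen by hand, as in "multicast address" above.

```python
import re

def add_section_ranges(lines, macro="n2"):
    """Insert '.<TAB>ix %begin/%end' lines around each section started by `macro`.

    Assumption: a section ends at the next heading of the same macro or at EOF.
    The index term is just the lowercased heading text (a placeholder).
    """
    heading = re.compile(r'^\.' + re.escape(macro) + r'\s+"([^"]+)"')
    out = []
    open_term = None
    for line in lines:
        m = heading.match(line)
        if m:
            if open_term is not None:
                # Close the previous section before the new heading.
                out.append(".\tix %end " + open_term)
            out.append(line)
            open_term = m.group(1).lower()
            out.append(".\tix %begin " + open_term)
        else:
            out.append(line)
    if open_term is not None:
        out.append(".\tix %end " + open_term)
    return out
```

Calling add_section_ranges(lines, macro="SH") would apply the same treatment to the unnumbered headings; every generated term still needs review by hand.
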
Next I look at all the chapter headings to see if an entire chapter should be indexed. For example, Chapter 19 of UNPv1 contains a %begin multicast at the beginning and a matching %end at the end.
I use a macro named .CW to print a word in a constant-width font. There are 9,346 of these in UNPv1, as in
```
    When using the
    .CW recvmsg
    and
    .CW sendmsg
    functions
```
When I am done writing a book I go through all of these and change the second letter of the macro name to a single lowercase letter. The result is another macro of mine that still prints the word in a constant-width font, but also generates an index entry with an appropriate tag on the end. For example, to generate two index entries for the functions in this example, I change the above to
```
    When using the
    .Cf recvmsg
    and
    .Cf sendmsg
    functions
```
This is shorter and easier to read than the equivalent
```
    When using the
    .CW recvmsg
    .       ix [recvmsg]~function
    and
    .CW sendmsg
    .       ix [sendmsg]~function
    functions
```
What I actually do is grep for all these .CW macros and manipulate the output with the standard Unix tools, generating an editor script that changes the .CW macros. For UNPv1 this handled about 7,000 of the 9,000+ entries, leaving only about 2,000 macros to go through by hand. The new macros that I define are .Ca for a datatype, .Cb for a structure member, .Cc for a constant, .Cd for a device, .Ce for an error, .Cf for a function or system call, .Ch for a header, .Ci for a file, .Cl for a label, .Cm for a macro, .Cn for an environment variable, .Co for a socket option, .Cp for a program, .Cs for a signal, .Ct for a structure, .Cv for a variable, and .Cx for an XTI option.
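
The idea of turning the list of .CW words into an editor script can be sketched as follows. This is a rough Python illustration, not the actual grep pipeline: the function name build_editor_script and the letter_for table are hypothetical, the generated commands are assumed to be fed to sed, and only words made of letters, digits, and underscores are handled automatically. Anything else is returned for classification by hand.

```python
import re

def build_editor_script(troff_lines, letter_for):
    """Return (sed_commands, leftovers) for the .CW lines in troff_lines.

    letter_for maps a word to the second letter of its replacement macro,
    e.g. {"recvmsg": "f"} turns '.CW recvmsg' into '.Cf recvmsg'.
    """
    cw = re.compile(r'^\.CW (\S+)$')
    simple = re.compile(r'^[A-Za-z0-9_]+$')   # safe to paste into a sed pattern
    words = sorted({m.group(1) for m in map(cw.match, troff_lines) if m})
    commands, leftovers = [], []
    for word in words:
        letter = letter_for.get(word)
        if letter is None or not simple.match(word):
            leftovers.append(word)            # must be classified by hand
        else:
            commands.append(r"s/^\.CW {0}$/.C{1} {0}/".format(word, letter))
    return commands, leftovers
```

For the example above, a table such as {"recvmsg": "f", "sendmsg": "f"} yields one substitution command per function name, and any unknown word ends up in the leftover list for manual review.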

Next is to go through all the program displays and find any computer terms that need indexing. For example,

    .P1
    struct ip_mreq {
      struct in_addr   imr_multiaddr; /* IPv4 class D multicast addr */
      struct in_addr   imr_interface; /* IPv4 addr of local interface */
    };
    .P2

becomes

    .P1
    struct ip_mreq {
      struct in_addr   imr_multiaddr; /* IPv4 class D multicast addr */
      struct in_addr   imr_interface; /* IPv4 addr of local interface */
    };
    .       Dt ip_mreq
    .       Db imr_multiaddr
    .       Db imr_interface
    .P2

The macros whose names begin with D produce an index entry that includes ~definition~of at the end.
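
For simple structure displays like this one, the .Dt and .Db lines could be generated mechanically. The following Python sketch is an assumption-laden illustration, not part of the tools described here: the function name add_definition_entries is hypothetical, displays are assumed to be delimited by .P1 and .P2, and the parsing only copes with plain "struct name {" declarations whose members each end in a semicolon on one line.

```python
import re

STRUCT_OPEN = re.compile(r'^\s*struct\s+(\w+)\s*\{')
MEMBER = re.compile(r'(\w+)\s*(?:\[[^\]]*\])?\s*;')   # last identifier before ';'

def add_definition_entries(lines):
    """Insert '.<TAB>Dt'/'.<TAB>Db' lines before the .P2 that closes a display."""
    out, pending = [], []
    in_display = in_struct = False
    for line in lines:
        if line.startswith(".P1"):
            in_display, pending = True, []
        elif line.startswith(".P2"):
            out.extend(pending)               # emit entries just before .P2
            in_display = in_struct = False
            pending = []
        elif in_display:
            m = STRUCT_OPEN.match(line)
            if m:
                pending.append(".\tDt " + m.group(1))
                in_struct = True
            elif in_struct and line.lstrip().startswith("}"):
                in_struct = False
            elif in_struct:
                code = line.split("/*")[0]    # ignore trailing C comments
                m = MEMBER.search(code)
                if m:
                    pending.append(".\tDb " + m.group(1))
        out.append(line)
    return out
```

Run on the ip_mreq display above, this produces the same three index lines shown in the hand-edited version. Nested structures, unions, and multi-line declarations would still need manual attention.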

I write italicized strings using a .IT macro, and I go through all of these, as an italicized term should often be indexed, some with ~definition~of appended.
All tables should be examined for terms to index.
Next are the indented paragraphs, which often introduce a term.
All figures are examined for any terms to index.
All uppercase acronyms of two or more characters are found using grep, and the list is printed. You then have to go through the list by hand to find what needs to be indexed.
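
The acronym search can be approximated in Python as well. This is a minimal sketch standing in for the grep command, whose exact pattern is not given here: the function name find_acronyms is hypothetical, an acronym is assumed to be a whole word of two or more uppercase ASCII letters, and the macro name at the start of a troff request line is skipped so that names such as .CW or .SH are not reported.

```python
import re
from collections import Counter

ACRONYM = re.compile(r'\b[A-Z]{2,}\b')

def find_acronyms(lines):
    """Return a sorted list of (acronym, count) pairs found in the text."""
    counts = Counter()
    for line in lines:
        if line.startswith("."):
            # Drop the macro name itself (.CW, .SH, ...) but keep its arguments.
            parts = line.split(None, 1)
            line = parts[1] if len(parts) > 1 else ""
        counts.update(ACRONYM.findall(line))
    return sorted(counts.items())
```

Mixed-case names such as IPv4 are deliberately not matched by this pattern. The resulting list is only a starting point for the manual pass.
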
In the process of doing all of the above steps, you find lots of words that also need to be indexed. I write them down on a sheet of paper as I find them (I normally fill about 3 sheets) and then go through all the chapters finding references to the term, and adding an index entry if appropriate.

Once all of the above steps are done (it took me 30 hours to do all of the above for UNPv1), then you generate an index, go through it by hand, fix things, generate another index, go through it again, again and again, until it starts to look reasonable. With UNPv1 this took me 5 complete passes and about 20 hours.

The final steps are just fine tuning this in a pass or two.
