This document is an attempt to describe how I do the indexing for the books that I write. I think the index of a technical book is critical: I am constantly referring to the indexes of books that I read, and I get frustrated when I cannot find a term in the index.
Creating the index is partly mechanical (adding all the flags to the troff files that specify which terms to index) and partly a lot of proofing and tweaking. The first step is to get the software (shell scripts and awk scripts) described in the paper "Tools for Printing Indexes" by Jon L. Bentley and Brian W. Kernighan, Electronic Publishing, Vol. 1, No. 1, pp. 3-17, 1988, as this is the software that I use.
Here are the steps that I take when looking for terms to index, using UNIX Network Programming, Volume 1, Second Edition, as the example.
    .n2 "Multicast Addresses"
    the contents of the section

so I added a %begin line after the .n2 line, and a matching %end at the end of the section.
    .n2 "Multicast Addresses"
    .	ix %begin multicast address
    the contents of the section
    .	ix %end multicast address

.ix is my troff macro to generate an index entry, and I always put a tab between the period in column 1 and this macro name, to make it easier to spot the index macros in the file.
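Because an unmatched %begin or %end is easy to introduce while editing, a small checker can catch one before the index is generated. The following Python sketch is hypothetical and is not part of the Bentley and Kernighan tools; it assumes index lines look like the ones above (a period in column 1, whitespace, then ix) and that a %end must repeat its %begin term exactly.

```python
import re

# Matches a range line such as ".<TAB>ix %begin multicast address".
# Assumption: period in column 1, optional whitespace (a tab here), then "ix".
IX_RANGE = re.compile(r"^\.\s*ix\s+%(begin|end)\s+(.+?)\s*$")

def check_ranges(lines):
    """Report unmatched %end lines and %begin lines that are never closed."""
    open_ranges = {}   # term -> line number of its %begin
    problems = []
    for lineno, line in enumerate(lines, start=1):
        m = IX_RANGE.match(line)
        if not m:
            continue
        kind, term = m.groups()
        if kind == "begin":
            if term in open_ranges:
                problems.append(f"line {lineno}: %begin {term!r} already open "
                                f"at line {open_ranges[term]}")
            else:
                open_ranges[term] = lineno
        elif term in open_ranges:
            del open_ranges[term]          # range closed correctly
        else:
            problems.append(f"line {lineno}: %end {term!r} without %begin")
    for term, lineno in open_ranges.items():
        problems.append(f"line {lineno}: %begin {term!r} never closed")
    return problems

if __name__ == "__main__":
    section = [
        '.n2 "Multicast Addresses"',
        ".\tix %begin multicast address",
        "the contents of the section",
        ".\tix %end multicast address",
    ]
    print(check_ranges(section))  # prints []
```

Matching on the exact term text is deliberately strict: a %end whose wording drifts from its %begin is reported rather than silently accepted.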
    When using the
    .CW recvmsg
    and
    .CW sendmsg
    functions

When I am done writing a book I go through all of these and change the second letter of the macro name to a single lowercase letter. The result is another of my macros, which still prints the word in a constant-width font but also generates an index entry with an appropriate tag on the end. For example, to generate two index entries for the functions in this example, I change the above to
    When using the
    .Cf recvmsg
    and
    .Cf sendmsg
    functions

This is shorter and easier to read than the equivalent
    When using the
    .CW recvmsg
    .	ix [recvmsg]~function
    and
    .CW sendmsg
    .	ix [sendmsg]~function
    functions

What I actually do is grep for all these .CW macros and manipulate the output with the standard Unix tools to generate an editor script that changes the .CW macros. For UNPv1 this handled about 7,000 of the 9,000+ entries, leaving only about 2,000 macros to go through by hand. The new macros that I define are .Ca for a datatype, .Cb for a structure member, .Cc for a constant, .Cd for a device, .Ce for an error, .Cf for a function or system call, .Ch for a header, .Ci for a file, .Cl for a label, .Cm for a macro, .Cn for an environment variable, .Co for a socket option, .Cp for a program, .Cs for a signal, .Ct for a structure, .Cv for a variable, and .Cx for an XTI option.
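The mechanical conversion can be sketched in Python as follows. This is a hypothetical illustration, not the actual grep and editor-script pipeline: convert assumes a term-to-category lookup (categories) that you would build yourself, for example from lists of function and constant names, and expand shows what a .Cf-style line is shorthand for. Only the function tag appears in the text above; the other tag words, and writing their spaces as ~, are assumptions.

```python
import re

# Letter -> tag, following the macro list above. Only "function" is confirmed
# by the text ([recvmsg]~function); the other tag words are assumptions.
TAGS = {
    "a": "datatype", "b": "structure member", "c": "constant", "d": "device",
    "e": "error", "f": "function", "h": "header", "i": "file", "l": "label",
    "m": "macro", "n": "environment variable", "o": "socket option",
    "p": "program", "s": "signal", "t": "structure", "v": "variable",
    "x": "XTI option",
}
LETTER = {tag: letter for letter, tag in TAGS.items()}

CW = re.compile(r"^\.CW\s+(\S+)(.*)$")          # ".CW term [trailing args]"
CX = re.compile(r"^\.C([a-z])\s+(\S+)(.*)$")    # ".Cf term [trailing args]"

def convert(lines, categories):
    """Rewrite '.CW term' as '.C<letter> term' when the term's category is known.

    categories: hypothetical dict mapping term -> tag (a value from TAGS).
    Returns (new_lines, count of .CW lines left for hand editing).
    """
    out, left = [], 0
    for line in lines:
        m = CW.match(line)
        if m and m.group(1) in categories:
            letter = LETTER[categories[m.group(1)]]
            out.append(f".C{letter} {m.group(1)}{m.group(2)}")
        else:
            if m:
                left += 1                      # unknown term: do it by hand
            out.append(line)
    return out, left

def expand(line):
    """Return the .CW plus .ix lines that a '.C<letter> term' line stands for."""
    m = CX.match(line)
    if not m or m.group(1) not in TAGS:
        return [line]
    term, tag, rest = m.group(2), TAGS[m.group(1)], m.group(3)
    # Assumption: spaces inside a tag are written as ~, like "~definition~of".
    return [f".CW {term}{rest}", f".\tix [{term}]~{tag.replace(' ', '~')}"]

if __name__ == "__main__":
    text = ["When using the", ".CW recvmsg", "and", ".CW sendmsg", "functions"]
    new, left = convert(text, {"recvmsg": "function", "sendmsg": "function"})
    print(new, left)
    for line in expand(".Cf recvmsg"):
        print(line)   # ".CW recvmsg", then ".<TAB>ix [recvmsg]~function"
```

Keeping the unknown terms untouched and counting them mirrors the split described above: the script does the bulk of the work, and the count tells you how many macros remain for hand editing.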
    .P1
    struct ip_mreq {
      struct in_addr  imr_multiaddr;  /* IPv4 class D multicast addr */
      struct in_addr  imr_interface;  /* IPv4 addr of local interface */
    };
    .P2

becomes
    .P1
    struct ip_mreq {
      struct in_addr  imr_multiaddr;  /* IPv4 class D multicast addr */
      struct in_addr  imr_interface;  /* IPv4 addr of local interface */
    };
    .	Dt ip_mreq
    .	Db imr_multiaddr
    .	Db imr_interface
    .P2

A macro whose name begins with D produces an index entry that includes ~definition~of at the end.
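The definition entries for a code fragment can also be generated rather than typed. This Python sketch (the function add_definitions is hypothetical) handles only simple declarations like the one above: one member per line, no nested structures or function pointers, and comments in /* */ form. It returns the .Dt and .Db lines to place before .P2, using the same tab convention as the other index macros.

```python
import re

STRUCT = re.compile(r"^\s*struct\s+(\w+)\s*\{")        # "struct ip_mreq {"
# A member declaration: the last identifier before an optional [size] and ';'.
MEMBER = re.compile(r"(\w+)\s*(?:\[[^\]]*\])?\s*;")

def add_definitions(block):
    """Given the lines between .P1 and .P2, return the .Dt/.Db lines to insert."""
    entries, in_struct = [], False
    for line in block:
        code = line.split("/*")[0]                # drop a trailing C comment
        m = STRUCT.match(code)
        if m:
            entries.append(f".\tDt {m.group(1)}")  # the structure itself
            in_struct = True
            continue
        if in_struct:
            if code.strip().startswith("}"):
                in_struct = False                 # end of the structure
                continue
            m = MEMBER.search(code)
            if m:
                entries.append(f".\tDb {m.group(1)}")  # one structure member
    return entries

if __name__ == "__main__":
    block = [
        "struct ip_mreq {",
        "  struct in_addr  imr_multiaddr;  /* IPv4 class D multicast addr */",
        "  struct in_addr  imr_interface;  /* IPv4 addr of local interface */",
        "};",
    ]
    for entry in add_definitions(block):
        print(entry)
```

For the ip_mreq fragment this yields the same three lines shown above; anything more complicated than a flat structure is still better done by hand.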
Once all of the above steps are done (they took me about 30 hours for UNPv1), you generate an index, go through it by hand, fix things, generate another index, and go through it again, over and over, until it starts to look reasonable. With UNPv1 this took me 5 complete passes and about 20 hours.
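Generating each index is the job of the Bentley and Kernighan tools. The Python sketch below is not their implementation; it only illustrates the core idea under simple assumptions: each entry is a (term, page, kind) tuple in page order, a %begin/%end pair covers every page in between, and consecutive pages are collapsed into ranges. The page numbers in the usage example are made up.

```python
from collections import defaultdict

def compress(pages):
    """Collapse sorted, distinct page numbers into text like '11-12, 20'."""
    runs = []
    for p in pages:
        if runs and p == runs[-1][1] + 1:
            runs[-1][1] = p            # extend the current run
        else:
            runs.append([p, p])        # start a new run
    return ", ".join(f"{a}" if a == b else f"{a}-{b}" for a, b in runs)

def build_index(entries):
    """entries: (term, page, kind) tuples; kind is '', 'begin', or 'end'.

    Raises KeyError for an %end whose %begin was never seen.
    """
    pages = defaultdict(set)
    open_at = {}                       # term -> page of its %begin
    for term, page, kind in entries:
        if kind == "begin":
            open_at[term] = page
        elif kind == "end":
            pages[term].update(range(open_at.pop(term), page + 1))
        else:
            pages[term].add(page)
    return [f"{term}, {compress(sorted(pages[term]))}"
            for term in sorted(pages, key=str.lower)]

if __name__ == "__main__":
    entries = [
        ("multicast address", 10, "begin"),
        ("recvmsg function", 11, ""),
        ("recvmsg function", 12, ""),
        ("multicast address", 13, "end"),
        ("recvmsg function", 20, ""),
    ]
    for line in build_index(entries):
        print(line)
    # prints:
    # multicast address, 10-13
    # recvmsg function, 11-12, 20
```

Each proofing pass then amounts to reading this output, fixing the troff source, and regenerating.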
The final steps are just fine-tuning, in a pass or two.