An American Editor

March 18, 2019

Book Indexes: Multivolume Indexes

Ælfwine Mischler

Last year, I had the pleasure of indexing the third and final volume of a history of Egyptology while creating a combined index of volumes 1–3. (I confess that I have a bit of a soft spot for this book. Volume 1 was my first paid index — and a complicated text for a first-timer — and I was thrilled that the author included me in the acknowledgments.)

When I indexed volumes 1 and 2, the publisher had not thought of having a combined index in the last volume, so I did nothing out of the ordinary in indexing the first two books. When the publisher asked for a combined index, I asked colleagues for any tips or tricks, and they alerted me that it would be a lot more work than just merging the first two files into the third. They were not joking! (Fortunately, I was able to negotiate a higher per-page price.)

More Editing

The publisher gave me PDFs of the final indexes for volumes 1 and 2, and I compared these carefully with the indexes I had written. I wanted to see any changes the publisher had made and refresh my memory of both the subjects I had indexed and their organization.

In my indexing software (I use Sky Indexing), I made a copy of each volume’s index and entered the publisher’s edits, and then increased the locator numbers (a locator is a page or a range of pages) by 1000 in volume 1 and by 2000 in volume 2. Thus, for example, page 35 in volume 1 became 1035 and page 35 in volume 2 became 2035. I then merged these into a new file, in which I indexed volume 3. I changed the page numbers to the correct forms with volume numbers as a final step so I would not have to type 3: before every locator for the new items.
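The offset trick above can be sketched in a few lines of Python. This is only an illustration, not how Sky Indexing does it internally; the function name is hypothetical, and the sketch assumes that no volume has 1,000 or more pages and that every locator already carries its volume offset.

```python
def to_volume_locator(start, end=None):
    """Turn an offset locator (e.g. 2035) into volume:page form (e.g. '2:35').

    Assumes the offset is volume * 1000, so no volume may reach 1,000 pages.
    """
    volume, page = divmod(start, 1000)
    if end is None or end == start:
        return f"{volume}:{page}"
    end_volume, end_page = divmod(end, 1000)
    if end_volume != volume:
        raise ValueError("a page range cannot span two volumes")
    # En dash between the first and last page of the range.
    return f"{volume}:{page}\u2013{end_page}"


print(to_volume_locator(1035))        # page 35 of volume 1
print(to_volume_locator(2017, 2020))  # pages 17-20 of volume 2
```

Because the offsets keep every locator a plain number while indexing, merged records from different volumes still sort in volume order before the final conversion.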

The real extra work came in creating and organizing subentries. Many entries in volumes 1 and 2 had only a few locators without subentries. When the indexes were combined, these entries had too many locators and I had to make subentries. This required going back into the PDFs for those volumes and rereading those pages.

Other entries in a single volume had subentries, but there were so many in the combined index that they became unwieldy. I reworded some subentries to combine them, but more often, I put the subentries into broad categories and split them into nested entries.

Two of the great names in Egyptology illustrate this editing process.

Howard Carter, who discovered Tutankhamun’s tomb in 1922, had only five locators in the index of volume 1 (covering antiquity to 1881), with no subentries. In volume 2 (covering 1881–1941), he had 14 subentries. In volume 3, the discovery of Tutankhamun covered 40 pages in two chapters, and in the combined index, Carter was in two nested entries:

Sir Flinders Petrie also appeared in all three volumes. In the volume 1 index, he had seven locators with no subentries. (In volume 1, which covered a much longer span of time, the managing editor and I agreed to use longer strings of locators to save space.) In volume 2, there were 26 subentries for Petrie. In volume 3, the Petrie entry was nested to break the subentries into broad categories:

I made some other changes in the combined index. Many of the big names had a subentry “career” or “early career” or “legacy.” These were all force-sorted as the first subentry under the name. Volume 1 discussed many books. I reviewed these entries and removed some from the index that were mentioned with little or no discussion. This was relatively easy to do in the indexing software because I could group all the records that had italics in the main entry. If a book had only one locator, I reread the page in the PDF. Sometimes there was sufficient discussion to keep the title in the index. In addition to these smaller edits, I reorganized some of the large entries.

When I was finished with the editing, I changed the locator numbers to volume and correct page number — an easy task in the software.

This long, complicated index needed a final check. For this, I generated a page proof in numbered order. (This option may not be available in all indexing software.) I went through the page proof line by line. This allowed me to check that double-posted items were correct; for example, that 2:17–20 appeared in both “Petrie, Sir William Flinders, methods and techniques of: excavation” and “methods and techniques of archaeologists: of Petrie.”
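For readers who keep an index in a form a script can read, the double-post check could be roughed out as below. This is a hypothetical sketch: the dictionary layout and the function name are assumptions made for illustration, not a feature of any indexing program.

```python
def check_double_posts(index, pairs):
    """Return the double-posted heading pairs whose locators do not match.

    index maps a full heading to a collection of locators.
    pairs lists the headings that are supposed to carry identical locators.
    """
    problems = []
    for first, second in pairs:
        first_locators = set(index.get(first, ()))
        second_locators = set(index.get(second, ()))
        if first_locators != second_locators:
            # Report the locators found under only one of the two headings.
            problems.append((first, second, sorted(first_locators ^ second_locators)))
    return problems


index = {
    "Petrie, Sir William Flinders, methods and techniques of: excavation": ["2:17–20"],
    "methods and techniques of archaeologists: of Petrie": ["2:17–20"],
}
pairs = [(
    "Petrie, Sir William Flinders, methods and techniques of: excavation",
    "methods and techniques of archaeologists: of Petrie",
)]
print(check_double_posts(index, pairs))  # an empty list means the pair agrees
```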

With a Heads-up

In this situation, I did not know that a combined index would be required in the last volume of the series when I worked on the first two volumes. What if I had a heads-up on another project? What would I do differently in indexing the early volumes?

I would create subentries for anything that was likely to appear in the following volumes, even if it did not require subentries in the current volume. When I was finished editing the index with the extraneous subentries, I would suppress them in the current index, saving them for the later combined index.

This could be done in one of two ways. I could save the index with a different name, and then in the new one, consume the extraneous subentries, that is, remove the subentries but retain the locators, which my software can easily do. When I made the combined index, I would merge the file with the subentries into the new file.

Or I could duplicate each of the entries with extraneous subentries in one file, label them with a color code and filter them out, and then consume the subentries in the unfiltered records. To make the combined index, I would unfilter the records with subentries.

Either way, the combined index would still be more work than a single-volume index, and I would charge a higher rate. A combined index is more than the sum of its parts. Be aware of this if you are either of the parties negotiating for such an index.

Ælfwine Mischler is an American copyeditor and indexer in Cairo, Egypt, who has been the head copyeditor at a large Islamic website and a senior editor for an EFL textbook publisher. She often edits and indexes books on Islamic studies, Middle East studies, and Egyptology.


December 24, 2018

Indexes: Part 7 — Lessons Learned in Using DEXembed for the First Time

Editor’s note: This version of the post incorporates corrections made by the author to the Options and Advice sections.

Ælfwine Mischler

I recently created an embedded index in Word for a book that will be published as an ebook and in print. I chose to use DEXembed because colleagues advised that its syntax — a space between the curly brackets and the enclosed text — will work better when the text is converted to an ebook.

A quick explanation of an embedded index: For a print book, the index is written after the book has been designed, using a PDF file of the final pages and page numbers as locators. This is changing, and many publishers are now asking for embedded indexes. For an embedded index, the indexer uses something else as locators. Depending on the program used, this could be paragraph numbers, word numbers, or temporary bookmarks. After indexing, the program embeds the entries by inserting field codes that look like this: { XE “main entry:subentry” }. The index is then generated from the field codes so the page numbers are displayed. In an ebook, they may also be linked to the location in the text. If the book is designed as hardcover and paperback with different pagination, the embedded index entries will give the correct page numbers for each edition.
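To make the shape of the field code concrete, here is a small Python sketch that builds the visible text of an XE field from a main entry and an optional subentry. The function name is hypothetical, and the sketch uses straight quotation marks; it shows only the text pattern and is not how DEXembed or Word creates the field.

```python
def xe_field(main, sub=None):
    """Build the visible text of an index-entry field, e.g. { XE "main:sub" }."""
    text = main if sub is None else f"{main}:{sub}"
    # Note the space just inside each curly bracket.
    return '{ XE "' + text + '" }'


print(xe_field("tombs"))
print(xe_field("tombs", "plundering"))
```

In a real Word document the field brackets are inserted by Word itself (for example with Ctrl+F9) rather than typed as ordinary characters, so a string like this cannot simply be pasted into the text.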

Embedded indexes are more work for the indexers, so most of us will charge more for an embedded index.

Options in DEXembed

DEXembed (available from the Editorium) is a Word add-on that allows the indexer to use dedicated indexing software rather than Word’s clunky built-in indexing function. DEXembed can use paragraphs, words, or numbers as locators — but only one type in a given document. Paragraph number was the best choice for this project, but the author had sometimes used auto-spacing and other times had used Enter twice between paragraphs. I told him repeatedly that he had to remove the extra Enters and make the spacing between paragraphs consistent (which he did) and that he could not change the paragraphing after I had started indexing. (More on that in the second part of this article in February 2019.)

Experienced colleagues in the Digital Publications Indexing Special Interest Group (DPI SIG) say that Word does not handle ranges of locators well. It is therefore better to mark only the beginnings of entries that are less than two pages long. DEXembed offers three options for ranges: Mark them with bookmarks, mark them with beginning and end codes, or do not mark them. The documentation for DEXembed says that publishers usually prefer begin and end codes.

Before starting my index, I sent two small sample indexes to my author’s publisher — one using bookmarks and one using begin and end codes — and asked which worked better for them. They got better results with the bookmarks, which also meant one less step for me in the end. Hurray!

I Won’t Talk to You

DPI SIG members also advised me that Word and InDesign use different syntax for some things, and I had to take this into consideration while indexing. I also found that my Sky indexing software and Word do not always communicate well.

This index required a separate scripture index of Qur’an verses. In Word, you can use an f-switch, coded as \f followed by a name, to make two indexes at once: { XE “heading1” \f “subject” } and { XE “heading1” \f “quran” }. (See Seth A. Maislin’s blog for more.) However, my colleagues advised that InDesign will reject XE fields with a backslash.

A suggested solution that I followed was to use two levels of subentries, with the main entries for the two indexes. That is, I had only two main entries, for which I used bold text, and my first level of subentry was the real main entry I wanted. The sub-subentry was the real subentry I wanted. The designers can adjust the indentation and spacing to make these appear as two separate indexes:

The chapter and verse numbers presented two other problems of their own. How to write something like 2:10? First, Word signals heading levels with a colon, so I had to use a backslash before the colon to tell Word that this was a literal colon, not a subheading signal. I admit that at that point, I had forgotten the warnings of my colleagues that InDesign would reject these entries.

As of this writing, I am waiting for the author’s comments and corrections, and the results of a small test index for the publisher: three entries using a backslash and colon, and three using a plus sign to be replaced by a colon in the generated index. If I do indeed have to remove \: from the index, I want to be sure that + is not a signal for something else in InDesign.

A second problem in writing chapter and verse numbers was the sorting. I knew that in Sky, I had to enter one- and two-digit chapter numbers with preceding zeros so they would sort properly. Thus, Chapter 2 was entered as 002 and Chapter 16 as 016. The verse numbers following the colons, however, sorted properly in Sky without additional zeros.

Word was not happy with that, but I could only learn that at the end. I finished my index, embedded the entries, generated the index, and then found that Word had mis-sorted the verses so that, for example, 18:70 came before 18:7. I had to open Sky, add the zeros to the verses, re-embed the entries, generate the new index, and remove the extra zeros from the generated index.
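The padding and unpadding can be sketched in Python. This does not reproduce Word's sorting rules; it only shows why zero-padded chapter and verse numbers sort correctly as plain text and how the zeros can be stripped afterward. The function names and the width of three digits are assumptions for illustration.

```python
def pad_reference(ref, width=3):
    """Pad both parts of a chapter:verse reference with leading zeros."""
    chapter, verse = ref.split(":")
    return f"{int(chapter):0{width}d}:{int(verse):0{width}d}"


def unpad_reference(padded):
    """Remove the leading zeros again for the printed index."""
    chapter, verse = padded.split(":")
    return f"{int(chapter)}:{int(verse)}"


refs = ["18:70", "2:10", "18:7", "16:3", "18:8"]

# Plain text sorting compares character by character, so 2:10 lands last
# and 18:70 lands before 18:8.
print(sorted(refs))

# Padded references sort in true numeric order; the zeros are then removed.
padded_order = sorted(pad_reference(r) for r in refs)
print([unpad_reference(p) for p in padded_order])
```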

Maybe I’ll Talk a Little Bit

Another difference between Sky and Word is how they handle text to be ignored in sorting. Sky’s sorting automatically ignores prepositions at the beginning of subentries, but Word’s does not. Sky also allows the indexer to code other things to be ignored in sorting. I commonly do this with the al- that begins many Arabic names.

For the embedded index, I had to enclose items to be ignored in angle brackets, but then in Sky, they all sorted to the top because they started with symbols. I was not sure that Word would put <al->Bukhari, <al->Ghazali, <al->Tabari, etc., in the proper places in the generated index. On this, I did have success, but I had to go back to the few subentries that begin with prepositions and enclose the prepositions in angle brackets.
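The idea of text that prints but is ignored in sorting can be illustrated with a short Python sketch. It is a hypothetical model of the behavior, not the code of Sky, DEXembed, or Word; the function names are made up for the example.

```python
import re

BRACKETED = re.compile(r"<([^<>]*)>")


def sort_key(entry):
    """Sort on the entry with any <...> text removed, ignoring case."""
    return BRACKETED.sub("", entry).casefold()


def display(entry):
    """Print the entry with the brackets removed but their text kept."""
    return BRACKETED.sub(r"\1", entry)


entries = ["<al->Tabari", "Cairo", "<al->Bukhari", "<al->Ghazali", "Damascus"]

# A naive sort puts every bracketed entry first, because "<" sorts before letters.
print(sorted(entries))

# Ignoring the bracketed text puts the names in their proper places.
print([display(e) for e in sorted(entries, key=sort_key)])
```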

DEXembed uses a text file to embed the entries, and all the bold and italics are lost in the process, although their coding remains. Once the entries were embedded, I had to edit the XE fields to get the bold and italic formatting back. (See Sue Klefstad’s blog post for details.) This was not difficult with a Find and Replace using wildcards (but be sure to turn off Tracked Changes!), but it was an extra step to perform.

Advice for Embedded Indexing

It is important to communicate with the author and publisher before beginning an embedded index. Learn how the Word manuscript will be handled after indexing and how it will be published. (There is more information on the resources page of the DPI SIG website.)

Once you have written your index in your dedicated indexing software, always embed in a copy of the document. Always keep the original “clean” and do not embed in it. Sometimes Word does not embed the entries properly and you might have to try again. DEXembed does have a function to remove embedded entries, but if Word gives you run-time errors as it did to me (see the second part of my February 2019 column), you will want to try again in a clean copy so there is no chance of stray coding in the file.

My thanks to colleagues Sue Klefstad and Seth A. Maislin for their invaluable blog posts, and to other colleagues in the DPI SIG for their advice in e-mail messages.


October 15, 2018

Indexes — Part 5: Names in Indexes

Ælfwine Mischler

A potential client recently asked me what an index is. Does it contain every name and event in a book? How is it different from a concordance?

A concordance maps every occurrence of words in a work or corpus, usually with the surrounding words to provide some context. A concordance might categorize the words by parts of speech (noun, verb, adjective, adverb) or by form (run, running, runny). There are, for example, concordances for the Bible, Shakespeare, Old English literature (which has a limited corpus), and the Qur’an (in Arabic). For most books, though, a concordance is not very useful.

Imagine a book about aardvarks — do you really want to know where every occurrence of the word aardvark is? Wouldn’t you rather want to know where to find information about the diet, habitats, mating habits, diseases, and natural enemies of aardvarks? That is what a well-written index provides. Indexers create entries for the topics discussed in a book and — if they do the job right — break long topics into subentries so readers can easily find what they want. Nobody wants to check all the pages in a long string of page numbers (or other locators) to find particular information.

What about names of people — should every instance of every name appear in an index?

Not usually. A computer-generated index might pick out all the words beginning with a capital letter and index them without differentiating between those that are passing mentions and those attached to substantial information. If a page says that Fay Canoes went with Bob Zurunkel, and that Fay did X, Y, and Z, and Fay said “yadda yadda” and “blah blah blah,” Fay is going to be indexed for that page, but not Bob. He is just a passing mention there. If Fay appears many times in the book, a human-produced index will usually have subentries for Fay, but a computer-generated index will not.

Often, a trade book or one that has limited space for the index will have longer strings of locators — and, thus, fewer subentries — and fewer details in the index.

As I said, usually not every occurrence of every name will appear in the index. There are exceptions, of course, and indexers should anticipate the needs of the reader. For example, in local histories, even passing mentions of every person or place (building, street, town, etc.) should be indexed because they might serve as clues for later researchers. In a handbook of literature, every author’s name might be indexed even if they are only mentioned in passing, but book titles might be indexed only if there is substantial discussion of them. What constitutes “substantial discussion” is sometimes a subjective decision.

Authors used as sources may or may not be indexed, and practice varies from one field to another. In the social sciences, it is common to have a separate name/author index that includes all sources, even if they are named only in parentheses, without subentries. The indexer has to refer to the bibliography to get the first name or initial(s) of authors, so bibliography pages should be counted in the page or word count used for pricing the index.

In other works, sources might be indexed only if there is substantial discussion of their material, or only if the source name appears in the text as opposed to only in a footnote or endnote. Authors and editors should make their expectations clear to the indexer before indexing begins.

Human indexers can decide which names to include in an index. They can also index people with nicknames properly (e.g., recognize that Frank and Buddy are the same person), people whose names have changed over time, and people who are referred to by a title or family relationship. A computer program will not index such people correctly, if at all.

So what goes into an index? That depends on the nature of the book, needs of the reader, practice in a given field, and space available for the index. If you have particular needs or questions, discuss them with your indexer before work begins. If you are the indexer, be sure to have this conversation before you begin the work.


July 16, 2018

Book Indexes — Part 3: The ABCs of Alphabetizing

Ælfwine Mischler

The alphabetizing I learned in school so many years ago — all before PCs and the Internet, of course — was easy. Go by the first letters — Bincoln, Fincoln, Lincoln, Mincoln — and if they’re all the same, look at the second, then the third, etc. — Lankin, Lanky, Lenkin, Lincoln, Linkin. I rarely had to alphabetize anything outside of school assignments (I did not organize my spices alphabetically), but I had to understand alphabetization to find a word in a dictionary, a name in a phone book, a card in a library catalog, or a folder in a file cabinet. Hunting for an organization or business whose name was just initials or began with initials was sometimes tricky, but I soon learned that if I did not find something interspersed with other entries, I could look at the beginning of that letter.

As an indexer, I have to know the conventions of alphabetizing so I can enter terms in the software program, and like so many other things in editorial work, there are different standards to follow. There are two main systems of alphabetizing — word-by-word and letter-by-letter — with some variations within each system. If you are writing an index or hiring an indexer, you have to know which system the publisher uses. Occasionally an indexer might find, in the midst of a project, that switching to the other system would be better, but this must be cleared with the publisher.

Word by Word

In the word-by-word system, generally used in indexes in Great Britain, alphabetizing proceeds up to the first space and then starts over. According to New Hart’s Rules, 2nd ed., hyphens are treated as spaces except where the first element is a prefix, not a word on its own (p. 384). However, the Chicago Manual of Style, 17th ed., treats hyphenated compounds as one word (sec. 16.60).

Letter by Letter

Most US publishers prefer the letter-by-letter system, in which alphabetizing continues up to the first parenthesis or comma, ignoring spaces, hyphens, and other punctuation.
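The difference between the two systems is easiest to see with sort keys. The Python sketch below is a simplified illustration only: the function names are hypothetical, hyphens are left alone in the word-by-word key, and none of the finer rules in New Hart's Rules or the Chicago Manual of Style are modeled.

```python
import re


def word_by_word_key(entry):
    """Compare one word at a time; a space ends each comparison unit."""
    return tuple(word.casefold() for word in entry.split())


def letter_by_letter_key(entry):
    """Compare letters up to the first comma or parenthesis, ignoring
    spaces, hyphens, and other punctuation."""
    head = re.split(r"[,(]", entry, maxsplit=1)[0]
    return "".join(ch for ch in head if ch.isalnum()).casefold()


entries = ["New York", "Newark", "New Haven", "Newton"]

print(sorted(entries, key=word_by_word_key))
print(sorted(entries, key=letter_by_letter_key))
```

In the word-by-word result, everything beginning with the word New comes before Newark; in the letter-by-letter result, the space is ignored and Newark comes first.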

If you are writing your own index in a word processing program, it will use word-by-word sorting. Dedicated indexing software can use either system along with variations. The following table comparing these systems uses Microsoft Word and SKY Indexing Software with various settings. (The items in the table were chosen to demonstrate how the different systems handle spaces, hyphens, commas, and ampersands. Not all of them would appear in an index. The variations on Erie-Lackawanna, for example, would normally have another word, such as “Rail Road,” following them.)

 

Entries with Same First Word

In the first edition of New Hart’s Rules, names and terms beginning with the same word were ordered according to a hierarchy: people; places; subjects, concepts, and objects; titles of works. You may see this in older books, and it occasionally comes up in indexers’ discussions. However, the second edition of New Hart’s Rules recognizes that most people do not understand this hierarchy and that alphabetizing this way is more work for the indexer. The second edition (p. 385) recommends retaining the strict alphabetical order created by indexing software.

Numbers Following Names

Names and terms followed by numbers are not ordered strictly alphabetically. These could be rulers or popes, or numbered articles or laws, etc. An indexer with dedicated software can insert coding to force these to sort correctly. If you are writing your own index in a word processor, you will have to sort these manually.
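As a rough illustration of forced sorting, the Python sketch below sorts rulers by the value of a trailing Roman numeral instead of by its letters. The function names are hypothetical and the approach is deliberately naive: any final word made up only of Roman-numeral letters is treated as a number, so an entry such as Malcolm X would need special handling.

```python
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral):
    """Convert a Roman numeral such as 'IX' to an integer."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN[ch]
        # A smaller value before a larger one is subtracted (IV = 4).
        if i + 1 < len(numeral) and ROMAN[numeral[i + 1]] > value:
            total -= value
        else:
            total += value
    return total


def regnal_key(entry):
    """Sort by name first, then by the numeric value of a trailing numeral."""
    name, _, numeral = entry.rpartition(" ")
    if name and numeral and all(ch in ROMAN for ch in numeral):
        return (name.casefold(), roman_to_int(numeral))
    return (entry.casefold(), 0)


rulers = ["Henry V", "Henry IX", "Henry II", "Henry VIII", "Henry IV"]

print(sorted(rulers))                  # strictly alphabetical
print(sorted(rulers, key=regnal_key))  # numeric order
```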

When people of different statuses — saints, popes, rulers (perhaps of more than one country), nobles, commoners — share a name, these have to be sorted hierarchically. See New Hart’s Rules, 2nd ed., section 19.3.2, and Kate Mertes, “Classical and Medieval Names” in Indexing Names, edited by Noeline Bridge.

Numerals and Symbols at the Beginning of Entries

Entries that begin with numerals or symbols may be sorted at the top of the index, before the alphabetical sequence. This is preferred by the International and British Standard, and when there are many such entries in a work. Alternatively, they may be interspersed in alphabetical order as if the numeral or symbol were spelled out, and they may also be double-posted if they appear at the top of the index.

However, in chemical compounds beginning with a prefix, Greek letter, or numeral, the prefix, Greek letter, or numeral is ignored in the sorting.

Greek letters prefixing chemical terms, star names, etc., are customarily spelled out, without a hyphen (New Hart’s Rules, 2nd ed., p. 389).

If you are writing your own index in a word processing program, you will have to manually sort entries with Greek letters or prefixes to be ignored, and entries beginning with numerals if you do not want them sorted at the top. Dedicated indexing programs can be coded to print but ignore items in sorting, or to sort numerals as if they were spelled out.

That’s Not All, Folks

This is just the beginning of alphabetizing issues that indexers face. While most of the actual alphabetizing is done by the software, indexers have to know many conventions regarding whether names are inverted; how particles in names are handled; how Saint, St., Ste. and Mc, Mac, Mc in surnames are alphabetized (styles vary on those); how to enter names of organizations, places, and geographical features. In addition to checking the books mentioned above, you can learn more about indexing best practices and indexing standards on the American Society for Indexing website and from the National Information Standards Organization.


May 21, 2018

Book Indexes — Part 1: Basic Vocabulary

Filed under: Contributor Article,Editorial Matters,indexing — Rich Adin @ 9:08 am

Ælfwine Mischler

When I tell people that I am a copyeditor and indexer, they usually have some idea of what an editor is (if not specifically a copyeditor), but they ask what an indexer is. I am not alone here; most indexers have the same problem. This series is about book indexes (print and ebooks), but there are also indexes for databases, websites, archives, and journals.

An index is an alphabetized list of keywords with (usually) page numbers to guide the reader to the information in the book (whether that be a single-volume or multi-volume text). An index is usually at the back of a book, but for a multi-volume text, it may be in a separate volume.

What an index is not is a concordance. An index does not list every occurrence of every name or word in the text.

If you are an author or editor looking to hire an indexer, it helps if you are all speaking the same language. Here are some basic terms that will pop up in a conversation about your index.

Locators

Indexers use locator rather than page number. While the locators are page numbers in most books, in a multivolume work, locators are volume and page numbers. Locators might be numbered sections or paragraphs in a reference book, map and grid numbers in an atlas, or product numbers in a catalog. Locator can also refer to a range to indicate that the topic is discussed on adjacent pages; thus, 23–25 indicates that a discussion is on three pages but is one locator. A string is three or more locators for the same main entry or subentry.

Type of Index Based on Arrangement

One of the first questions an indexer will ask you is whether you want your index to be run-in or indented. This refers to how the subentries are arranged relative to the main entry.

Run-in indexes are usually found in scholarly books where a lot of details are indexed. They take up less space, but are harder to scan with the eye. Indented indexes, which are easier to scan, are usually found in trade and children’s books.

Each box contains one entry. This entry has the main entry, tomb(s), followed by 11 subentries. Each subentry is followed by one or more locators. I have labeled the string of four locators after plundering, and the page range after in Tura. The subentry Montemhet has a gloss (TT 34) that further identifies the tomb as Theban Tomb 34. In this case the gloss was given in the text by the author. Indexers occasionally add glosses where clarification is needed — for example, to differentiate between two people with the same name.

This one entry has 11 subentries and 20 locators — each page or page range is a locator. In my indexing file, there are 20 records for this one entry, one record for each locator. It is important to understand this meaning of entry because in some types of indexes, the indexer is paid by the number of entries (rather than by the more usual page count or word count). If that were the case here, I would consider the text in the illustration to be 20 entries, not one, and the client and I probably would disagree. If you are writing or commissioning an index that will be paid by the number of entries, make sure that the two parties fully understand and agree on what an entry is before work begins.

Number of Levels of Subheads

An indexer will also ask you how many levels of subheads you will allow. The publishers I work for most often allow only one level, as shown in the above example, but occasionally they allow two. Some kinds of specialized indexing require many levels of subheads. The number of levels affects how the information is organized.

Undifferentiated Locators

If there are more than a given number of locators in a string (usually five to seven), it is best to differentiate them by creating subheads. A long string of locators is next to useless for the reader. Some publishers are strict about limiting the number of locators in a string, and this must be communicated to the indexer at the beginning of the project.

Sometimes publishers do not leave an adequate number of pages for the index so there is insufficient space for subheads. This is often seen in trade books, but unfortunately it is becoming more common in scholarly books. If space is short, the indexer will have to create longer strings of undifferentiated locators.
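The rule of thumb about long strings can be turned into a simple check. The Python sketch below is hypothetical: it assumes locators are written as a comma-separated string, counts a page range as one locator, and uses seven as the limit, which a publisher might set lower.

```python
def count_locators(locator_string):
    """Count locators in a string such as '4, 9, 23–25'; a range counts once."""
    return len([part for part in locator_string.split(",") if part.strip()])


def needs_subheads(locator_string, limit=7):
    """Flag a string that has more undifferentiated locators than the limit."""
    return count_locators(locator_string) > limit


print(count_locators("12, 23–25, 40"))                      # 3 locators
print(needs_subheads("4, 9, 23–25, 31, 47, 52, 60, 71"))    # 8 locators, over the limit
print(needs_subheads("4, 9, 23–25"))
```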

Cross References

The two most common types of cross references in indexes are See and See also. Indexers use See cross references when there is more than one term for a concept, or more than one name for a person. These tell the readers which word to look up to find the information. In this example, readers who go to Arab Spring are told to go to Revolution of 2011, which is the term the author uses.

Indexers use See also cross references to guide the readers to other topics related to the current one. In this example, page 115 explains how the misnomer “solar boat” came to be used. Under Khufu Boat Museum, readers will find more information about the boat itself and its preservation.

See also cross references can go before or after the locators. As the author, you must communicate that preference to the indexer.

One more term to understand is double post. If there is more than one term for a concept (so that a See cross reference would be expected for one of them) and only a very few locators for it, indexers might list the locators under both terms rather than using a See cross reference. This is considered good practice because the reader does not have to flip from one page to another, and it might actually take less space to print the locators than the other term. In this example, the double post does, in fact, take less space than the See cross reference.

Indexers also use double posting to create multiple access points for the reader. All or some of the names and terms that are subentries in one place become their own main entries elsewhere. This is called breaking out and is good practice. In the first example in this essay, all of the subentries become main entries elsewhere. Note that plundering and restorations have their own subentries, and Tura has an additional locator that is not related to tombs and thus did not appear when Tura was a subentry under tomb(s).

If space is limited, indexers use less double posting. For example, if space were limited in this case, I would make separate entries for the tombs of Bakenrenef, Horemheb, Maja, Montemhet (TT 34), Sekhemkhet, and Thery, but not include them as subentries under tombs. I would add See also tombs of individuals under their names.

Just the Beginning

You now have some basic vocabulary so you can communicate with an indexer about your book. In other segments, I explain how we create indexes (Hint: We don’t use magic wands, and the computer does not do it for us) and what you can expect in an index.

