Editor’s note: This version of the post incorporates corrections made by the author to the Options and Advice sections.
Ælfwine Mischler
I recently created an embedded index in Word for a book that will be published as an ebook and in print. I chose to use DEXembed because colleagues advised that its syntax — a space between the curly brackets and the enclosed text — will work better when the text is converted to an ebook.
A quick explanation of an embedded index: For a print book, the index is written after the book has been designed, using a PDF file of the final pages and page numbers as locators. This is changing, and many publishers are now asking for embedded indexes. For an embedded index, the indexer uses something else as locators. Depending on the program used, this could be paragraph numbers, word numbers, or temporary bookmarks. After indexing, the program embeds the entries by inserting field codes that look like this: { XE “main entry:subentry” }. The index is then generated from the field codes so the page numbers are displayed. In an ebook, they may also be linked to the location in the text. If the book is designed as hardcover and paperback with different pagination, the embedded index entries will give the correct page numbers for each edition.
Embedded indexes are more work for the indexers, so most of us will charge more for an embedded index.
Options in DEXembed
DEXembed (available from the Editorium) is a Word add-on that allows the indexer to use dedicated indexing software rather than Word’s clunky built-in indexing function. DEXembed can use paragraphs, words, or numbers as locators — but only one type in a given document. Paragraph number was the best choice for this project, but the author had sometimes used auto-spacing and other times had used Enter twice between paragraphs. I told him repeatedly that he had to remove the extra Enters and make the spacing between paragraphs consistent (which he did) and that he could not change the paragraphing after I had started indexing. (More on that in the second part of this article in February 2019.)
Experienced colleagues in the Digital Publications Indexing Special Interest Group (DPI SIG) say that Word does not handle ranges of locators well. It is therefore better to mark only the beginnings of entries that are less than two pages long. DEXembed offers three options for ranges: Mark them with bookmarks, mark them with beginning and end codes, or do not mark them. The documentation for DEXembed says that publishers usually prefer begin and end codes.
Before starting my index, I sent two small sample indexes to my author’s publisher — one using bookmarks and one using begin and end codes — and asked which worked better for them. They got better results with the bookmarks, which also meant one less step for me in the end. Hurray!
I Won’t Talk to You
DPI SIG members also advised me that Word and InDesign use different syntax for some things, and I had to take this into consideration while indexing. I also found that my Sky indexing software and Word do not always communicate well.
This index required a separate scripture index of Qur’an verses. In Word, you can make two indexes at once with the f-switch, which is coded as \f followed by a name: { XE “heading1” \f “subject” } and { XE “heading1” \f “quran” }. (See Seth A. Maislin’s blog for more.) However, my colleagues advised that InDesign will reject XE fields with a backslash.
A suggested solution that I followed was to use two levels of subentries, with the main entries serving as the titles of the two indexes. That is, I had only two main entries, for which I used bold text, and my first level of subentry was the real main entry I wanted. The sub-subentry was the real subentry I wanted. The designers can adjust the indentation and spacing to make these appear as two separate indexes.
The chapter and verse numbers presented two other problems of their own. How to write something like 2:10? First, Word signals heading levels with a colon, so I had to use a backslash before the colon to tell Word that this was a literal colon, not a subheading signal. I admit that at that point, I had forgotten the warnings of my colleagues that InDesign would reject these entries.
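To make the escaping concrete, here is a minimal Python sketch of how the text of an XE field could be assembled from heading levels, with any literal colon inside a level preceded by a backslash. The function name xe_field and the index title used in the example are hypothetical, and the sketch only builds the field text; it does not show how DEXembed or Word actually inserts fields.

```python
def xe_field(*levels: str) -> str:
    """Build the text of an XE field from heading levels (illustrative only).

    A colon separates heading levels in an XE field, so a literal colon
    inside a level is escaped with a backslash.
    """
    escaped = [level.replace(":", "\\:") for level in levels]
    return '{ XE "' + ":".join(escaped) + '" }'


# Hypothetical index title as the main entry, chapter:verse as the subentry.
print(xe_field("Index of Qur'an Verses", "002:010"))
# prints: { XE "Index of Qur'an Verses:002\:010" }
```

The same helper covers the two-index workaround described above: the first argument is the index title, and the following arguments are the real main entry and subentry.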
As of this writing, I am waiting for the author’s comments and corrections, and the results of a small test index for the publisher: three entries using a backslash and colon, and three using a plus sign to be replaced by a colon in the generated index. If I do indeed have to remove \: from the index, I want to be sure that + is not a signal for something else in InDesign.
A second problem in writing chapter and verse numbers was the sorting. I knew that in Sky, I had to enter one- and two-digit chapter numbers with preceding zeros so they would sort properly. Thus, Chapter 2 was entered as 002 and Chapter 16 as 016. The verse numbers following the colons, however, sorted properly in Sky without additional zeros.
Word was not happy with that, but I could only learn that at the end. I finished my index, embedded the entries, generated the index, and then found that Word had mis-sorted the verses so that, for example, 18:70 came before 18:7. I had to open Sky, add the zeros to the verses, re-embed the entries, generate the new index, and remove the extra zeros from the generated index.
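The padding and unpadding can be illustrated with a short Python sketch. It assumes locators of the form chapter:verse and a plain character-by-character sort; Word's actual sorting rules are not documented here, so treat this as an illustration of why the zeros help, not as a model of Word. The function names are hypothetical.

```python
def pad_locator(ref: str, width: int = 3) -> str:
    """Zero-pad both chapter and verse, e.g. '18:7' -> '018:007'."""
    chapter, verse = ref.split(":")
    return f"{int(chapter):0{width}d}:{int(verse):0{width}d}"


def unpad_locator(ref: str) -> str:
    """Strip the padding again for display, e.g. '018:007' -> '18:7'."""
    chapter, verse = ref.split(":")
    return f"{int(chapter)}:{int(verse)}"


refs = ["18:70", "2:10", "18:8"]

# A plain text sort compares character by character, so '18:70' lands before '18:8'
# and chapter 2 lands after chapter 18.
print(sorted(refs))  # ['18:70', '18:8', '2:10']

# Padding both parts makes the text sort agree with the numeric order.
padded = sorted(pad_locator(r) for r in refs)
print([unpad_locator(r) for r in padded])  # ['2:10', '18:8', '18:70']
```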
Maybe I’ll Talk a Little Bit
Another difference between Sky and Word is how they handle text to be ignored in sorting. Sky’s sorting automatically ignores prepositions at the beginning of subentries, but Word’s does not. Sky also allows the indexer to code other things to be ignored in sorting. I commonly do this with the al- that begins many Arabic names.
For the embedded index, I had to enclose items to be ignored in angle brackets, but then in Sky, they all sorted to the top because they started with symbols. I was not sure that Word would put <al->Bukhari, <al->Ghazali, <al->Tabari, etc., in the proper places in the generated index. On this, I did have success, but I had to go back to the few subentries that begin with prepositions and enclose the prepositions in angle brackets.
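The idea of ignoring bracketed text during sorting can be sketched in Python as follows. This is only an illustration of the concept; how Sky, DEXembed, or Word implement it internally is not something this sketch claims to reproduce. The function name sort_key and the entry "Ibn Sina" are hypothetical additions for the example.

```python
import re


def sort_key(entry: str) -> str:
    """Sort on the entry with any <...> portion removed, ignoring case."""
    return re.sub(r"<[^>]*>", "", entry).casefold()


names = ["<al->Tabari", "Ibn Sina", "<al->Bukhari", "<al->Ghazali"]

# A plain sort puts every entry that starts with '<' at the top.
print(sorted(names))
# ['<al->Bukhari', '<al->Ghazali', '<al->Tabari', 'Ibn Sina']

# Sorting on the stripped key files each name under its first real letter.
print(sorted(names, key=sort_key))
# ['<al->Bukhari', '<al->Ghazali', 'Ibn Sina', '<al->Tabari']
```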
DEXembed uses a text file to embed the entries, and all the bold and italics are lost in the process, although their coding remains. Once the entries were embedded, I had to edit the XE fields to get the bold and italic formatting back. (See Sue Klefstad’s blog post for details.) This was not difficult with a Find and Replace using wildcards (but be sure to turn off Tracked Changes!), but it was an extra step to perform.
Advice for Embedded Indexing
It is important to communicate with the author and publisher before beginning an embedded index. Learn how the Word manuscript will be handled after indexing and how it will be published. (There is more information on the resources page of the DPI SIG website.)
Once you have written your index in your dedicated indexing software, always embed in a copy of the document. Always keep the original “clean” and do not embed in it. Sometimes Word does not embed the entries properly and you might have to try again. DEXembed does have a function to remove embedded entries, but if Word gives you run-time errors as it did to me (see the second part of my February 2019 column), you will want to try again in a clean copy so there is no chance of stray coding in the file.
My thanks to colleagues Sue Klefstad and Seth A. Maislin for their invaluable blog posts, and to other colleagues in the DPI SIG for their advice in e-mail messages.
Ælfwine Mischler is an American copyeditor and indexer in Cairo, Egypt, who has been the head copyeditor at a large Islamic website and a senior editor for an EFL textbook publisher. She often edits and indexes books on Islamic studies, Middle East studies, and Egyptology.
EditTools: Duplicate References — A Preview
The current version of EditTools is nearly 1 year old. Over the past months, a lot of work has gone into improving existing functions and creating new ones. Shortly, a new version of EditTools will be released (it will be a free upgrade for registered users).
New in the forthcoming version is the Find Duplicate References macro, which is listed as Duplicate Refs on the References menu as shown here:
Duplicate Refs on the References Menu
The preliminaries
The macro works with both unnumbered and numbered reference lists (it works better when the numbers are not autonumbers, but it does work with autonumbered lists). It also works whether the reference list is left in the manuscript with the text paragraphs or has been moved temporarily to its own file (like other reference-specific macros in EditTools, it works better when the references are moved to a separate, references-only file).
Like all macros, the Find Duplicate References macro is “dumb”; that is, it only finds identical references. The following image shows references 19 and 78 as submitted for editing. (For all images in this essay: For a larger, more readable image, right-click on the image and click “Open link in new tab.” This will open a larger version of the image in a new tab that can be kept open as you read the description of the image.)
Original References
As the image shows, although references 19 and 78 are identical references and are likely to appear identical to an editor, they will not appear identical to the Find Duplicate References macro. Items 1 and 2 show a slight difference in the author name (19: “Infant”, 78: “Infantile”). The journal names are different in that in 19 the abbreviated name is used (#3) whereas in 78 the name is spelled out (#4). Finally, as #5 and #6 show, there are a couple of differences in the cite information, namely, the order, the use of a hyphen or en-dash to indicate range, and the final page number.
Because any one of these differences would prevent the macro from pairing these references and marking them as potentially identical, it is important that the references go through a round of editing first. After editing, which for EditTools users should also include running the Journals macro, the references are likely to look like this:
The References After Editing
If you compare the same items (1 and 2, 3 and 4, 5 and 6) in the above image, you will see that they now match better. (Ignore the inserted comments for now; they are discussed below.) One more step is required before the Find Duplicate References macro can be run: you need to accept all of the changes that were made. Remember that in Word, when changes are made with Tracking on, the material marked as deleted is not yet deleted; consequently, when the macro is run, the Tracked items will interfere (as will any comments, which also need to be deleted). The best method is to (1) save the tracked version, (2) accept all the changes, (3) use EditTools’ Comment Editor to delete any comments, and (4) save this clean version to run the Find Duplicate References macro.
After accepting all changes and deleting the comments, the entries for references 19 and 78 look like this:
The References After Changes Accepted
Running the macro
When the Find Duplicate References macro is run, the following message box appears.
Find Duplicate References Message Box
Before it can run, the macro has to be told where to begin and end its search. If the references are in a separate file from the rest of the manuscript, check the box indicating that the references are in a standalone document (#5) and click Run (#6). If the references are in a file with other material, use bookmarks to mark the beginning and ending of the list as instructed at the top of the message box (#1). To make it easier, the Bookmarks macro now has buttons to insert these bookmarks:
The dupBegin and dupEnd Bookmark Insert Buttons
The Find Duplicate References macro matches a set number of characters, including spaces. The default is 120 (#4) but you can change the number to 36, 48, 60, 72, 84, 96, or 108 using the dropdown arrow shown at #4 in the Find Duplicate References message box above.
The macro does a two-pass search, one from the beginning of the reference and another from the end of the reference, which is why a list of duplicates may have repetitions.
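The two-pass idea can be sketched in Python as follows. This is not the EditTools macro (which runs inside Word) and makes no claim about its internal matching rules; it is a minimal illustration, with hypothetical names, of comparing a fixed number of characters from the start of each reference and then from the end.

```python
from collections import defaultdict


def find_possible_duplicates(references, length=120):
    """Return (pass_name, [reference numbers]) for references that share
    the same first `length` characters or the same last `length` characters."""
    passes = (
        ("start", lambda ref: ref[:length]),
        ("end", lambda ref: ref[-length:]),
    )
    results = []
    for pass_name, take in passes:
        groups = defaultdict(list)
        for number, ref in enumerate(references, start=1):
            groups[take(ref.strip())].append(number)
        # Keep only keys shared by more than one reference.
        results.extend(
            (pass_name, numbers) for numbers in groups.values() if len(numbers) > 1
        )
    return results


# Made-up references for illustration only.
refs = [
    "Smith J. Infantile colic. J Pediatr. 2001;12:34-56.",
    "Jones A. Another title. Lancet. 1999;1:1-2.",
    "Smith J. Infantile colic. J Pediatr. 2001;12:34-56.",
]
print(find_possible_duplicates(refs, length=36))
# [('start', [1, 3]), ('end', [1, 3])]
```

Because each pass reports its own matches, the same pair of references can appear twice in the results, once per pass. The comparison is exact, so a single differing character within the compared span (a hyphen versus an en-dash, for instance) prevents a match in that pass.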
The results of the search appear like this:
List of Possible Duplicate References
(They appear as tracked changes only if the macro is run with Tracking on; if Tracking is off, the results appear as normal text.) Note the title of the duplicates is “Duplicate Entries (Nondefinitive).” The reason for “Nondefinitive” is to remind you that the macro is “dumb” and there is no guarantee that the list includes all duplicates or that all listed items are duplicated. Much of the macro’s accuracy depends on the consistency of editing, including formatting.
For the examples in this essay, the Find Duplicate References macro was run on a list of 735 references and the list of possibilities shown represents those likely duplicate references the macro found. Note that references 19 and 78 were found (#19 and #78 indicate the portions of those references found duplicated by each pass of the macro); however, if, for example, in editing the page range separator in #19 was left as an en-dash in reference 19 and in reference 78 as a hyphen, the macro would not have listed the material at #19 as there would not have been a match. Similarly, if the author name in reference 19 had been left as “Infant” and in reference 78 as “Infantile”, the macro would not have listed the material at #78 as there would not have been a match.
The next step is for the editor to determine which of the listed possibilities are duplicates. This is done using Word’s Find Navigation pane, as shown here:
Verifying Duplicate References
Copy part or all of what was found (#1) into the Find field (#2). Find will display the search results (“3 matches”) (#3); clicking the Browse button (the rightmost button at #3) lists the three matches found (#4 to #6). The first entry (#4) is always the text in the duplicates list (#1), which means that, in this example, the possible duplicates are #5 and #6. Click on the text marked #5 to see the complete text of that entry. Then compare that text to the text of the reference at #6. (It is possible for the macro to find more than two possible matches for the same text, and all, some, or none may be duplicates.)
The Inserted Comment
When editing of the manuscript is finished, have the Reference Number Order Check macro export a renumbering report to send with the edited file to the client. A partial sample report is shown here:
Sample Partial Renumbering Report
Every report bears the creator’s identification information (#1) and file title (#2). You set the creator information once and it remains the same for every report until you change it using a manager. The file title is set each time you create a report.
As the report shows, reference 78 was deleted and all callouts numbered 78 were renumbered as 19 (#3). The prewritten, standard message (a new feature) can be inserted with a mouse click; only the numbers need to be inserted or modified. The report shows that the renumbering stopped at callout 176 (#4) and started again at 197 (#5). Number 6 shows another deletion and renumbering.
Clients like these reports because they make it easy for authors, proofreaders, and others involved in the production process to track what was done.
The Find Duplicate References macro is a handy addition to EditTools. While it is easy in very short reference lists to check for duplicate references, as the number of references grows, checking for duplicates becomes increasingly difficult and time-consuming. The Find Duplicate References macro saves a lot of time, thereby increasing an editor’s profits.
Richard Adin, An American Editor