An American Editor

April 3, 2017

The Business of Editing: The AAE Copyediting Roadmap VII

My approach to editing began with creating a stylesheet and cleaning extraneous and unwanted typing mistakes from the document (see The Business of Editing: The AAE Copyediting Roadmap II), moved through tagging the manuscript by typecoding or applying styles (see The Business of Editing: The AAE Copyediting Roadmap III) and inserting bookmarks for callouts and other things I noticed while tagging the manuscript (see The Business of Editing: The AAE Copyediting Roadmap IV), to creating the project- or client-specific Never Spell Word dataset and running the Never Spell Word macro (see The Business of Editing: The AAE Copyediting Roadmap V). The last stop was using wildcards to fix reference formatting problems, running the Journals macro to correct incorrect journal names, and editing the reference list (see The Business of Editing: The AAE Copyediting Roadmap VI). Now it’s time to tackle duplicate references using the Find Duplicate References macro.

Ancient History

Until recently, finding duplicate references was difficult and very time-consuming. I often deal with reference lists of 300+ references, with many lists running between 500 and 800 references and some running close to 2000 references. Before I created the Find Duplicate References macro, the only way I had to check for duplicate references in a numbered reference list was to use Word’s Find and do two or three different searches based on the same reference. One search might be on author names, another on article or book title, and a third on the cite information (i.e., journal name, year, volume, and pages). Unfortunately, many authors are sloppy with how they cite references, so the same reference is cited slightly differently each time it appears. Sometimes a reference is cited completely; other times it is missing material.

Careful editing of references solved part of the problem, but duplicates of each reference still had to be searched for individually. Time — and profit — flew away.

Today’s Approach: Find Duplicate References

The process was taking too much time and costing me too much profit. I needed a better solution, which led to EditTools’ Find Duplicate References (FDR) macro, a much quicker and better solution to the problem of finding duplicate references.

As good as FDR is at finding duplicates (and from the heavy use I gave it in my last project, which had more than 21,000 references in total, I know it is very good), it is important to remember that FDR, like other macros, is dumb: it will find only exactly what it is told to find, not something close. If two entries in the reference list are identical except that one has an extra space, FDR might not tag them as possible duplicates because of that extra space. Similarly, if the references are identical except that one uses a colon to separate portions of the article title and the other uses a dash, they will not be tagged as duplicates. Even the dashes have to be identical. For example, a page range that is identical except that one uses a hyphen as a separator and the other an en-dash will result in the cites not being tagged as identical. Close only counts in horseshoes.

Tip: Because the macro looks for matches within a set number of characters, it is occasionally worthwhile to run the macro more than once using a different number of characters as the search parameter. The macro lets you choose 24, 36, 48, 60, 72, 84, 96, 108, or 120 as the number of characters in the search string. If you choose 96 characters and two references are identical except that one uses an em-dash and the other uses an en-dash in the article title, and that dash falls within the first 96 characters, the forward pass will not find them as duplicates. (This is why the macro does two passes, one from the beginning of the reference forward and one from the end backward: there may be a match from one direction even if not from both.) Changing the search string length to 72 might find the duplicates if the dashes appear as the 73rd character or later. Of course, it may still not find the duplicates if the differing characters still appear within the search string length. The macro is dumb; the characters within the search string must be identical.
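To see why the search length matters, here is a minimal Python sketch of the idea. It is not EditTools code (FDR is a Word macro whose internals are not shown here); the function name forward_key and the two references are made up for illustration, and the sketch assumes the forward pass is a plain character-for-character comparison of the first N characters.

```python
def forward_key(ref: str, length: int) -> str:
    """Return the first `length` characters of a reference, spaces included."""
    return ref[:length]


# Two made-up references that differ only in the dash inside the title
# (em-dash, U+2014, versus en-dash, U+2013).
ref_a = "Smith J, Jones K. Sleep patterns in infancy\u2014a cohort study. J Pediatr. 2010;12(3):45\u201352."
ref_b = "Smith J, Jones K. Sleep patterns in infancy\u2013a cohort study. J Pediatr. 2010;12(3):45\u201352."

# Zero-based index of the first character that differs between the two.
first_diff = next(i for i, (a, b) in enumerate(zip(ref_a, ref_b)) if a != b)

for length in (36, 48):
    same = forward_key(ref_a, length) == forward_key(ref_b, length)
    print(f"length {length}: forward match = {same}")
```

In this made-up pair the differing dash is the 44th character, so a 36-character search string matches and a 48-character one does not.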

Moving References

The dialog that appears when FDR is run, which is shown below (you can make the image, as well as other images in this essay, larger by clicking on the image), provides detailed information about the macro. As I noted previously, it is my habit to move the reference list to its own file. I do this for several reasons: First, the Journals macro runs more quickly because there is less material it needs to check.

Second, if the manuscript requires my using Superscript Me (see The Business of Editing: The AAE Copyediting Roadmap VI), it eliminates the possibility that Superscript Me will make unwanted changes to reference cites (e.g., changing 1986;52(14):122 to 1986;5214:122 or 1986;5214:122).

Third, it makes it easier to renumber and/or add or delete references during editing of the manuscript (I use three monitors and have found it is easier and quicker to access and edit the references when the text is open on one monitor and the reference list is open on a second monitor).

Fourth, before it runs, the Find Duplicate References macro does several things to the document: it saves the current document, creates a copy of the current document, removes any highlighting and queries/comments, and accepts all changes (see #1 in the image below). The idea is that the duplicates will be found in the copy document but the editor will note them in the original document, which is the document that the client will see.

The Find Duplicate References dialog

The longer the reference list, the more important I think it is that the reference list is moved to its own document. (When I am done editing the manuscript, I reincorporate the reference list in the edited document. I turn tracking off in the manuscript and use Word’s Insert Text from File feature to reinsert the reference list with all its tracked changes. I then turn tracking back on in the manuscript and save the file.) But if you do not want to move the references, you can leave them where they are and use the Bookmarks buttons (see #4 below) to insert the required dupBegin and dupEnd bookmarks at the beginning and end, respectively, of the reference list. (These bookmarks are not needed if the reference list is in its own file.)

The FDR Bookmarks

Making Ready

The key to the Find Duplicate References macro is remembering that the macro only identifies information that is identical (see #2 in the FDR dialog image above). Consequently, after running the Journals macro and before running FDR, I edit the reference list, making the references consistent. All page ranges, for example, use an en-dash; every time the CDC is named as the author, the name is conformed to “Centers for Disease Control and Prevention (CDC).” In addition, I check URLs and add any missing information.
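For readers who like to see the principle spelled out, the kind of conforming described above can be sketched in Python. This is only an illustration of why consistency matters, not how I edit and not part of EditTools; the function name conform and the two rules it applies (en-dashes between digits, single spaces) are assumptions chosen for the example.

```python
import re

EN_DASH = "\u2013"


def conform(ref: str) -> str:
    """Make two kinds of variation uniform before matching."""
    # Any hyphen or dash sitting between two digits becomes an en-dash.
    ref = re.sub(r"(?<=\d)\s*[-\u2012\u2013\u2014]\s*(?=\d)", EN_DASH, ref)
    # Runs of spaces collapse to a single space.
    return re.sub(r" {2,}", " ", ref).strip()


print(conform("Smith  J. J Med. 2010;12(3):45-52."))
```

A rule this blunt would also change hyphens inside ISBNs, dates, and similar digit strings, which is one reason the conforming is better done by an editor who is reading each reference.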

Finally, I tell the macro the number of characters I want it to match (see #3 in the FDR dialog image above; also see the Tip above). Because the macro is a two-pass macro, it will check that number of characters (including spaces) from the beginning of the reference forward (it ignores the reference number and begins the count from the first alphanumeric character that is part of the cite itself) and then from the end of the reference backward.
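The two-pass idea can be sketched in a few lines of Python. This is a rough model under stated assumptions, not the macro itself: the names strip_number and possible_duplicates are hypothetical, the reference numbers are assumed to look like "19." or "[19]", the sample references are invented, and the results are reported by position in the list.

```python
import re
from collections import defaultdict


def strip_number(ref: str) -> str:
    """Drop a leading reference number such as '19.' or '[19]' (assumed formats)."""
    return re.sub(r"^\s*\[?\d+[\].)]?\s*", "", ref)


def possible_duplicates(refs: list[str], length: int = 120) -> dict[str, list[int]]:
    """Group references whose first or last `length` characters are identical."""
    forward = defaultdict(list)
    backward = defaultdict(list)
    for position, ref in enumerate(refs, start=1):
        body = strip_number(ref).rstrip()
        forward[body[:length]].append(position)    # pass 1: from the beginning
        backward[body[-length:]].append(position)  # pass 2: from the end
    hits = {}
    for label, table in (("forward", forward), ("backward", backward)):
        for key, positions in table.items():
            if len(positions) > 1:
                hits[f"{label}: {key}"] = positions
    return hits


# Invented list: entries 1 and 3 differ only in the page-range separator.
refs = [
    "1. Adams A. Title one. J Med. 2001;1:1\u20132.",
    "2. Brown B. Other. Lancet. 2002;2:3\u20134.",
    "3. Adams A. Title one. J Med. 2001;1:1-2.",
]
hits = possible_duplicates(refs, length=24)
for key, positions in hits.items():
    print(positions, key)
```

In this invented list only the forward pass pairs entries 1 and 3; the backward pass misses them because the hyphen and the en-dash fall inside the last 24 characters.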

The Report

When done, FDR produces a “report” that it places at the beginning of the reference list that looks like this:

The FDR Report

When the report is generated I use Word’s Find Pane to check each entry. (For a more in-depth discussion of the process, see Find Duplicate References at the wordsnSync website.)

Marking Duplicates

As I go through the list of possible duplicates, I mark those I find that are duplicates. However, I do not want to make changes to the reference list at this point; I just want to mark the duplicates. To mark them, I do two things: First, I insert a standard comment using the Insert Query macro, replacing the underscore with the numbers of the references that are duplicates of the current reference:

Marking a Duplicate with Insert Query

I also insert a bookmark at each location using the Bookmarks macro. I use this format (see the highlighted text):

Using Bookmarks with Duplicate Cites

The bookmarks act as a check, as well as make it easier to deal with the duplicate references. When, for example, during editing of the text I come to the callout for reference 18, I can see — from the comment and the bookmarks — that three other references are identical to 18, namely references 72, 91, and 102. Should the author have numbered references out of order and called out reference 91 before 18, I can see at a glance which references are duplicates. The bookmarks let me easily navigate to each of the duplicate references; once I have deleted a duplicate reference, I can delete its bookmark. The bookmarks provide an easy way to track which duplicates remain.

Recording & Reporting Duplicates

I also mark the information in the Reference Number Order Check macro (which is the subject of The Business of Editing: The AAE Copyediting Roadmap VIII). The Reference Number Order Check macro can provide my client with a report showing which references were deleted as duplicates and what those references were renumbered as. A sample report is shown here:

Sample Report of Duplicate References

As the sample report shows, references 78 (#5) and 201 (#6) were deleted; all callouts numbered 78 were renumbered as 19, and all callouts numbered 201 were renumbered as 85.
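The effect on the callouts can be pictured as a simple lookup. The Python sketch below is only an illustration using the two deletions named in the sample report; the function name remap_callouts and the list of callouts are made up, and the sketch does not renumber the remaining references to close the gaps left by the deletions.

```python
def remap_callouts(callouts: list[int], deleted: dict[int, int]) -> list[int]:
    """Point callouts for deleted duplicates at the reference that was kept."""
    return [deleted.get(number, number) for number in callouts]


# Deleted duplicate -> reference it was renumbered as (from the sample report).
deleted = {78: 19, 201: 85}

# Made-up sequence of callouts as they might appear in the text.
callouts = [18, 19, 78, 85, 201]
print(remap_callouts(callouts, deleted))
```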

Find Duplicate References works very well. In one chapter I edited in a recent project, the macro found 23 duplicate references in a 700-entry reference list (one reference was duplicated five times!). It took the macro seconds to find those duplicates; had I looked for them without using FDR, it would have added hours to the project and turned the project into a profit-loser.

Richard Adin, An American Editor

November 21, 2016

EditTools: Duplicate References — A Preview

The current version of EditTools is nearly 1 year old. Over the past months, a lot of work has gone into improvements to existing functions and in creating new functions. Shortly, a new version of EditTools will be released (it will be a free upgrade for registered users).

New in the forthcoming version is the Find Duplicate References macro, which is listed as Duplicate Refs on the References menu as shown here:

Duplicate Refs on the References Menu

The preliminaries

The macro works with both unnumbered and numbered reference lists (works better when the numbers are not autonumbers, but it does work with autonumbered lists). It also works with the reference list left in the manuscript with the text paragraphs and when the reference list has been moved temporarily to its own file (it works, like other reference-specific macros in EditTools, better when the references are moved to a separate, references-only file).

Like all macros, the Find Duplicate References macro is “dumb”; that is, it only finds identical references. The following image shows references 19 and 78 as submitted for editing. (For all images in this essay: For a larger, more readable image, right-click on the image and click “Open link in new tab.” This will open a larger version of the image in a new tab that can be kept open as you read the description of the image.)

Original References

As the image shows, although references 19 and 78 are identical references and are likely to appear identical to an editor, they will not appear identical to the Find Duplicate References macro. Items 1 and 2 show a slight difference in the author name (19: “Infant”, 78: “Infantile”). The journal names are different in that in 19 the abbreviated name is used (#3) whereas in 78 the name is spelled out (#4). Finally, as #5 and #6 show, there are a couple of differences in the cite information, namely, the order, the use of a hyphen or en-dash to indicate range, and the final page number.

Because any one of these differences would prevent the macro from pairing these references and marking them as potentially identical, it is important that the references go through a round of editing first. After editing, which for EditTools users should also include running the Journals macro, the references are likely to look like this:

The References After Editing

If you compare the same items (1 and 2, 3 and 4, 5 and 6) in the above image, you will see that they now better match. (Ignore the inserted comments for now; they are discussed below.) One more step is required before the Find Duplicate References macro can be run — you need to accept all of the changes that were made. Remember that in Word, when changes are made with Tracking on, the material marked as deleted is not yet deleted; consequently, when the macro is run, the Tracked items will interfere (as will any comments, which also need to be deleted). The best method is to (1) save the tracked version, (2) accept all the changes, (3) use EditTools’ Comment Editor to delete any comments, and (4) save this clean version to run the Find Duplicate References macro.

After accepting all changes and deleting the comments, the entries for references 19 and 78 look like this:

The References After Changes Accepted

Running the macro

When the Find Duplicate References macro is run, the following message box appears.

Find Duplicate References Message Box

To run the macro, the macro has to be told where to begin and end its search. If the references are in a separate file from the rest of the manuscript, check the box indicating that the references are in a standalone document (#5) and click Run (#6). If the references are in a file with other material, use bookmarks to mark the beginning and ending of the list as instructed at the top of the message box (#1). To make it easier, the Bookmarks macro now has buttons to insert these bookmarks:

The dupBegin and dupEnd Bookmark Insert Buttons

The Find Duplicate References macro matches a set number of characters, including spaces. The default is 120 (#4) but you can change the number to 36, 48, 60, 72, 84, 96, or 108 using the dropdown arrow shown at #4 in the Find Duplicate References message box above.

The macro does a two-pass search, one from the beginning of the reference and another from the end of the reference, which is why a list of duplicates may have repetitions.

The results of the search appear like this:

List of Possible Duplicate References

(They appear as tracked changes only if the macro is run with Tracking on; if Tracking is off, the results appear as normal text.) Note the title of the duplicates is “Duplicate Entries (Nondefinitive).” The reason for “Nondefinitive” is to remind you that the macro is “dumb” and there is no guarantee that the list includes all duplicates or that all listed items are duplicated. Much of the macro’s accuracy depends on the consistency of editing, including formatting.

For the examples in this essay, the Find Duplicate References macro was run on a list of 735 references and the list of possibilities shown represents those likely duplicate references the macro found. Note that references 19 and 78 were found (#19 and #78 indicate the portions of those references found duplicated by each pass of the macro); however, if, for example, in editing the page range separator in #19 was left as an en-dash in reference 19 and in reference 78 as a hyphen, the macro would not have listed the material at #19 as there would not have been a match. Similarly, if the author name in reference 19 had been left as “Infant” and in reference 78 as “Infantile”, the macro would not have listed the material at #78 as there would not have been a match.

The next step is for the editor to determine which of the listed possibilities are duplicates. This is done using Word’s Find Navigation pane, as shown here:

Verifying Duplicate References

Copy part or all of what was found (#1) into the Find field (#2). Find will display the search results (“3 matches”) (#3); clicking the Browse button (the rightmost button at #3) lists the three matches found (#4 to #6). The first entry (#4) is always the text in the duplicates list (#1), which means that, in this example, the possible duplicates are #5 and #6. Click on the text marked #5 to see the complete text of that entry, then compare that text to the text of the reference at #6. (It is possible for the macro to find more than two possible matches for the same text, and all, some, or none may be duplicates.)

Tip: Use comments to track duplicates

When I find a duplicate, I insert a prewritten, standardized comment (using EditTools’ Insert Query) to tell the client that references x and y are duplicates and that I am deleting one and renumbering it (see image below for a sample comment). I insert the comment at each of the duplicate references, although I slightly modify the comment so that it is appropriate for the reference to which it is being attached. The comment shown below is inserted at reference 78 and its language is appropriate for that reference. It tells the client that references 19 and 78 are identical and that reference 78 has been deleted and renumbered as 19. This type of comment is added to the version (e.g., the Track Changes version) of the reference list that will be given to the client. The comment is added to the appropriate references as duplication is confirmed.

The Inserted Comment

The comment, in addition to serving as a message to the client, serves as a reminder message during editing of the manuscript. Duplicate references require renumbering so as to keep reference callouts in number order. For example, it may be that reference 78 is called out after the callout for reference 10 and before that for 19. In that case, reference 78 would be moved to position 11 in the list and renumbered as 11 and the comment would be modified (easy to do using EditTools’ Comment Editor). A prewritten note (another new EditTools feature) would be inserted at point 78 in EditTools’ Reference Number Order Check and reference 19 would be marked as deleted, the inserted comment (see above) would be modified, and a note would be added to Reference Number Order Check at point 19. (See the discussion below about the report.)

When editing of the manuscript is finished, have the Reference Number Order Check macro export a renumbering report to send with the edited file to the client. A partial sample report is shown here:

Sample Partial Renumbering Report

Every report bears the creator’s identification information (#1) and file title (#2). You set the creator information once and it remains the same for every report until you change it using a manager. The file title is set each time you create a report.

As the report shows, reference 78 was deleted and all callouts numbered 78 were renumbered as 19 (#3). The prewritten, standard message (a new feature) can be inserted with a mouse click; only the numbers need to be inserted or modified. The report shows that the renumbering stopped at callout 176 (#4) and started again at 197 (#5). Number 6 shows another deletion and renumbering.

Clients like these reports because they make it easy for authors, proofreaders, and others involved in the production process to track what was done.

The Find Duplicate References macro is a handy addition to EditTools. While it is easy in very short reference lists to check for duplicate references, as the number of references grows, checking for duplicates becomes increasingly difficult and time-consuming. The Find Duplicate References macro saves a lot of time, thereby increasing an editor’s profits.

Richard Adin, An American Editor
