<refentry xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:src="http://nwalsh.com/xmlns/litprog/fragment" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="5.0" xml:id="make.index.markup">
<refmeta>
<refentrytitle>make.index.markup</refentrytitle>
<refmiscinfo class="other" otherclass="datatype">boolean</refmiscinfo>
</refmeta>
<refnamediv>
<refname>make.index.markup</refname>
<refpurpose>Generate XML index markup in the index?</refpurpose>
</refnamediv>
<refsynopsisdiv>
<src:fragment xml:id="make.index.markup.frag">
<xsl:param name="make.index.markup" select="0"/>
</src:fragment>
</refsynopsisdiv>
<refsection><info><title>Description</title></info>
<para>This parameter enables a very neat trick for getting properly merged, collated back-of-the-book indexes. G. Ken Holman suggested this trick at Extreme Markup Languages 2002 and I'm indebted to him for it.</para>
<para>Jeni Tennison's excellent code in <filename>autoidx.xsl</filename> does a great job of merging and sorting <tag>indexterm</tag>s in the document and building a back-of-the-book index. However, there's one thing that it cannot reasonably be expected to do: merge page numbers into ranges. (I would not have thought that it could collate and suppress duplicate page numbers, but in fact it appears to manage that task somehow.)</para>
<para>Ken's trick is to produce a document in which the index at the back of the book is <quote>displayed</quote> in XML. Because the index is generated by the FO processor, all of the page numbers have been resolved.
It's a bit hard to explain, but it boils down to this. Instead of having an index at the back of the book that looks like this:</para>
<blockquote>
<formalpara><info><title>A</title></info>
<para>ap1, 1, 2, 3</para>
</formalpara>
</blockquote>
<para>you get one that looks like this:</para>
<blockquote>
<programlisting>&lt;indexdiv&gt;A&lt;/indexdiv&gt;
&lt;indexentry&gt;
&lt;primaryie&gt;ap1&lt;/primaryie&gt;,
&lt;phrase role="pageno"&gt;1&lt;/phrase&gt;,
&lt;phrase role="pageno"&gt;2&lt;/phrase&gt;,
&lt;phrase role="pageno"&gt;3&lt;/phrase&gt;
&lt;/indexentry&gt;</programlisting>
</blockquote>
<para>After building a PDF file with this odd-looking index, you can extract the text from the PDF file; the result is a proper index expressed in XML.</para>
<para>Now you have data that's amenable to processing, and a simple Perl script (such as <filename>fo/pdf2index</filename>) can merge page ranges and generate a proper index.</para>
<para>Finally, reformat your original document using this literal index instead of an automatically generated one and <quote>bingo</quote>!</para>
</refsection>
</refentry>