<refentry xmlns="http://docbook.org/ns/docbook"
          xmlns:xlink="http://www.w3.org/1999/xlink"
          xmlns:xi="http://www.w3.org/2001/XInclude"
          xmlns:src="http://nwalsh.com/xmlns/litprog/fragment"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
          version="5.0" xml:id="make.index.markup">
<refmeta>
<refentrytitle>make.index.markup</refentrytitle>
<refmiscinfo class="other" otherclass="datatype">boolean</refmiscinfo>
</refmeta>
<refnamediv>
<refname>make.index.markup</refname>
<refpurpose>Generate XML index markup in the index?</refpurpose>
</refnamediv>

<refsynopsisdiv>
<src:fragment xml:id="make.index.markup.frag">
<xsl:param name="make.index.markup" select="0"/>
</src:fragment>
</refsynopsisdiv>

<refsection><info><title>Description</title></info>

<para>When set to a non-zero value, this parameter enables a very neat
trick for getting properly merged, collated back-of-the-book indexes.
G. Ken Holman suggested this trick at Extreme Markup Languages 2002 and
I'm indebted to him for it.</para>

<para>Jeni Tennison's excellent code in
<filename>autoidx.xsl</filename> does a great job of merging and
sorting <tag>indexterm</tag>s in the document and building a
back-of-the-book index. However, there's one thing that it cannot
reasonably be expected to do: merge page numbers into ranges. (I would
not have thought that it could collate and suppress duplicate page
numbers, but in fact it appears to manage that task somehow.)</para>

<para>Ken's trick is to produce a document in which the index at the
back of the book is <quote>displayed</quote> in XML. Because the index
is laid out by the FO processor, all of the page numbers have been resolved.
It's a bit hard to explain, but what it boils down to is that instead of having
an index at the back of the book that looks like this:</para>

<blockquote>
<formalpara><info><title>A</title></info>
<para>ap1, 1, 2, 3</para>
</formalpara>
</blockquote>

<para>you get one that looks like this:</para>

<blockquote>
<programlisting>&lt;indexdiv&gt;A&lt;/indexdiv&gt;
&lt;indexentry&gt;
&lt;primaryie&gt;ap1&lt;/primaryie&gt;,
&lt;phrase role="pageno"&gt;1&lt;/phrase&gt;,
&lt;phrase role="pageno"&gt;2&lt;/phrase&gt;,
&lt;phrase role="pageno"&gt;3&lt;/phrase&gt;
&lt;/indexentry&gt;</programlisting>
</blockquote>

<para>After building a PDF file with this sort of odd-looking index, you can
extract the text from the PDF file and the result is a proper index expressed in
XML.</para>

<para>Now you have data that's amenable to processing, and a simple Perl script
(such as <filename>fo/pdf2index</filename>) can
merge consecutive page numbers into ranges and generate a proper index.</para>

<para>Finally, reformat your original document using this literal index instead of
an automatically generated one and <quote>bingo</quote>!</para>
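
<para>As a rough sketch of the first pass, the parameter can be turned on in
a small customization layer. The import path below is an assumption about
where the FO stylesheet lives on your system, so adjust it to match your
installation. The setting is only meant for this intermediate run, not for
the final formatting pass.</para>

<programlisting>&lt;?xml version="1.0"?&gt;
&lt;xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0"&gt;

&lt;!-- Assumed location of the stock FO stylesheet; adjust as needed. --&gt;
&lt;xsl:import href="fo/docbook.xsl"/&gt;

&lt;!-- Non-zero: render the index as literal XML markup. --&gt;
&lt;xsl:param name="make.index.markup" select="1"/&gt;

&lt;/xsl:stylesheet&gt;</programlisting>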

</refsection>
</refentry>