summaryrefslogtreecommitdiffstats
blob: c4ba872914e86aa59fb09ac8a1c915fa803627f1 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:exsl="http://exslt.org/common" xmlns:ng="http://docbook.org/docbook-ng" xmlns:db="http://docbook.org/ns/docbook"><head>
<meta http-equiv="X-UA-Compatible" content="IE=7" />
<title>Search</title><meta name="generator" content="DocBook XSL Stylesheets V1.76.1" /><link rel="home" href="index.html" title="README: Web-based Help from DocBook XML" /><link rel="up" href="ch03.html" title="Chapter 3. Developer Docs" /><link rel="prev" href="ch03s01.html" title="Design" /><meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><script type="text/javascript">
            //The id for tree cookie
            var treeCookieId = "treeview-897";
            var language = "en";
            var w = new Object();
            //Localization
            txt_filesfound = 'Results';
            txt_enter_at_least_1_char = "You must enter at least one character.";
            txt_browser_not_supported = "Your browser is not supported. Use of Mozilla Firefox is recommended.";
            txt_please_wait = "Please wait. Search in progress...";
            txt_results_for = "Results for: ";
        </script><style type="text/css">
            input {
            margin-bottom: 5px;
            margin-top: 2px;
            }

            .folder {
            display: block;
            height: 22px;
            padding-left: 20px;
            background: transparent url(../common/jquery/treeview/images/folder.gif) 0 0px no-repeat;
            }
            </style><link rel="shortcut icon" href="../favicon.ico" type="image/x-icon" /><link rel="stylesheet" type="text/css" href="../common/css/positioning.css" /><link rel="stylesheet" type="text/css" href="../common/jquery/theme-redmond/jquery-ui-1.8.2.custom.css" /><link rel="stylesheet" type="text/css" href="../common/jquery/treeview/jquery.treeview.css" /><script type="text/javascript" src="../common/jquery/jquery-1.4.2.min.js"></script><script type="text/javascript" src="../common/jquery/jquery-ui-1.8.2.custom.min.js"></script><script type="text/javascript" src="../common/jquery/jquery.cookie.js"></script><script type="text/javascript" src="../common/jquery/treeview/jquery.treeview.min.js"></script><script type="text/javascript" src="search/htmlFileList.js"></script><script type="text/javascript" src="search/htmlFileInfoList.js"></script><script type="text/javascript" src="search/nwSearchFnt.js"></script><script type="text/javascript" src="search/stemmers/en_stemmer.js"><!--//make this scalable to other languages as well.--></script><script type="text/javascript" src="search/index-1.js"></script><script type="text/javascript" src="search/index-2.js"></script><script type="text/javascript" src="search/index-3.js"></script></head><body><div id="header"><img style="margin-right: 2px; height: 59px; padding-right: 25px; padding-top: 8px" align="right" src="../common/images/logo.png" alt="Company Logo" /><h1 align="center">Search<br />Chapter 3. Developer Docs</h1><div id="navheader" align="right"><table><tr><td style="height: 28px; width: 16px;"><a id="showHideButton" onclick="showHideToc();" class="pointLeft" title="Hide TOC tree">.
                            </a></td><td><img src="../common/images/highlight-blue.gif" alt="H" height="25px" onclick="toggleHighlight()" id="showHideHighlight" style="cursor:pointer" title="Toggle search result highlighting" /></td><td><a accesskey="p" href="ch03s01.html">Prev</a>
                                        |
                                        <a accesskey="u" href="ch03.html">Up</a></td></tr></table></div></div><div id="content"><div class="section" title="Search"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a id="id36124495"></a>Search</h2></div></div></div><p class="summary">Overview design of Search mechanism.</p><p>
        The searching is a fully client-side implementation of querying texts for
        content searching, and no server is involved. That means when a user enters a query, 
        it is processed by JavaScript inside the browser, and displays the matching results by
        comparing the query with a generated 'index', which too reside in the client-side web browser.
        
        Mainly the search mechanism has two parts.
        </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>Indexing: First we need to traverse the content in the docs/content folder and index 
              the words in it. This is done by <code class="filename">nw-cms.jar</code>. You can invoke it by  
            <code class="code">ant index</code> command from the root of webhelp of directory. You can recompile it 
              again and build the jar file by <code class="code">ant build-indexer</code>. Indexer has some extensive 
              support for such as stemming of words. Indexer has extensive support for English, German,
              French languages. By extensive support, what I meant is that those texts are stemmed
              first, to get the root word and then indexes them. For CJK (Chinese, Japanese, Korean)
              languages, it uses bi-gram tokenizing to break up the words. (CJK languages does not have 
              spaces between words.)                
            </p><p>
              When we run <code class="code">ant index</code>, it generates five output files:
                </p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p><code class="filename">htmlFileList.js</code> - This contains an array named <code class="code">fl</code> which stores details 
                    all the files indexed by the indexer.  
                    </p></li><li class="listitem"><p><code class="filename">htmlFileInfoList.js</code> - This includes some meta data about the indexed files in an array 
                      named <code class="code">fil</code>. It includes details about file name, file (html) title, a summary 
                      of the content.Format would look like, 
      <code class="code">fil["4"]= "ch03.html@@@Developer Docs@@@This chapter provides an overview of how webhelp is implemented.";</code> 
                    </p></li><li class="listitem"><p><code class="filename">index-*.js</code> (Three index files) - These three files actually stores the index of the content. 
                      Index is added to an array named <code class="code">w</code>.</p></li></ul></div><p>
              
            </p></li><li class="listitem"><p>
              Querying: Query processing happens totally in client side. Following JavaScript files handles them.
              </p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p><code class="filename">nwSearchFnt.js</code> - This handles the user query and returns the search results. It does query 
                    word tokenizing, drop unnecessary punctuations and common words, do stemming if docbook language 
                    supports it, etc.</p></li><li class="listitem"><p><code class="filename">{$indexer-language-code}_stemmer.js</code> - This includes the stemming library. 
                    <code class="filename">nwSearchFnt.js</code> file calls <code class="code">stemmer</code> method in this file for stemming.
                    ex: <code class="code">var stem = stemmer(foobar);</code>                    
                  </p></li></ul></div><p>
            </p></li></ul></div><p>
      </p><div class="section" title="New Stemmers"><div class="titlepage"><div><div><h3 class="title"><a id="id36124646"></a>New Stemmers</h3></div></div></div><p class="summary">Adding new Stemmers is very simple.</p><p>Currently, only English, French, and German stemmers are integrated in to WebHelp. But the code is
        extensible such that you can add new stemmers easily by few steps.</p><p>What you need:
        </p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>You'll need two versions of the stemmer; One written in JavaScript, and another in Java. But fortunately, 
              Snowball contains Java stemmers for number of popular languages, and are already included with the package. 
              You can see the full list in <a class="ulink" href="ch02s04.html" target="_top">Adding support for other (non-CJKV) languages</a>. 
              If your language is listed there,
              Then you have to find javascript version of the stemmer. Generally, new stemmers are getting added in to
              <a class="ulink" href="http://snowball.tartarus.org/otherlangs/index.html" target="_top">Snowball Stemmers in other languages</a> location.
              If javascript stemmer for your language is available, then download it. Else, you can write a new stemmer in 
              JavaScript using SnowBall algorithm fairly easily. Algorithms are at  
              <a class="ulink" href="http://snowball.tartarus.org/" target="_top">Snowball</a>. 
            </p></li><li class="listitem"><p>Then, name the JS stemmer exactly like this: <code class="filename">{$language-code}_stemmer.js</code>. For example, 
              for Italian(it), name it as, <code class="filename">it_stemmer.js</code>. Then, copy it to the 
              <code class="filename">docbook-webhelp/template/content/search/stemmers/</code> folder. (I assumed 
              <code class="filename">docbook-webhelp</code> is the root folder for webhelp.)
              </p><div class="note" title="Note" style="margin-left: 0.5in; margin-right: 0.5in;"><h3 class="title">Note</h3><p>Make sure you changed the <code class="code">webhelp.indexer.language</code> property in <code class="filename">build.properties</code>
                to your language.
                </p></div><p>

            </p></li><li class="listitem"><p>Now two easy changes needed for the indexer.</p><div class="itemizedlist"><ul class="itemizedlist" type="circle"><li class="listitem"><p>Open <code class="filename">docbook-webhelp/indexer/src/com/nexwave/nquindexer/IndexerTask.java</code> in 
                  a text editor and add your language code to the <code class="code">supportedLanguages</code> String Array. </p><div class="example"><a id="id36124759"></a><p class="title"><strong>Example 3.1. Add new language to supportedLanguages array</strong></p><div class="example-contents"><p>
                    change the Array from,
</p><pre class="programlisting">
private String[] supportedLanguages= {"en", "de", "fr", "cn", "ja", "ko"}; 
    //currently extended support available for
    // English, German, French and CJK (Chinese, Japanese, Korean) languages only.
</pre><p>
                    To,</p><pre class="programlisting">
private String[] supportedLanguages= {"en", "de", "fr", "cn", "ja", "ko", <span class="emphasis"><em>"it"</em></span>}; 
  //currently extended support available for
  // English, German, French, CJK (Chinese, Japanese, Korean), and Italian languages only.
                    </pre></div></div><br class="example-break" /></li><li class="listitem"><p>
                  Now, open <code class="filename">docbook-webhelp/indexer/src/com/nexwave/nquindexer/SaxHTMLIndex.java</code> and 
                  add the following line to the code where it initializes the Stemmer (Search for 
                  <code class="code">SnowballStemmer stemmer;</code>). Then add code to initialize the stemmer Object in your language. 
                  It's self understandable. See the example. The class names are at: 
                  <code class="filename">docbook-webhelp/indexer/src/com/nexwave/stemmer/snowball/ext/</code>.
                </p><div class="example"><a id="id36124809"></a><p class="title"><strong>Example 3.2. initialize correct stemmer based on the <code class="code">webhelp.indexer.language</code> specified</strong></p><div class="example-contents"><pre class="programlisting">
      SnowballStemmer stemmer;
      if(indexerLanguage.equalsIgnoreCase("en")){
           stemmer = new EnglishStemmer();
      } else if (indexerLanguage.equalsIgnoreCase("de")){
          stemmer= new GermanStemmer();
      } else if (indexerLanguage.equalsIgnoreCase("fr")){
          stemmer= new FrenchStemmer();
      }
<span class="emphasis"><em>else if (indexerLanguage.equalsIgnoreCase("it")){ //If language code is "it" (Italian)
          stemmer= new italianStemmer();  //Initialize the stemmer to <code class="code">italianStemmer</code> object.
      } </em></span>      
      else {
          stemmer = null;
      }
</pre></div></div><br class="example-break" /></li></ul></div></li></ul></div><p>
        </p><p>That's all. Now run <code class="code">ant build-indexer</code> to compile and build the java code. 
          Then, run <code class="code">ant webhelp</code> to generate the output from your docbook file. 
          For any questions, contact us or email to the docbook mailing list 
          <code class="email">&lt;<a class="email" href="mailto:docbook-apps@lists.oasis-open.org">docbook-apps@lists.oasis-open.org</a>&gt;</code>.
        </p></div></div><script type="text/javascript" src="../common/main.js"></script><div class="navfooter"><hr /><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="ch03s01.html">Prev</a> </td><td width="20%" align="center"><a accesskey="u" href="ch03.html">Up</a></td><td width="40%" align="right"> </td></tr><tr><td width="40%" align="left" valign="top"> </td><td width="20%" align="center"><a accesskey="h" href="index.html">Home</a></td><td width="40%" align="right" valign="top"> </td></tr></table></div></div><div><div id="leftnavigation" style="padding-top:3px; background-color:white;"><div id="tabs"><ul><li><a href="#treeDiv"><em>Contents</em></a></li><li><a href="#searchDiv"><em>Search</em></a></li></ul><div id="treeDiv"><img src="../common/images/loading.gif" alt="loading table of contents..." id="tocLoading" style="display:block;" /><div id="ulTreeDiv" style="display:none"><ul id="tree" class="filetree"><li><span class="file"><a href="ch01.html">Introduction</a></span></li><li><span class="file"><a href="ch02.html">Using the package</a></span><ul><li><span class="file"><a href="ch02s01.html">Generating webhelp output</a></span></li><li><span class="file"><a href="ch02s02.html">Using and customizing the output</a></span><ul><li><span class="file"><a href="ch02s02.html#id36124136">Recommended Apache configurations</a></span></li></ul></li><li><span class="file"><a href="ch02s03.html">Building the indexer</a></span></li><li><span class="file"><a href="ch02s04.html">Adding support for other (non-CJKV) languages</a></span></li></ul></li><li><span class="file"><a href="ch03.html">Developer Docs</a></span><ul><li><span class="file"><a href="ch03s01.html">Design</a></span></li><li id="webhelp-currentid"><span class="file"><a href="ch03s02.html">Search</a></span><ul><li><span class="file"><a href="ch03s02.html#id36124646">New Stemmers</a></span></li></ul></li></ul></li></ul></div></div><div id="searchDiv"><div id="search"><form onsubmit="Verifie(ditaSearch_Form);return false" name="ditaSearch_Form" class="searchForm" id="ditaSearch_Form"><fieldset class="searchFieldSet"><legend>Search</legend><center><input id="textToSearch" name="textToSearch" type="text" class="searchText" /> &nbsp; <input onclick="Verifie(ditaSearch_Form)" type="button" class="searchButton" value="Go" id="doSearch" /></center></fieldset></form></div><div id="searchResults"><center></center></div></div></div></div></div></body></html>