Class TextIndexer.DefaultTextMapper

  • All Implemented Interfaces:
    TextIndexer.TextMapper
    Enclosing class:
    TextIndexer

    public static class TextIndexer.DefaultTextMapper
    extends java.lang.Object
    implements TextIndexer.TextMapper
    A default text mapper implementation that assumes the source IDs are URLs. The indexed IDs it returns are identical to the source IDs.
    • Method Summary

      Modifier and Type Method Description
      java.lang.String getIndexID(java.lang.String sourceID)
      Maps a source identifier to an index identifier.
      java.lang.String getText(java.lang.String sourceID)
      Given a source ID, return the text associated with that ID.
      protected java.lang.String preprocess(java.lang.String sourceID, java.net.URL url, java.lang.String text)
      Preprocesses the text after it is read but before it is returned to the caller of getText(java.lang.String).
      protected java.lang.String read(java.lang.String sourceID, java.net.URL url, java.lang.String encodingHint)
      Reads the source document from the URL and returns it as a string of indexable words.
      protected java.net.URL toURL(java.lang.String sourceID)
      Returns a URL for the source ID.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • DefaultTextMapper

        public DefaultTextMapper()
    • Method Detail

      • getIndexID

        public java.lang.String getIndexID(java.lang.String sourceID)
        Description copied from interface: TextIndexer.TextMapper
        Maps a source identifier to an index identifier. If the source ID should be identified differently in the index, this returns the version to include in the index.
        Specified by:
        getIndexID in interface TextIndexer.TextMapper
        Parameters:
        sourceID - the ID used to locate the text during indexing
        Returns:
        the ID used to locate the text when using the index
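        To make the identity mapping concrete, here is a minimal sketch. TextMapper below is a hypothetical stand-in that declares only the two interface methods listed on this page, IdentityMapper mirrors the documented default behaviour of getIndexID, and PrefixStrippingMapper is an illustrative override (not part of the library) showing how a subclass could store a shorter ID in the index than the one used to fetch the text.

```java
import java.io.IOException;

// Hypothetical stand-in for TextIndexer.TextMapper, declaring only the
// two methods documented on this page.
interface TextMapper {
    String getIndexID(String sourceID);
    String getText(String sourceID) throws IOException;
}

// Mirrors the documented default: the indexed ID is the source ID itself.
class IdentityMapper implements TextMapper {
    @Override
    public String getIndexID(String sourceID) {
        return sourceID;
    }

    @Override
    public String getText(String sourceID) throws IOException {
        // Text retrieval is out of scope for this sketch.
        throw new UnsupportedOperationException("not needed for this sketch");
    }
}

// Hypothetical override: text is fetched using the full URL, but only the
// part after a fixed base is recorded in the index.
class PrefixStrippingMapper extends IdentityMapper {
    private final String base;

    PrefixStrippingMapper(String base) {
        this.base = base;
    }

    @Override
    public String getIndexID(String sourceID) {
        // IDs outside the base are left unchanged.
        return sourceID.startsWith(base) ? sourceID.substring(base.length()) : sourceID;
    }
}
```

        With a base of "http://example.com/docs/", the source ID "http://example.com/docs/intro.html" would be indexed as "intro.html", while the identity mapper would index it unchanged.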
      • getText

        public java.lang.String getText(java.lang.String sourceID)
                                 throws java.io.IOException
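        Given a source ID, return the text associated with that ID. The default mapper calls toURL(java.lang.String) to convert the source ID to a URL, reads the document from that URL, and passes the result through preprocess(java.lang.String, java.net.URL, java.lang.String) before returning it.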
        Specified by:
        getText in interface TextIndexer.TextMapper
        Parameters:
        sourceID - an identifier that the mapper uses to locate the text
        Returns:
        the text mapped to by the ID
        Throws:
        java.io.IOException - if an I/O error occurs while fetching the document
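        The getText pipeline can be sketched as a template method that chains the three protected hooks. SketchTextMapper is a hypothetical class, not the library's source; in particular, it assumes getText passes a null encoding hint and that read decodes as UTF-8, neither of which this page specifies.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the documented getText pipeline:
// toURL -> read -> preprocess. Not the library's actual source.
class SketchTextMapper {
    public String getIndexID(String sourceID) {
        return sourceID; // identity mapping, as documented for the default mapper
    }

    public String getText(String sourceID) throws IOException {
        URL url = toURL(sourceID);
        // Assumption: no encoding hint is supplied at this step.
        String text = read(sourceID, url, null);
        return preprocess(sourceID, url, text);
    }

    protected URL toURL(String sourceID) throws IOException {
        // MalformedURLException is a subclass of IOException.
        return new URL(sourceID);
    }

    protected String read(String sourceID, URL url, String encodingHint) throws IOException {
        // Assumption: this sketch ignores the hint and always decodes as UTF-8.
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    protected String preprocess(String sourceID, URL url, String text) {
        return text; // documented default: text returned unchanged
    }
}
```

        Because each stage is a separate protected method, a subclass can replace one stage (for example, only preprocess) without touching the others.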
      • toURL

        protected java.net.URL toURL(java.lang.String sourceID)
                              throws java.io.IOException
        Returns a URL for the source ID. The default implementation returns a new URL created as if by new URL(sourceID).
        Parameters:
        sourceID - the source ID to convert to a URL
        Returns:
        a URL to use to read the source text
        Throws:
        java.io.IOException - if an error occurs while creating the URL
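        Since the default toURL requires every source ID to be an absolute URL, one plausible override resolves relative IDs against a base URL. BaseRelativeResolver and its base field are hypothetical names for illustration; the resolution itself uses the standard java.net.URL(URL context, String spec) constructor, which leaves absolute specs unchanged.

```java
import java.io.IOException;
import java.net.URL;

// Hypothetical illustration of a toURL override; not part of the library.
class BaseRelativeResolver {
    private final URL base;

    BaseRelativeResolver(URL base) {
        this.base = base;
    }

    // The documented default, for comparison: a relative ID such as
    // "guide/intro.html" fails here with MalformedURLException (an IOException).
    static URL defaultToURL(String sourceID) throws IOException {
        return new URL(sourceID);
    }

    // Override-style behaviour: relative IDs are resolved against base,
    // while IDs that carry their own scheme are used as-is.
    URL toURL(String sourceID) throws IOException {
        return new URL(base, sourceID);
    }
}
```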
      • read

        protected java.lang.String read(java.lang.String sourceID,
                                        java.net.URL url,
                                        java.lang.String encodingHint)
                                 throws java.io.IOException
        Reads the source document from the URL and returns it as a string of indexable words.
        Parameters:
        sourceID - the identifier of the document
        url - the URL to read the document from
        encodingHint - the name of an encoding, or null to use a default encoding
        Returns:
        the document text
        Throws:
        java.io.IOException - if an error occurs while reading the document
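        The encodingHint parameter can be handled as sketched below. HintedReader is a hypothetical helper, and UTF-8 is an assumed stand-in for the unspecified "default encoding"; the sketch also returns the raw decoded text, whereas the documented read may do additional work to produce indexable words.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing one way read(sourceID, url, encodingHint)
// could honour its encodingHint parameter.
class HintedReader {
    // Assumption: UTF-8 stands in for the unspecified default encoding.
    static final Charset FALLBACK = StandardCharsets.UTF_8;

    static String read(String sourceID, URL url, String encodingHint) throws IOException {
        // Charset.forName throws an unchecked exception for unknown names.
        Charset cs = (encodingHint == null) ? FALLBACK : Charset.forName(encodingHint);
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), cs);
        }
    }
}
```

        A document saved as ISO-8859-1 decodes correctly only when that hint is supplied; with a null hint, this sketch would misread any non-ASCII bytes in it.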
      • preprocess

        protected java.lang.String preprocess(java.lang.String sourceID,
                                              java.net.URL url,
                                              java.lang.String text)
        Preprocesses the text after it is read but before it is returned to the caller of getText(java.lang.String). The default implementation returns the text unchanged.
        Parameters:
        sourceID - the identifier of the document
        url - the URL that the document was read from
        text - the original text
        Returns:
        the modified text
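        As an illustration of what a preprocess override might do, the hypothetical MarkupStrippingPreprocessor below removes tag-like markup, collapses whitespace, and lower-cases the result. The regular expression is a deliberately simple assumption and is not a full HTML parser.

```java
import java.net.URL;
import java.util.Locale;

// Hypothetical preprocess override: strips markup tags and normalises
// whitespace and case before the text is indexed.
class MarkupStrippingPreprocessor {
    String preprocess(String sourceID, URL url, String text) {
        String noTags = text.replaceAll("<[^>]*>", " ");          // replace tags such as <b> with a space
        String collapsed = noTags.replaceAll("\\s+", " ").trim(); // collapse runs of whitespace
        return collapsed.toLowerCase(Locale.ROOT);                // locale-independent lower-casing
    }
}
```

        For example, preprocess("id", null, "<p>Hello <b>World</b></p>") would return "hello world".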