Class MarkovText


  • public class MarkovText
    extends java.lang.Object
    Generates random text using a Markov model. The generator must be supplied with a sample model text on which to base the generated text. The model text must contain at least one word, and may contain any number of words. A word is any uninterrupted sequence of non-whitespace characters. Whitespace characters have special meaning to the generator: any sequence of whitespace is treated as if it were a single plain space.
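
    The word and whitespace rules above can be illustrated with a minimal sketch. It is not taken from MarkovText itself: the class WordSplitSketch and its methods are hypothetical names, and dropping leading and trailing whitespace is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of MarkovText.
class WordSplitSketch {
    /** Splits text into words: maximal runs of non-whitespace characters. */
    static List<String> words(CharSequence text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toString().trim().split("\\s+")) {
            if (!w.isEmpty()) out.add(w); // "".split(...) yields one empty string
        }
        return out;
    }

    /** Collapses each run of whitespace into a single plain space. */
    static String normalize(CharSequence text) {
        // Assumption: leading and trailing whitespace is dropped entirely.
        return String.join(" ", words(text));
    }

    public static void main(String[] args) {
        System.out.println(normalize("  the  quick\n\tbrown fox ")); // the quick brown fox
    }
}
```

    A text for which words() returns an empty list is the case that the generator rejects as containing no words.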

    The generator can produce either letter sequences or word sequences. When generating a letter sequence, a specific number of letters is requested by the caller; for word sequences, a specific number of words is requested. When producing word sequences, the system can either choose whole words (in which case every word generated will actually occur in the model text), or it can create a sequence of pseudo-words, which are generated one letter at a time and may include "words" that do not appear in the model text.
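
    One way that letter-at-a-time pseudo-word generation might work is sketched below. This is an illustration under stated assumptions, not the MarkovText implementation: PseudowordSketch is a hypothetical class, a space is treated as the letter that ends a word, and the model text wraps around so that every context has at least one follower.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch, not part of MarkovText.
class PseudowordSketch {
    /** Generates wordCount space-separated pseudo-words, choosing one letter at a time. */
    static String pseudowords(String model, int order, int wordCount, Random rand) {
        String trimmed = model.trim();
        if (trimmed.isEmpty()) throw new IllegalArgumentException("model has no words");
        // Normalize whitespace; the trailing space marks the end of the last word.
        String text = String.join(" ", trimmed.split("\\s+")) + " ";
        int n = text.length();
        if (order < 0 || order > n) throw new IllegalArgumentException("order out of range for this sketch");
        String circ = text + text; // assumption: contexts wrap around the end of the model

        // Map each context (the `order` letters before a position) to its followers.
        Map<String, List<Character>> table = new HashMap<>();
        for (int i = 0; i < n; i++) {
            String context = circ.substring(n + i - order, n + i);
            table.computeIfAbsent(context, k -> new ArrayList<>()).add(text.charAt(i));
        }

        String context = circ.substring(n - order, n); // the letters just before the first word
        List<String> words = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        while (words.size() < wordCount) {
            List<Character> followers = table.get(context);
            char c = followers.get(rand.nextInt(followers.size()));
            if (c != ' ') {
                word.append(c);
            } else if (word.length() > 0) { // a space closes the current word
                words.add(word.toString());
                word.setLength(0);
            }
            context = (context + c).substring(1); // keep only the last `order` letters
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        String model = "the quick brown fox jumps over the lazy dog";
        System.out.println(pseudowords(model, 2, 5, new Random()));
    }
}
```

    Because letters are chosen individually, a result such as "thown" can appear even though it is not a word of the model text; choosing whole words instead would use the same table with words as the items.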

    Markov modelling produces more realistic text than simply selecting items (letters or words) at random from the model text. Every Markov model has an order, n, and the next item chosen depends on the previous n items. When the order is 0, no previous items are taken into account: the next item depends only on the frequency of items in the model, so if A appears twice as often as B, then A is twice as likely to be chosen. When the order is 1, one previous item is considered: if the previous item is C, and A follows C three times as often as B follows C, then A is three times as likely as B to be chosen after a C.
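
    The frequency rule can be made concrete with a small follower table. The sketch below is an assumption about one possible data structure, not the way MarkovText stores its model; MarkovOrderSketch, buildTable and next are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch, not part of MarkovText.
class MarkovOrderSketch {
    /** Maps each run of `order` consecutive items to every item that follows it (repeats kept). */
    static Map<List<String>, List<String>> buildTable(List<String> items, int order) {
        Map<List<String>, List<String>> table = new HashMap<>();
        for (int i = order; i < items.size(); i++) {
            List<String> context = new ArrayList<>(items.subList(i - order, i));
            table.computeIfAbsent(context, k -> new ArrayList<>()).add(items.get(i));
        }
        return table;
    }

    /** Picks uniformly from the follower list, so a follower stored three times is three times as likely. */
    static String next(Map<List<String>, List<String>> table, List<String> context, Random rand) {
        List<String> followers = table.get(context);
        if (followers == null) return null; // this context never occurs in the model
        return followers.get(rand.nextInt(followers.size()));
    }

    public static void main(String[] args) {
        List<String> model = Arrays.asList("C", "A", "C", "A", "C", "A", "C", "B");
        Map<List<String>, List<String>> table = buildTable(model, 1);
        System.out.println(table.get(Arrays.asList("C"))); // [A, A, A, B]
    }
}
```

    With order 0 the only context is the empty list, whose followers are all items of the model, which reduces to choosing by overall frequency.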

    Higher orders are increasingly likely to simply reproduce long passages from the model text, because the number of possible choices drops rapidly as the order increases. That is, there will often be only one item that ever follows a given run of preceding items in the model text, so that item is the only choice that can be made.
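
    The narrowing of choices can be measured directly. The following sketch, using the hypothetical class OrderChoicesSketch and an arbitrary sample string, reports the fraction of letter contexts that allow exactly one next letter at each order; it is meant only to illustrate the effect described above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not part of MarkovText.
class OrderChoicesSketch {
    /** Returns the fraction of letter contexts of the given order that have exactly one distinct follower. */
    static double forcedFraction(String text, int order) {
        Map<String, Set<Character>> followers = new HashMap<>();
        for (int i = order; i < text.length(); i++) {
            followers.computeIfAbsent(text.substring(i - order, i), k -> new HashSet<>())
                     .add(text.charAt(i));
        }
        if (followers.isEmpty()) return 0.0; // text is too short for this order
        long forced = followers.values().stream().filter(s -> s.size() == 1).count();
        return (double) forced / followers.size();
    }

    public static void main(String[] args) {
        String model = "the theory of the thing is that the then and there";
        for (int order = 0; order <= 4; order++) {
            System.out.printf("order %d: %.2f of contexts allow only one next letter%n",
                    order, forcedFraction(model, order));
        }
    }
}
```

    When the fraction approaches 1, almost every step of generation is forced, so the output follows the model text verbatim.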

    Author:
    Chris Jennings
    • Constructor Summary

      Constructors 
      Constructor Description
      MarkovText()  
      MarkovText(java.lang.CharSequence text)
      MarkovText(java.lang.CharSequence text, java.util.Random rand)
    • Method Summary

      Modifier and Type Method Description
      java.lang.String generateCharacters(int charCount)
      Generate charCount letters of text.
      java.lang.String generatePseudowords(int wordCount)
      Generate wordCount words of text, one letter at a time.
      java.lang.String generateWords(int wordCount)
      Generate wordCount words of text, one word at a time.
      int getOrder()
      Returns the currently set Markov order.
      void setOrder(int order)
      Set the Markov order to use for text generation.
      void setText(java.lang.CharSequence text)
      Set the text used as a model for generating new text.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • MarkovText

        public MarkovText()
      • MarkovText

        public MarkovText(java.lang.CharSequence text)
      • MarkovText

        public MarkovText(java.lang.CharSequence text,
                          java.util.Random rand)
    • Method Detail

      • setText

        public void setText(java.lang.CharSequence text)
        Set the text used as a model for generating new text. The text will be broken into words, where a word is any non-whitespace sequence. An exception is thrown if the text does not contain at least one word.
        Parameters:
        text - the model text to base generated text upon
        Throws:
        java.lang.IllegalArgumentException - if the text does not contain any words
      • getOrder

        public int getOrder()
        Returns the currently set Markov order. The Markov order determines how many previous words or letters are used for context when choosing the next word or letter.
        Returns:
        the current Markov order
      • setOrder

        public void setOrder(int order)
        Set the Markov order to use for text generation. The Markov order determines how many previous words or letters are used for context when choosing the next word or letter.
        Parameters:
        order - the Markov order to use when generating text
        Throws:
        java.lang.IllegalArgumentException - if order < 0
      • generateCharacters

        public java.lang.String generateCharacters(int charCount)
        Generate charCount letters of text.
        Parameters:
        charCount - the number of letters to generate.
        Returns:
        the generated text
        Throws:
        java.lang.IllegalArgumentException - if charCount < 0
      • generatePseudowords

        public java.lang.String generatePseudowords(int wordCount)
        Generate wordCount words of text, one letter at a time.
        Parameters:
        wordCount - the number of words to generate.
        Returns:
        the generated text
        Throws:
        java.lang.IllegalArgumentException - if wordCount < 0
      • generateWords

        public java.lang.String generateWords(int wordCount)
        Generate wordCount words of text, one word at a time.
        Parameters:
        wordCount - the number of words to generate.
        Returns:
        the generated text
        Throws:
        java.lang.IllegalArgumentException - if wordCount < 0