pos2tracery

Convert corpus to tracery grammar with a POS tagger

pos2tracery is a tool stemming from my NaNoGenMo 2017 Artbook Project, modified with behaviors from the NaNoGenMo 2018 Wink Project. It uses the Wink POS Tagger, which takes a transformation-based learning (TBL) approach, to create tracery grammars. These grammars take sentence forms from a corpus but replace the words within them with other words of the same part of speech drawn from throughout the corpus. It is aware of contractions and punctuation, and it creates tracery's default English modifiers.
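
To make the idea concrete, here is a minimal sketch of how tagged sentences could be turned into a tracery grammar. The tokens are tagged by hand rather than by the Wink tagger, and the function name `toGrammar` and the key layout are assumptions made for illustration; the grammar pos2tracery actually emits may be organized differently.

```javascript
// Hand-tagged tokens standing in for a POS tagger's output (hypothetical data).
const tagged = [
  [
    { value: 'The', pos: 'DT' },
    { value: 'cat', pos: 'NN' },
    { value: 'sleeps', pos: 'VBZ' },
    { value: '.', pos: '.' }
  ],
  [
    { value: 'A', pos: 'DT' },
    { value: 'dog', pos: 'NN' },
    { value: 'barks', pos: 'VBZ' },
    { value: '.', pos: '.' }
  ]
];

// Each sentence becomes an "origin" rule made of #TAG# symbols, and each tag
// collects every distinct word seen with that tag across the corpus.
function toGrammar(sentences) {
  const grammar = { origin: [] };
  for (const sentence of sentences) {
    let form = '';
    for (const { value, pos } of sentence) {
      const isWord = /^[A-Z]/.test(pos);
      if (!isWord) {
        form += value; // punctuation stays literal, attached to the previous token
        continue;
      }
      if (!grammar[pos]) grammar[pos] = [];
      if (!grammar[pos].includes(value)) grammar[pos].push(value);
      form += (form ? ' ' : '') + `#${pos}#`;
    }
    grammar.origin.push(form);
  }
  return grammar;
}

console.log(JSON.stringify(toGrammar(tagged), null, 2));
```

Because both sentences share the same form, expanding this grammar can pair any determiner with any noun and verb, which is the recombination effect described above.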

INSTALL

npm install -g pos2tracery

SYNOPSIS

pos2tracery currently consists of several tools: pos, soundex, merge, generate, and delete. Each can be run as a standalone app or imported into your projects.

CLI USAGE

POS

Generate tracery grammars from POS tags.

pos2tracery [pos|p] <input> [output] [options]
example: pos2tracery pos corpus.txt grammar.json

Positionals:
  input   input/source file  [string] [required]
  output  optional output/destination file, if not set file prints to stdout  [string]

Options:
  --version        Show version number  [boolean]
  --verbose, -v    print details while processing  [count]
  --percent, -p    limit the percentage of words replaced with their POS tags number between 1 and 100  [number] [default: 100]
  --modifiers, -m  replace english modifiers with their equivalent tracery.modifier function  [boolean] [default: false]
  --origin, -o     Include "origin" key in tracery file, specify --no-origin to not add this key  [boolean] [default: true]
  --ignore, -i     list of parts of speech to not tagify  [array] [default: []]
  --split, -s      determine string splitting strategy: line, paragraph, or sentence  [choices: "l", "p", "s"] [default: "s"]
  -h, --help       Show help  [boolean]
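
As a rough illustration of what the three --split strategies mean, the sketch below splits a corpus by line, paragraph, or sentence. The helper name `splitCorpus` and the regular expressions are assumptions; the sentence splitter here is deliberately naive (it will break on abbreviations such as "Dr.") and is not taken from pos2tracery itself.

```javascript
// Hypothetical helper: split a corpus by line ("l"), paragraph ("p"), or sentence ("s").
function splitCorpus(text, strategy = 's') {
  const normalized = text.replace(/\r\n/g, '\n');
  const tidy = (s) => s.replace(/\s+/g, ' ').trim();
  let parts;
  if (strategy === 'l') {
    parts = normalized.split('\n');
  } else if (strategy === 'p') {
    // A paragraph break is one or more blank lines.
    parts = normalized.split(/\n\s*\n/);
  } else {
    // Naive sentence split: runs of text ending in ., !, or ? (or end of input).
    parts = tidy(normalized).match(/[^.!?]+(?:[.!?]+|$)/g) || [];
  }
  return parts.map(tidy).filter(Boolean);
}

const sample = 'One fish. Two fish!\nRed fish?\n\nBlue fish.';
console.log(splitCorpus(sample, 'l'));
console.log(splitCorpus(sample, 'p'));
console.log(splitCorpus(sample, 's'));
```

The strategy matters because each resulting chunk becomes one sentence form in the grammar: line splitting suits poetry or lists, while sentence splitting suits prose.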

Soundex

Generate tracery grammars with Soundex.

pos2tracery soundex <input> [output]
example: pos2tracery soundex corpus.txt grammar.json

Positionals:
  input   input/source file  [string] [required]
  output  optional output/destination file, if not set file prints to stdout  [string]

Options:
  --version      Show version number  [boolean]
  --verbose, -v  print details while processing  [count]
  --percent, -p  limit the percentage of words replaced with their POS tags number between 1 and 100  [number] [default: 100]
  --origin       Include "origin" key in tracery file, specify --no-origin to not add this key  [boolean] [default: true]
  --split, -s    determine string splitting strategy: line, paragraph, or sentence  [choices: "l", "p", "s"] [default: "s"]
  -h, --help     Show help  [boolean]
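
For readers unfamiliar with Soundex, the sketch below implements the standard American Soundex code and groups words that sound alike under a shared key. The `soundex` function follows the usual published rules; how pos2tracery maps those codes onto grammar keys is not stated here, so `groupBySoundex` is only an assumption about one plausible layout.

```javascript
// American Soundex: first letter plus three digits describing the consonants.
function soundex(word) {
  const codes = {
    b: 1, f: 1, p: 1, v: 1,
    c: 2, g: 2, j: 2, k: 2, q: 2, s: 2, x: 2, z: 2,
    d: 3, t: 3,
    l: 4,
    m: 5, n: 5,
    r: 6
  };
  const letters = word.toLowerCase().replace(/[^a-z]/g, '');
  if (!letters) return '';
  let result = letters[0].toUpperCase();
  let prev = codes[letters[0]] || 0;
  for (const ch of letters.slice(1)) {
    if (result.length === 4) break;
    if (ch === 'h' || ch === 'w') continue; // h and w do not separate equal codes
    const code = codes[ch] || 0;            // vowels and y have no code
    if (code !== 0 && code !== prev) result += code;
    prev = code;                            // a vowel resets prev, allowing a repeat
  }
  return result.padEnd(4, '0');
}

// Hypothetical grouping step: one grammar key per Soundex code.
function groupBySoundex(words) {
  const grammar = {};
  for (const word of words) {
    const key = soundex(word);
    if (!key) continue;
    if (!grammar[key]) grammar[key] = [];
    if (!grammar[key].includes(word)) grammar[key].push(word);
  }
  return grammar;
}

console.log(groupBySoundex(['Robert', 'Rupert', 'Tymczak']));
```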

Merge

Merge 2 tracery grammars.

pos2tracery merge <inputA> <inputB> [output]
example: pos2tracery merge grammar.json grammar2.json combined_output.json

Positionals:
  inputA  input/source file  [string] [required]
  inputB  input/source file  [string] [required]
  output  optional output/destination file, if not set file prints to stdout  [string]

Options:
  --version      Show version number  [boolean]
  -v, --verbose  print details while processing  [boolean] [default: false]
  -d, --dupes  [boolean] [default: true]
  -h, --help     Show help  [boolean]
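
Conceptually, merging two grammars means combining the rule lists of any keys they share. The sketch below shows one way to do that. The function name `mergeGrammars` is hypothetical, and because --dupes has no description in the help output, treating it as "keep duplicate rules when true" is an assumption.

```javascript
// Hypothetical merge: concatenate the rules for every key found in either grammar.
function mergeGrammars(a, b, { dupes = true } = {}) {
  const merged = {};
  for (const grammar of [a, b]) {
    for (const [key, rules] of Object.entries(grammar)) {
      // Tracery accepts a single string or an array of strings per key.
      const list = Array.isArray(rules) ? rules : [rules];
      merged[key] = (merged[key] || []).concat(list);
    }
  }
  if (!dupes) {
    // Assumed meaning of --no-dupes: drop repeated rules within each key.
    for (const key of Object.keys(merged)) {
      merged[key] = [...new Set(merged[key])];
    }
  }
  return merged;
}

const grammarA = { origin: ['#NN#'], NN: ['cat', 'dog'] };
const grammarB = { origin: ['#NN# #VB#'], NN: ['dog', 'bird'], VB: ['runs'] };

console.log(mergeGrammars(grammarA, grammarB));
console.log(mergeGrammars(grammarA, grammarB, { dupes: false }));
```

Keeping duplicates is not necessarily a mistake: a rule that appears twice is twice as likely to be picked during generation, so duplicates preserve the word frequencies of the source corpora.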

Generate

Generate text from a tracery grammar

pos2tracery generate <input>
example: pos2tracery generate grammar.json

Positionals:
  input  input/source grammar file  [string] [required]

Options:
  --version        Show version number  [boolean]
  -m, --modifiers  use modifiers  [boolean] [default: true]
  -o, --origin     use specified origin to create sentences  [string] [default: "origin"]
  --repeat, -r     define number of sentence to generate  [number] [default: 1]
  --evaluate, -e   evaluate tracery as javascript template (write javascript inside ${} in tracery)  [boolean] [default: false]
  --verbose, -v    output information about internal processes  [count]
  -h, --help       Show help  [boolean]
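
To show what generation involves, here is a small self-contained expander that mimics the basic behavior of a tracery grammar: pick a rule for the origin symbol, then recursively replace each #symbol# and apply any modifiers. It is a simplified stand-in, not the tracery library or pos2tracery's own code; the `expand` and `generate` functions, the `((symbol))` placeholder, and the two modifiers are illustrative assumptions.

```javascript
// Simplified stand-ins for two tracery-style modifiers (not the library's own code).
const modifiers = {
  capitalize: (s) => s.charAt(0).toUpperCase() + s.slice(1),
  a: (s) => (/^[aeiou]/i.test(s) ? 'an ' : 'a ') + s
};

// Expand a symbol by picking one of its rules and recursively expanding #tags#.
function expand(grammar, symbol = 'origin', useModifiers = true, rng = Math.random) {
  const rules = grammar[symbol];
  if (!rules || rules.length === 0) return `((${symbol}))`; // placeholder for unknown symbols
  const rule = rules[Math.floor(rng() * rules.length)];
  return rule.replace(/#([^#.]+)((?:\.[^#.]+)*)#/g, (match, name, mods) => {
    let text = expand(grammar, name, useModifiers, rng);
    if (useModifiers) {
      for (const mod of mods.split('.').filter(Boolean)) {
        if (Object.prototype.hasOwnProperty.call(modifiers, mod)) {
          text = modifiers[mod](text);
        }
      }
    }
    return text;
  });
}

// Produce `repeat` sentences starting from the given origin symbol.
function generate(grammar, { origin = 'origin', repeat = 1, modifiers: useMods = true } = {}) {
  return Array.from({ length: repeat }, () => expand(grammar, origin, useMods));
}

const grammar = {
  origin: ['#DT.capitalize# #NN# sleeps.'],
  DT: ['the'],
  NN: ['cat']
};
console.log(generate(grammar, { repeat: 2 }));
```

The option names mirror the long-form CLI flags (origin, repeat, modifiers) so the sketch lines up with the help text above.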

Delete

Delete keys in a tracery grammar through a whitelist and/or a blacklist

pos2tracery delete <input> [output]
example: pos2tracery delete grammar.json grammar-clean.json -t story

Positionals:
  input   input/source file  [string] [required]
  output  optional output/destination file, if not set file prints to stdout  [string]

Options:
  --version      Show version number  [boolean]
  --keep, -k     a list of keys to keep from the input json file (overrides duplicate values in toss)  [array] [default: []]
  --toss, -t     a list of keys to delete from the input json file  [array] [default: []]
  -v, --verbose  print details while processing  [count]
  -h, --help     Show help  [boolean]
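
The interaction between --keep and --toss can be sketched as a simple key filter. The function name `filterKeys` is hypothetical, and the semantics are an assumption based on the help text: a non-empty keep list acts as a whitelist, toss acts as a blacklist, and a key named in both lists is kept.

```javascript
// Hypothetical key filter mirroring the --keep (whitelist) and --toss (blacklist) options.
function filterKeys(grammar, { keep = [], toss = [] } = {}) {
  // Accept either a single key or an array of keys.
  const keepSet = new Set([].concat(keep));
  const tossSet = new Set([].concat(toss));
  const result = {};
  for (const [key, rules] of Object.entries(grammar)) {
    if (keepSet.size > 0 && !keepSet.has(key)) continue; // not on the whitelist
    if (tossSet.has(key) && !keepSet.has(key)) continue; // blacklisted, and keep does not override
    result[key] = rules;
  }
  return result;
}

const source = { origin: ['#story#'], story: ['#NN# sleeps.'], NN: ['cat'] };

console.log(filterKeys(source, { toss: 'story' }));
console.log(filterKeys(source, { keep: ['story'] }));
console.log(filterKeys(source, { keep: ['story'], toss: ['story'] }));
```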

MODULE USAGE

pos2tracery can also be used inside your Node projects. Each option is set using the long-form name of the corresponding CLI option. The only change is that, since delete is a reserved word in JavaScript, the function is called del.

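const p2t = require('pos2tracery');

// Drop the "story" key from grammar A, keep only the "story" key from grammar B,
// then merge the two results into a single grammar.
const merged = p2t.merge({
  inputA: p2t.del({
    input: "./grammar_A.json",
    toss: "story"
  }),
  inputB: p2t.del({
    input: "./grammar_B.json",
    keep: "story"
  })
});

// Generate text from the merged grammar with modifiers enabled.
p2t.generate({
  input: merged,
  modifiers: true
});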