CSE 788R04: Knowledge bootstrapping for Language Processing Applications

Software tools

Name | Description | Used in which paper?
SVM Light | Support Vector Machine toolkit | [1]
Semantic Web open-source tools | Tools for using semantic web markup | found on web
Cluto clustering tool | Makes clusters | found on web
Weka machine learning package | Classification toolkit with decision trees, Bayesian methods, EM, clustering | many
Google APIs | Pull web pages and page counts into a program using Google search | [2]
The Linguist's Search Engine | Returns web pages with sentences formed as you specify, using words or phrase tags | The Economist article, linked below
Clustering and similarity tools from Ted Pedersen | A variety of useful tools in Perl | AAAI
TNT parser | A parser |
Mallet | Classification toolkit |
LingPipe | Named entity extraction and NP chunker |
LT Chunk | NP chunker |

Data Resources

Name | Description | Used in which paper?
WordNet | Hand-constructed thesaurus | Many
FrameNet | Hand-constructed thesaurus | Many
DAML Ontologies | Over 200 XML-format hand-constructed ontologies | found on the web
Cyc | Hand-built KB of common sense | common knowledge
VerbNet | Hand-built KB of verbs and argument restrictions | common knowledge
Link Grammar parser | A parser (in C) | [3]
BootCaT | Input a set of seed words, find more instances in that category | [4]
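
The seed-word idea in the last entry above (a set of seed words goes in, more members of the same category come out) can be illustrated with a toy sketch. This is not the algorithm used by the tool in [4], which works against the web; it only assumes that words coordinated with a known seed ("X, Y and Z") probably belong to the same category. The names bootstrap, coordinated_lists, and SENTENCES, the min_count threshold, and the six example sentences are all hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# A run of single lowercase words joined by commas and a final "and"/"or",
# e.g. "apples, pears and plums". Assumption: such lists share one category.
LIST_RE = re.compile(r"\b[a-z]+(?:, [a-z]+)*,? (?:and|or) [a-z]+\b")
ITEM_SEP = re.compile(r",? (?:and|or) |, ")


def coordinated_lists(sentence):
    """Return every coordinated word list found in the sentence."""
    return [ITEM_SEP.split(m.group(0)) for m in LIST_RE.finditer(sentence.lower())]


def bootstrap(seeds, sentences, rounds=2, min_count=2):
    """Grow a category from seed words (toy illustration only).

    In each round, a word is accepted if it is coordinated with an
    already-known word in at least min_count lists. Accepted words
    act as seeds in the next round.
    """
    original = {s.lower() for s in seeds}
    known = set(original)
    lists = [items for s in sentences for items in coordinated_lists(s)]
    for _ in range(rounds):
        counts = Counter()
        for items in lists:
            if known.intersection(items):
                counts.update(set(items) - known)
        new_words = {w for w, c in counts.items() if c >= min_count}
        if not new_words:
            break
        known |= new_words
    return sorted(known - original)


# Hypothetical mini-corpus, invented for this example.
SENTENCES = [
    "We grow apples, pears and plums.",
    "She bought pears, plums and cherries.",
    "Apples or cherries go well with cheese.",
    "The shop sells hammers, nails and saws.",
    "Plums, cherries and figs were on sale.",
    "Cherries and figs ripen in summer.",
]

if __name__ == "__main__":
    print(bootstrap(["apples", "pears"], SENTENCES))
```

With the seeds "apples" and "pears", the first round picks up "plums" and "cherries", and the second round uses those to reach "figs". The tool words in the fourth sentence are never coordinated with a known word, so they stay out. Raising min_count trades recall for precision, which is the usual tension in bootstrapping methods.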

Diversions

Related Upcoming Conferences and Publication Venues

  • Empirical Methods in Natural Language Processing
  • RIAO

References

[2] Frank Keller and Mirella Lapata. Using the Web to Obtain Frequencies for Unseen Bigrams. Computational Linguistics 29(3):459-484. 2003. http://www.aclweb.org/anthology/J03-3005.pdf
[3] Timothy Chklovski. LEARNER: A System for Acquiring Commonsense Knowledge by Analogy. In Proceedings of the Second International Conference on Knowledge Capture (K-CAP 2003). October 2003.
[4] M. Baroni, A. Kilgarriff, J. Pomikálek, and P. Rychlý. WebBootCaT: instant domain-specific corpora to support human translators. In Proceedings of EAMT 2006, Oslo, pages 247-252. 2006.