Clustering Abstracts of Scientific Texts using the Transition Point Technique
David Pinto(1,2) - Héctor Jiménez-Salazar(1) - Paolo Rosso(2)
(1){davideduardopinto, hgimenezs}@gmail.com
(2){dpinto, prosso}@dsic.upv.es
Abstract:
Free access to scientific papers in major digital libraries and other web repositories is limited to only their abstracts. Current
keyword-based techniques fail on narrow domain-oriented libraries, e.g., those containing only documents on high energy physics like
those of the hep-ex collection of CERN. We propose a simple procedure for clustering abstracts that applies the
transition point technique during the term selection process. This technique indexes documents by their mid-frequency terms,
which tend to carry high semantic content. In our experiments, the transition point approach was
compared with well-known unsupervised term selection techniques. The results show that the transition point technique can achieve better
performance than traditional methods. Moreover, we propose an approach for analysing the stability of the transition point term selection
method.
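
As an illustration of mid-frequency term selection, the following minimal Python sketch may help. The abstract does not state a formula, so the sketch assumes a commonly cited formulation of the transition point, TP = (sqrt(8 * I1 + 1) - 1) / 2, where I1 is the number of terms that occur exactly once. The function names and the `width` parameter (the relative size of the neighbourhood around TP) are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def transition_point(freqs):
    """Estimate the transition point from a term -> frequency mapping.

    Assumption: TP = (sqrt(8 * I1 + 1) - 1) / 2, where I1 is the number
    of terms with frequency 1. This formula is not given in the abstract.
    """
    i1 = sum(1 for f in freqs.values() if f == 1)
    return (math.sqrt(8 * i1 + 1) - 1) / 2


def select_mid_frequency_terms(tokens, width=0.4):
    """Keep terms whose frequency lies in a neighbourhood of TP.

    `width` is a hypothetical parameter: a term is kept when its
    frequency falls within [TP * (1 - width), TP * (1 + width)].
    """
    freqs = Counter(tokens)
    tp = transition_point(freqs)
    low, high = tp * (1 - width), tp * (1 + width)
    return {term for term, f in freqs.items() if low <= f <= high}


if __name__ == "__main__":
    tokens = "a a a a a b b c c d e f".split()
    print(transition_point(Counter(tokens)))
    print(sorted(select_mid_frequency_terms(tokens)))
```

In this toy input the very frequent term and the terms that occur only once are discarded, and only the terms with frequencies near the estimated transition point remain as index terms.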
David Pinto
2006-05-25