

We have carried out a comparative study of the behaviour of five clustering methods applied to two corpora with very different characteristics. Each corpus belongs to a very narrow domain, which makes the task even more difficult. The transition point technique proved successful: it obtained better results than the DF and TS techniques. Moreover, those results are stable across the different clustering algorithms, which suggests that the feature selection techniques and the clustering methods are independent of each other. Although we used a very strong measure to evaluate the clustering process (F-Measure), it would be desirable to repeat the experiments on other corpora from different domains to confirm our hypothesis. Unfortunately, there is currently a lack of gold standards for clustering abstracts in narrow domains, which makes such experiments hard to carry out. We consider that the task of clustering narrow-domain texts requires more attention from the linguistic community, not only to experiment with different feature selection techniques, but also to construct new narrow-domain corpora with gold standards provided by experts in those domains.

David Pinto 2006-05-25