Cross-lingual Information Retrieval (IR) is currently one of the
most challenging problems in the field.
Moreover, one of the most important issues in IR is reducing the
corpus vocabulary so that IR methods such as the well-known vector
space model can be applied in real-world settings. In this work, we
consider a vocabulary reduction process
based on the selection of mid-frequency terms. This approach enhances
precision; to obtain better recall, we also carried out
an enrichment process based on the addition of co-occurrence
terms. Using this approach, we obtained an improvement of 40%
on the corpus of the BiEnEs WebCLEF 2005 task. The results obtained in
the current WebCLEF 2006 mixed monolingual task
show that text enrichment must be performed before the vocabulary
reduction process in order to achieve the best performance.
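
As a rough illustration of the pipeline described above (enrichment
first, then vocabulary reduction), the following minimal Python sketch
may help. It is not the authors' implementation: the function names,
the frequency thresholds `low` and `high`, the document-level
definition of co-occurrence, and the choice to compute term
frequencies on the enriched corpus are all assumptions made for the
example.

```python
from collections import Counter, defaultdict


def mid_frequency_vocabulary(docs, low, high):
    """Return terms whose corpus frequency lies in [low, high].

    `low` and `high` are hypothetical thresholds; the source does not
    specify how the mid-frequency band is chosen.
    """
    freq = Counter(term for doc in docs for term in doc)
    return {term for term, count in freq.items() if low <= count <= high}


def cooccurrence_map(docs):
    """Map each term to the terms sharing at least one document with it.

    Document-level co-occurrence is an assumption of this sketch.
    """
    co = defaultdict(set)
    for doc in docs:
        terms = set(doc)
        for term in terms:
            co[term] |= terms - {term}
    return co


def enrich(doc, co):
    """Append to `doc` every co-occurring term it does not already contain."""
    out = list(doc)
    seen = set(doc)
    for term in doc:
        for other in sorted(co.get(term, ())):
            if other not in seen:
                seen.add(other)
                out.append(other)
    return out


def reduce_doc(doc, vocabulary):
    """Keep only the terms of `doc` that belong to `vocabulary`."""
    return [term for term in doc if term in vocabulary]


def enrich_then_reduce(docs, low, high):
    """Apply enrichment before vocabulary reduction to every document."""
    co = cooccurrence_map(docs)
    enriched = [enrich(doc, co) for doc in docs]
    vocabulary = mid_frequency_vocabulary(enriched, low, high)
    return [reduce_doc(doc, vocabulary) for doc in enriched]
```

Each document is assumed to be an already tokenized list of terms.
Swapping the two steps inside `enrich_then_reduce` would give the
alternative ordering (reduction before enrichment) for comparison.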