Latent semantic analysis
Latent semantic analysis (LSA) is a technique in natural language processing. It is sometimes called latent semantic indexing (LSI).
= Applications =
Applications of LSA include mitigating two problems of keyword-based text retrieval:
* In synonymy, different writers use different words to describe the same idea. A person issuing a query in a search engine may therefore use a different word from the one that appears in a document, and so fail to retrieve that document.
* In polysemy, the same word can have multiple meanings, so a searcher can retrieve unwanted documents that use the word in another sense.
= Occurrence matrix =
LSA uses a term-document matrix, which describes the occurrences of terms in documents. It is a sparse matrix whose rows correspond to documents and whose columns correspond to terms, typically stemmed words that appear in the documents. A typical weighting of the matrix elements is tf-idf: each element is proportional to the number of times the term appears in the document, with rare terms upweighted to reflect their relative importance.
This matrix is common to standard semantic models as well, though it is not necessarily expressed explicitly as a matrix, since the mathematical properties of matrices are not always used.
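The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `tfidf_matrix`, the whitespace tokenizer, the sample sentences, and the particular idf variant (log of the number of documents divided by the document frequency) are all assumptions made for the example, and no stemming is applied.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Return (vocab, matrix): one row per document, one column per term."""
    # Assumed tokenizer: lowercase and split on whitespace (no stemming).
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumed idf variant: log(N / df), so rarer terms get larger weights.
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Raw term count scaled by the term's idf weight.
        matrix.append([counts[term] * idf[term] for term in vocab])
    return vocab, matrix

# Hypothetical three-document corpus.
docs = [
    "the car drove down the road",
    "the truck drove down the road",
    "the flower grew by the road",
]
vocab, X = tfidf_matrix(docs)
```

With this weighting, a term that occurs in every document (here "the" and "road") receives weight zero, while a term found in a single document (such as "car") receives the largest weight per occurrence.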
= Rank lowering =
After constructing the occurrence matrix, LSA finds a low-rank approximation to the term-document matrix. Two explanations can be given for this approximation:
Concretely, the rank reduction is often achieved with a singular value decomposition (SVD): the set of all terms is then represented in a vector space of lower dimensionality than the total number of terms in the vocabulary.
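In symbols, the truncated SVD can be written as follows. The notation is an assumption introduced here for illustration: X is the occurrence matrix, the subscript k means that only the k largest singular values and their singular vectors are kept, and the norm is the Frobenius norm.

```latex
X = U \Sigma V^{\mathsf{T}}, \qquad
X_k = U_k \Sigma_k V_k^{\mathsf{T}}, \qquad
X_k = \arg\min_{\operatorname{rank}(Y) \le k} \lVert X - Y \rVert_F
```

The last identity is the Eckart-Young theorem: among all matrices of rank at most k, the truncated SVD is a closest one to X in the Frobenius norm.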
A consequence of the rank lowering is that some dimensions are merged and come to depend on more than one term:
:: {(car), (truck), (flower)} --> {(1.3452 * car + 0.2828 * truck), (flower)}
This mitigates synonymy, as the rank lowering is expected to merge the dimensions associated with terms of similar meaning.
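The merging effect can be reproduced on a toy example with NumPy. This is a sketch under stated assumptions: the count matrix below is hypothetical (rows are documents, columns are the terms car, truck, and flower), the helper `cosine` is introduced only for the demonstration, and the choice k = 2 is arbitrary.

```python
import numpy as np

# Hypothetical toy counts: rows are documents, columns follow `terms`.
terms = ["car", "truck", "flower"]
X = np.array([
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [3.0, 3.0, 0.0],
    [0.0, 0.0, 2.0],
    [0.0, 0.0, 3.0],
])

# Thin SVD: X = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the k largest singular values (rank lowering).
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Each term as a k-dimensional vector in the reduced space.
term_vectors = (np.diag(s[:k]) @ Vt[:k, :]).T

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

car, truck, flower = term_vectors
print("car vs truck :", round(cosine(car, truck), 6))
print("car vs flower:", round(cosine(car, flower), 6))
```

In this toy matrix the discarded third dimension is exactly the one that distinguishes car from truck, so in the reduced space the two terms point in the same direction while flower stays orthogonal to both. Real corpora are far noisier, and the merge is rarely this clean.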
= Limitations of LSA =
LSA has a number of drawbacks. Nevertheless, it remains a standard algorithm in information retrieval.