Latent semantic analysis

Latent semantic analysis (LSA) is a technique in natural language processing. It is sometimes called latent semantic indexing (LSI).

= Applications =

Applications of LSA include information retrieval, where it helps with two problems of natural language:

  • In synonymy, different writers use different words to describe the same idea. A person issuing a query in a search engine may therefore use a different word than the one that appears in a document, and fail to retrieve the document.
  • In polysemy, the same word can have multiple meanings, so a searcher can get unwanted documents that use the word in one of its other meanings.

= Occurrence matrix =

LSA uses a term-document matrix which describes the occurrences of terms in documents; it is a sparse matrix whose rows correspond to documents and whose columns correspond to terms, typically stemmed words that appear in the documents. A typical weighting of the elements of the matrix is tf-idf: each element is proportional to the number of times the term appears in the document, with rare terms upweighted to reflect their relative importance.

This matrix is common to standard semantic models as well (though it is not necessarily expressed explicitly as a matrix, since the mathematical properties of matrices are not always used).
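The tf-idf weighting described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the toy corpus, the helper name `tfidf_matrix`, and the particular idf formula (the logarithm of the number of documents divided by the document frequency) are choices made for this example, and real systems often use smoothed or normalized variants. Rows are documents and columns are terms, matching the orientation used in this article.

```python
import math
from collections import Counter

# Toy corpus (hypothetical); each document is already tokenized and stemmed.
docs = [
    ["car", "truck", "road"],
    ["car", "car", "engine"],
    ["flower", "garden", "road"],
]

# Vocabulary: one matrix column per distinct term, in sorted order.
vocab = sorted({term for doc in docs for term in doc})


def tfidf_matrix(docs, vocab):
    """Return a dense documents-by-terms matrix of tf-idf weights."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Assumed idf variant: log(N / df). Rare terms get a larger weight.
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    matrix = []
    for doc in docs:
        counts = Counter(doc)  # raw term frequencies in this document
        matrix.append([counts[term] * idf[term] for term in vocab])
    return matrix


X = tfidf_matrix(docs, vocab)
for doc, row in zip(docs, X):
    print(doc, [round(w, 3) for w in row])
```

In the second document, "engine" occurs once and "car" twice, yet "engine" receives the larger weight because it appears in only one document of the corpus. Most entries are zero, which is why the matrix is stored sparsely in practice.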

= Rank lowering =

After the construction of the occurrence matrix, LSA finds a low-rank approximation to the term-document matrix. There are two possible justifications for this approximation:

  • The original term-document matrix is presumed to be too large for the available computing resources; from this point of view, the approximated matrix is interpreted as an approximation (a necessary evil).
  • The original term-document matrix is presumed to be noisy: for instance, anecdotal instances of terms are to be eliminated. From this point of view, the approximated matrix is interpreted as a denoised matrix (a better matrix than the original).

Concretely, the downsizing of the matrix is often achieved through singular value decomposition (SVD): the set of all terms is then represented by a vector space of lower dimensionality than the total number of terms in the vocabulary.

The consequence of the rank lowering is that some dimensions get merged:

:: {(car), (truck), (flower)} --> {(1.3452 * car + 0.2828 * truck), (flower)}

This mitigates synonymy, as the rank lowering is expected to merge the dimensions associated with terms of similar meanings.
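The rank lowering can be illustrated with a truncated SVD. The following is a sketch, assuming NumPy is available; the small count matrix and the helper name `low_rank` are hypothetical and chosen only so that "car" and "truck" co-occur while "flower" appears separately. The weights it prints come from this toy data and are not the ones quoted in the example above; singular vectors have unit length and an arbitrary sign.

```python
import numpy as np

# Hypothetical count matrix: rows are documents, columns are terms.
terms = ["car", "truck", "flower"]
X = np.array([
    [2.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [3.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0],
])


def low_rank(X, k):
    """Return the rank-k truncated-SVD approximation of X and the k
    retained directions in term space (rows of Vt)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    return X_k, Vt[:k, :]


X_2, components = low_rank(X, k=2)

# Each retained dimension is a weighted combination of the original terms.
for row in components:
    print(" + ".join(f"{w:.4f} * {t}" for w, t in zip(row, terms)))
```

With this data, one retained dimension combines "car" and "truck" and the other consists of "flower" alone. The discarded third dimension carries the difference between "car" and "truck", which is the information lost by the approximation.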

= Limitations of LSA =

LSA has a number of drawbacks:

  • The resulting dimensions might be difficult to interpret. For instance, in
    :: {(car), (truck), (flower)} --> {(1.3452 * car + 0.2828 * truck), (flower)}
    the (1.3452 * car + 0.2828 * truck) component could be interpreted as "vehicle". However, it is very likely that cases close to
    :: {(car), (bottle), (flower)} --> {(1.3452 * car + 0.2828 * bottle), (flower)}
    will occur. This leads to results which can be justified on the mathematical level, but have no interpretable meaning in natural language.

  • An alternative, the probabilistic latent semantic analysis model, is reported to give better results than standard LSA.

Despite these drawbacks, LSA remains a standard algorithm in information retrieval.

= See also =

  • Vectorial semantics
  • DSIR model
  • Latent Dirichlet allocation

= External links and references =

  • [http://lsa.colorado.edu/ the first place to start with LSA]
  • [http://lsa.colorado.edu/papers/dp1.LSAintro.pdf Introduction to Latent Semantic Analysis], by [http://psych.colorado.edu/~landauer/ T. K. Landauer], P. W. Foltz, & D. Laham, Discourse Processes, 25, 259-284 (1998).
  • [http://lsi.research.telcordia.com/lsi/papers/JASIS90.pdf Indexing by Latent Semantic Analysis], by S. Deerwester, [http://www.research.microsoft.com/~sdumais/ S. T. Dumais], G. W. Furnas, T. K. Landauer, R. Harshman, Journal of the American Society for Information Science, 41(6), 391-407 (1990).
  • [http://iv.slis.indiana.edu/sw/lsa.html InfoVis page on Latent Semantic Analysis]
  • [http://www.cs.brown.edu/people/th/papers/Hofmann-UAI99.pdf Probabilistic Latent Semantic Analysis], by T. Hofmann, Proc. Uncertainty in Artificial Intelligence (1999).