A stemmer for English, for example, should identify the string "cats" (and possibly "catlike", "catty" etc.) as based on the root "cat", and "stemmer", "stemming", "stemmed" as based on "stem".
English stemmers are fairly trivial (with only occasional problems, such as "dries" being the third-person singular present form of the verb "dry", "axes" being the plural of "ax" as well as "axis"); but stemmers become harder to design as the morphology, orthography, and character encoding of the target language becomes more complex. For example, an Italian stemmer is more complex than an English one (because of more possible verb inflections), a Russian one is more complex (more possible noun declensions), a Hebrew one is even more complex (a hairy writing system), and so on.
Stemmers are common elements in query systems, since a user who runs a query on "daffodils" probably cares about documents that contain the word "daffodil" (without the s).
(This dictionary has a rudimentary stemmer which currently (April 1997) handles only conversion of plurals to singulars).
(01 Mar 1997)
|Bookmark with:||word visualiser||Go and visit our forums|