Tour de CLARIN: Denmark – CST Lemmatizer

CLARIN-DK featured in “Tour de CLARIN”

In April and May 2019, CLARIN-DK was featured in the CLARIN series “Tour de CLARIN”, which is designed to highlight the many national consortia under CLARIN-EU and to showcase their work, inspirational use cases and achievements.

One of the highlights of each feature is a tool that the national consortium is particularly proud of. CLARIN-DK has chosen to feature the CST lemmatizer. Read the full Tour de CLARIN blog post about the CST Lemmatizer from CLARIN-DK.

The CST lemmatizer has been developed over many years as part of various projects, especially the Danish STO (Jongejan and Haltrup 2005) and the Nordic Tvärsök (Jongejan and Dalianis 2009). While it was initially used as a tool to support Danish lexicographic work, it has gradually been extended with a dynamic self-learning algorithm.

What is a lemmatizer?

Lemmatizers generalize over the different forms of a word used in free text and provide its lemma, which is the base or dictionary look-up form. They are therefore among the basic NLP tools, and they matter not only for NLP but also for lexicographic work and all text-based studies.
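
To make the idea concrete, here is a minimal sketch in Python of what "mapping a word form to its lemma" can look like. It is a toy illustration only and is not the CST lemmatizer's algorithm: the exception list, the suffix rules and the function name `lemmatize` are all invented for this example, and English is used purely for readability.

```python
# Toy lemmatizer: look the word up in a small exception list first,
# then fall back to the longest matching suffix rule.
# All data below is hypothetical and chosen for illustration only.

EXCEPTIONS = {"went": "go", "mice": "mouse", "better": "good"}

# (suffix, replacement) pairs; an empty replacement just strips the suffix.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]


def lemmatize(word: str) -> str:
    """Return a guessed lemma for a single word form."""
    token = word.lower()
    if token in EXCEPTIONS:
        return EXCEPTIONS[token]
    # Try longer suffixes first so that "ies" wins over "s".
    for suffix, replacement in sorted(SUFFIX_RULES, key=lambda r: -len(r[0])):
        # Require at least three remaining characters to avoid mangling short words.
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)] + replacement
    return token


if __name__ == "__main__":
    for form in ["Studies", "walked", "mice", "cats", "is"]:
        print(form, "->", lemmatize(form))
```

A hand-written rule list like this breaks down quickly on real text, which is one reason a lemmatizer that can learn its rules from data is attractive.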

What is “Tour de CLARIN”?

“Tour de CLARIN” is a CLARIN ERIC initiative that aims to periodically highlight prominent User Involvement (UI) activities of a particular CLARIN national consortium. “Tour de CLARIN” helps:

  • to increase the visibility of the national consortia
  • to reveal the richness of the CLARIN landscape
  • to display the full range of activities throughout the network.

The series is shared through the CLARIN Newsflash, blog posts, Facebook and Twitter. Check out the “Tour de CLARIN” brochures for the member countries and the first publication, Tour de CLARIN, volume I, November 2018.