Tool Details

Word2Vec

Maintained by: KB Labs

Word2Vec is a tool for finding semantic clusters of words related to a search word. This makes it possible to analyse discourse, relations between words and word usage, making it a powerful tool for students and researchers to explore.

“KB Lab’s Word2Vec tool is well suited for workshops in proto discourse analysis and conceptual history in university teaching at undergrad levels. This is the experience drawn from a multitude of courses in the history of Christianity in modernity (19th, 20th and 21st centuries). I have used the KB Lab’s Mediestream OCR enhancing tool Word2Vec in discourse analysis workshops with the purpose of detecting semantic clusters in generated word lists. On the basis of words such as ‘kristen’, ‘religion’, ‘præst’ and with great success, I have invited students to reflect upon the representation of Christianity in Danish public discourse. Furthermore, as a Grundtvig scholar I recommend the N.F.S. Grundtvig sub-corpus for heuristically tracing the semantic network of specific terms and concepts as a first step in word embedding analyses. This is an eye-opening tool for every scholar interested in a ‘distant reading’ of Grundtvig’s collected writings.”

The tool builds a high-dimensional word embedding with an unsupervised machine-learning algorithm based on a simple neural network. It maps each unique word in a large text corpus to a vector.
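As a rough illustration of the unsupervised part, the sketch below shows how training examples can be drawn from raw text without any labels: each word is paired with the words that occur within a small window around it. This is the skip-gram style of pair generation commonly associated with Word2Vec. Whether the KB Labs demo was trained in exactly this way is not stated here, and the function name context_pairs, the window size and the sample sentence are all assumptions made for illustration.

```python
def context_pairs(tokens, window=2):
    """Return (target, context) pairs for all words within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs


# Hypothetical sample sentence, not taken from any of the demo corpora.
sentence = "the priest spoke in the church".split()
pairs = context_pairs(sentence, window=1)

for target, context in pairs:
    print(target, "->", context)
```

A neural network trained to predict the context word from the target word ends up assigning similar vectors to words that share contexts, which is what the tool exploits.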

The vector representations reflect interesting semantic properties of the words. Word2Vec is most effective at finding words that appear in the same contexts as a given word, because such words lie close together in the vector space. However, the differences between word vectors can also be generalised to produce qualified guesses for analogies. The Word2Vec demo features several corpora, including a very large one based on over 65,000 Gutenberg e-books in multiple languages.
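The two uses described above, nearest neighbours and analogies, can be sketched with a handful of toy vectors. The three-dimensional vectors below are invented for illustration only; real embeddings are learned from a corpus and have far more dimensions. The helper names cosine, most_similar and analogy are likewise hypothetical and are not the interface of the KB Labs demo. Closeness is measured here with cosine similarity, a common choice for word embeddings.

```python
import math

# Hypothetical toy vectors, invented for illustration only.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "child": [0.1, 0.5, 0.5],
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query, exclude=(), topn=3):
    """Rank vocabulary words by cosine similarity to a query vector."""
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w not in exclude]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


def analogy(a, b, c, topn=1):
    """Guess d such that a is to b as c is to d, using the vector b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    return most_similar(target, exclude={a, b, c}, topn=topn)


neighbours = most_similar(vectors["king"], exclude={"king"})
guess = analogy("man", "king", "woman")
print(neighbours)
print(guess)
```

The analogy function excludes the three input words from the candidates, since the target vector usually lies closest to one of them; the result should be read as a qualified guess, not a fact about the corpus.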

Tutorial

Do you have questions or comments? Or do you want to join the KB Labs community?

Please contact community lead Katrine Hofmann Gasser, section manager at the Royal Danish Library at khg@kb.dk.

Overview of service specifications

TaDiRAH 1. Level category: Analysis, Interpretation
TaDiRAH 2. Level category: Relational Analysis, Theorizing, Contextualizing
Platforms: Windows, Mac, Web-based service
Tutorial name: Tutorial Word2Vec
Cost: Free
Research objects: Text, Web
Last modified: August 1, 2019