Research examples

Get inspired by examples of research performed by using the DIGHUMLAB infrastructure.

Tour de CLARIN: Denmark

“Tour de CLARIN” is an initiative that aims to highlight prominent activities of a particular CLARIN national consortium, in this case CLARIN-DK.

Experiments with Big Video

New technologies give enhanced methods for video ethnography. The DIGHUMLAB community VILA supports research into embodied human interaction in a wide range of environments and with a focus on social...

Gesta Danorum

Language technology, a shortcut to scientific evidence. This case is an example of how language technology can be exploited in research within the humanities.

This case explains how to locate missing metadata for radio programmes by using the programme schedules.

User impressions – Silke Holmqvist

“Media in general, both radio and TV, are heavily under-prioritised as historical sources, even though they are central for our modern time. In a historical perspective, we know very little...

VILA | Software Release | May 2021

STEP INTO YOUR OWN VIDEO DATA WORLD: Ground-breaking software empowers researchers to “inhabit” video data in virtual reality through immersive 360° video technology

Big Soft Video, a team at Aalborg University led by Professor Paul McIlvenny and Associate Professor Jacob Davidsen, is launching AVA360VR, a new piece of software for immersive humanities research that revolutionises traditional interaction and video research and holds massive potential for education and pedagogical training.

 “Imagine this. You are a researcher interested in classroom interaction. You go out “into the field” to make a video recording of an authentic classroom – not with a traditional video camera, but with a 360° camera, to avoid missing out on important activities. Now, when you get back to your office, instead of watching your recording on a flat computer screen, you “step into” the video data with a virtual reality headset – not just to watch the classroom, but to re-inhabit it. This is next level qualitative video analysis. And it’s possible with AVA360VR – a tool that leads the way for immersive qualitative analytics,” Jacob Davidsen says.

AVA360VR is the name of the software that the Big Soft Video team at Aalborg University is developing. The software is a response to the ways in which working with qualitative video data falls short today.

“This software can revolutionise the way we do video-based research,” according to Jacob Davidsen, “and it holds massive potential for education and pedagogical training in practice-based fields – for example training of health care professionals or pedagogues in nursery education.”

An upcoming feature is a collaborative version where users can view and annotate the same 360° video across locations – a minimum viable solution has already been taken into use. An example could be a clinical supervisor who uses AVA360VR with their students to analyse, train and practise together. Or a group of international researchers analysing their data collaboratively in AVA360VR.

Traditional video and the inevitable risks of compromising data

A common methodological challenge in interaction research with video is how and where to place the camera. Researchers must consider camera position carefully to capture as much of the interaction as possible. It is common for researchers to add multiple cameras simply to secure maximum coverage. However, traditional video, regardless of how many cameras are used, often loses interactionally salient people, actions, events and objects out of sight (and frame).

“I’m sure most of us are familiar with watching a video recording in which the important parts end up taking place on the margins or even outside of the camera’s gaze. This is a common challenge for researchers – and that’s why 360° video is a complete game-changer,” Jacob Davidsen explains.

What is 360° video?

A 360° video, also referred to as immersive video, is a video recording that records every view direction at the same time, typically with an omnidirectional camera.

Facebook and YouTube already allow for uploading and navigating 360° videos on their platforms. Add to this a growing number of media players such as VLC, GoPro VR and PotPlayer that have launched navigational features for 360° video.

Flat screens, flat data

Within the past few years, humanities and social science researchers have increasingly adopted the use of the more holistic 360° cameras. Yet still, most researchers watch and analyse the 360° video recordings on their flat desktop screens. This comes with a number of drawbacks, Jacob and Paul argue.

“The only available solution at the moment is to watch and analyse 360° video on flat computer screens. We would argue that video should not be flat, but immersive – that it should be experienced in VR. Flat screens render a false perception of relative positions. In a sense, it is the difference between conceptualising the world on a flat map vs. on a round globe. The former stretches the data in order to ‘map’ it onto a flat surface; the latter is a truer representation of the real thing. With our software, we invite researchers to ‘step into the middle of the globe’ and look out in 360° onto their video data world. It’s enhanced video data immersion,” Jacob explains.
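The flat-map comparison can be made concrete with a small numerical sketch. It assumes that the 360° footage is stored in the common equirectangular layout; the article does not say which projection AVA360VR or any particular camera uses, so treat this as an illustration only. In that layout every row of pixels spans the full 360°, so content far from the horizon is drawn wider on the flat image than it is on the sphere. The function names are made up for the example.

```python
import math


def pixel_to_direction(x, y, width, height):
    """Map a pixel of an (assumed) equirectangular frame to a unit view direction.

    Longitude runs from -180 degrees (left edge) to +180 degrees (right edge),
    latitude from +90 degrees (top row) to -90 degrees (bottom row).
    """
    lon = (x / width - 0.5) * 2.0 * math.pi
    lat = (0.5 - y / height) * math.pi
    return (
        math.cos(lat) * math.sin(lon),  # right
        math.sin(lat),                  # up
        math.cos(lat) * math.cos(lon),  # forward
    )


def horizontal_stretch(latitude_deg):
    """Factor by which the flat image widens content at a given latitude."""
    return 1.0 / math.cos(math.radians(latitude_deg))


if __name__ == "__main__":
    for lat in (0, 30, 60, 80):
        print(f"latitude {lat} degrees: stretch factor {horizontal_stretch(lat):.2f}")
```

At the horizon nothing is stretched, while at 60° above or below it content appears about twice as wide on the flat image as it does when viewed from the centre of the sphere.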

Besides poor representation, there is another drawback with rendering 360° video on flat screens. The current market-leading media players for computers provide only limited options for working analytically with 360° video. Most media players offer simple navigational features and no analytical functionalities:

 “When researchers have captured 360° video, they have very limited options for working with it afterwards. Navigating around in a video is powerful for reliving the experience, but it will only take the researcher so far. Researchers rely on fundamental research practices such as annotating, processing, analysing and extracting data. These core research processes have not been supported in any 360° software – until now,” Jacob and Paul explain. 

From watching to inhabiting video – meet AVA360VR

Prompted by a gap in the market, the Big Soft Video team is launching the innovative software AVA360VR. It is a unique and flexible tool for Annotating, Visualising and Analysing 360° video in Virtual Reality. The tool empowers users – researchers, students and educators – to relive and inhabit any 360° video-recorded situation in virtual reality. It embodies the team’s idea of immersive humanities: a new way of performing research and dissemination in the humanities.

AVA360VR allows users to work directly in the 360° video – it is like one big canvas for research. For example, users can embed objects onto the 360° video recording – objects such as external images, notes, transcripts, traditional video recordings, and more. Furthermore, users are able to add annotations, such as drawings, notes and arrows, and even animate the objects and annotations so they follow the movements of relevant participants. Finally, it is possible to integrate multiple video cameras in one “reality” and then jump from camera to camera to change the viewpoint of the same interaction.

These functionalities set AVA360VR at the very forefront of virtual reality technology, immersive qualitative analytics, qualitative video analysis and interaction research.

Four innovative features in AVA360VR



  • Jump between cameras with a single click, and take a step deeper towards qualitative immersive analytics and 6 degrees of freedom.
  • Drag traditional videos, images, texts, maps, biometric data etc. onto the 360° video to make sure all data is “at hand” for your analysis.
  • Write, highlight and draw on the video. Animate the annotations (and the embedded objects) so they follow important participants.
  • Export your data from AVA360VR with capture tools such as video snippets, frame grabs and 4-view shots. No need for editing afterwards.

Endless applications for both researchers and practitioners

Initially, the Big Soft Video team designed AVA360VR as a flexible, versatile and analytical tool for researchers who engage with huge amounts of qualitative 360° video data. However, during the development process, it soon became clear that AVA360VR carries massive potential not just for researchers, but also for practitioners. The tool is well suited to teaching and training purposes across many sectors, for example health and education – and stakeholders from these areas have already taken an interest.

“Originally, we wanted to present interaction researchers with a tool to work smarter and more immersively with their video material. However, we soon realised that this tool is attractive far beyond research. A 360° video recording of authentic situations from an operating theatre or a classroom can be put to good use when teaching students about communication, interaction with patients, medical procedures etc. The 11,000 nursing students in Denmark could use the tool as preparation before venturing into “the field” or as patient-nurse communication training. Lecturers can embed sound clips of themselves, acting as a voiceover that points students’ attention to crucial interactional parts,” Jacob explains.

The team is looking for funding to develop the software further as an infrastructure for immersive humanities and also as a pedagogical training tool. Currently, the Big Soft Video team has a full-time programmer, Artúr Barnabás Kovács, working to stabilise and improve the existing features and introduce new ones.

Get started with AVA360VR now

Do you want to get started with AVA360VR right away? The basic requirements for running AVA360VR are a VR-ready computer as well as a VR headset (all commercial headsets can be used). The software itself is open-source and is available here:

Software, help tutorials, demo project and support


Help pages


Please get in touch with Jacob Davidsen, Associate Professor at the Department of Communication and Psychology, Aalborg University, at

Behind the researcher | Paul McIlvenny

Paul McIlvenny is a Professor at the Department of Culture and Global Studies at Aalborg University. He holds a PhD from Edinburgh University, Scotland, and is active in the following centres and research groups: Centre for Discourses in Transition (C-DIT), Centre for Mobility and Urban Studies and VILA (Video Research Lab, a part of DIGHUMLAB) at Aalborg University.

Behind the researcher | Jacob Davidsen

Jacob Davidsen is an Associate Professor at the Department of Communication and Psychology at Aalborg University. He holds an MA in Information Science and is active in VILA (Video Research Lab, a part of DIGHUMLAB) at Aalborg University.

His research interests include computer supported collaborative learning and embodied interaction analysis. 

About VILA

VILA is one of six research communities under the national consortium DIGHUMLAB. VILA supports research into embodied human interaction in a wide range of environments and with a focus on (social) cognition, learning and design. VILA offers access to labs, materials, booking equipment, online tutorials for video analysis software and workshops.

Read more about VILA here.


BIG VIDEO is a programme at Aalborg University that aims to develop an enhanced infrastructure for qualitative video analysis with innovation in four key areas: 1) Capture, storage, archiving and access of digital video, 2) Visualisation, transformation and presentation, 3) Collaboration and sharing, and 4) Qualitative tools to support analysis. 

Read more about the background for the programme or the BIG VIDEO manifesto. Check out the BigSoftVideo space in GitHub.

VILA in practice: A use case with Tobias Boelt Back, PhD, AAU

“VILA opens up to a new world of qualitative data”

Tobias Boelt Back, PhD and external lecturer at the Department of Culture and Learning at Aalborg University, recently defended his PhD thesis, which explores resemiotisation and question designs in Danish talkshows, as well as comic transcription. During the project, he consulted VILA, which provided crucial expertise for his video data collection. And now, by his own account, he cannot imagine a qualitative study without VILA.

Tobias Boelt Back’s PhD project deals with Danish talkshow interviews, more specifically with how talkshow hosts evoke feelings in their talkshow guests on live TV through various question designs.

“My PhD project is about talkshow interviews and how the editorial staff at a Danish talk show plan and pre-produce interviews with the aim of evoking emotions – and how they finally conduct the live show.  For example, a lot of consideration goes into posing questions that make the guests feel something and by extension the viewers.”  

Tobias works with mass media production as a process, not just the final show on the screen. He has collected data through the process of planning the interview questions, from morning meetings to the final interview interaction. For this, he employs the concept of “resemiotisation” – the translation of discourse from one modality (e.g. conversation) to another (e.g. writing):

“When the talkshow hosts gather at their morning meetings, they discuss which guests to invite, how to form their questions, how to choose where to be in the studio, how to design what is on the back screen. And the discussions also need to be boiled down to questions on cue cards that the talkshow host can bring to the studio. I’ve recorded their meetings, collected their manuscripts, cue cards and more to explore the ‘semiotic ecology’ of the talkshow interview in order to really hone in on the planning process. The point is to see that what we see on live TV – the actions, discourses, materialities etc. – are rendered possible by prior actions, discourses, materialities, etc.” Tobias explains. 



One more time with feeling. Resemiotising boundary affects for doing ‘emotional talk show’ interaction for another next first time


  • How is the joint accomplishment of resemiotising semiotic ecologies for doing ‘affective talk show’ interaction sequentially organised and materially structured across series of work-relevant activities?
  • What emanates from these resemiotisations that set up the relevance for certain immediate and remote future (inter)actions?
  • How are these semiotic resources reworked in order to render displays of affect intelligible as sequentiably relevant nexts on different time-scales?
  • How can we claim an adjacent relationship between non-contiguous (inter)actions separated in time and space?


  • Ethnomethodological field work
  • Charles Goodwin approach 
  • Conversation analysis
  • Mediated discourse analysis and resemiotisation. 


  • A variety of data from a Danish talkshow: video recordings of morning meetings, manuscripts, cue cards, images, personal notes, text message (SMS) correspondence, phone calls, and more.

Comic transcriptions as a new methodological contribution – bringing the visual to the forefront of conversation analysis

Since some of Tobias’ video data contains sensitive personal data, he has had to find a way to anonymise it. The result? Comic transcriptions.

“In my thesis, I render my video data as comic transcriptions. We seem to have decided that comics are for kids, not for serious conversation analysis researchers. But comic transcriptions are really easily accessible and intelligible to everyone. And thus, a great way to render video data. Actually, this methodological choice is a break with the dominating tradition for coding movement in transcriptions. Instead of describing what informants are doing, as is done traditionally, then how about showing it through comics?” Tobias explains.

“For many of the pages in my thesis, it is like reading a comic. And it really made me excited about doing transcriptions again,” Tobias adds jokingly. “And I was only able to do comic transcriptions because I was introduced to VILA and the gear that they grant researchers like me access to.” 

Findings: ‘The local interaction’ is an illusion

One of the main findings of the thesis is that any final question posed on a talkshow is arrived at only through several steps of collaboration. Asking interview questions on a talkshow is not a one-man effort:

“Although it seems like a natural conversation between two people, the host and the guest, my thesis claims it’s a conversation between 27 people. And in a sense, the questions never stop being negotiated. Whatever happens on the screen is not an isolated event: The questions that are being posed have been discussed, down to their wording and syntax, and there is a sense of intertextuality between the morning meeting discussions on question designs and the final questions during the show,” Tobias explains.

One of the major points of Tobias’ PhD thesis is a criticism of the conversation analytical notion of “the local interaction”. One interaction is often looked at as an isolated event when really it is embedded in a larger network of interactions. This analytical look at process and planning provides interesting insights into the human ability to “anticipate and project relevant ‘nexts’” or to “cultivate a specific sequential outcome”. It says something about the human ability to predict how people will react to what we are planning now – an ability that is crucial for our social lives, according to Tobias.

What have you used VILA for?

“When I wrote my MA thesis on interview techniques and presuppositions in the question design of journalist and editor Martin Krasnik, I used zero image or video data. Then I came to Aalborg University and met the people at VILA, and I was introduced to a whole new world. If I hadn’t talked to VILA, I wouldn’t know how to use or set up cameras in the best possible way, and now, I couldn’t imagine doing a qualitative study without consulting VILA,” Tobias says.

“Digital humanities at Aalborg University, for me, is summed up in VILA. They have so much expertise that once I got to know them, I realised how little I knew about it myself. I’d never used a camera for research before, so I just thought ‘I’ll go out and put up a camera’. But it’s not that straightforward, and luckily, there are no stupid questions with VILA. When you explain to them what you want to accomplish with your video data, they’ll easily give you five suggestions for solutions,” says Tobias. 

Three advantages with VILA

Refocus your video data

With 360 degree video cameras from VILA, you’re able to look in all directions in your video data – also the angles you didn’t expect to be relevant when you set up the camera. 

Broad range of tech gear

VILA lends out a broad range of gear for recording sound and video: 360 degree cameras, spatial audio recorders, 2-in-1 cameras and much more. 

Expertise and guidance

Tobias had never collected video data before, so getting a thorough and practical run-through of the dos and don’ts of video data collection from VILA was valuable.

Can you recommend VILA to others?

“Yes, I would recommend VILA to researchers who work with interaction analysis one way or the other. But it’s also relevant for researchers who record all kinds of interviews or meetings. I think for the next generation of researchers, this will be the standard way to collect video data, and I think we owe it to for instance our PhD students to engage with this new technology. And for that, VILA is an excellent entry point,” Tobias explains.

Tobias talks about what tasks and projects lie ahead: 

“At the moment, I am an external lecturer at Discourse Studies at Aalborg University, but I’m looking forward to continuing to work on exciting research projects. I’m excited to collect video data with cameras that are more than 4K, which was state-of-the-art when I collected data, but which now already is becoming outdated,” Tobias finishes with a laugh.

Behind the researcher

Tobias Boelt Back, PhD and external lecturer at the Department of Culture and Learning at Aalborg University, defended his PhD thesis in August 2020.

Tobias holds a BA in Danish from Roskilde University and an MA in Psychology of Language from Copenhagen University.

His research interests include ethnomethodology, conversation analysis, multimodality, discourse analysis and mass media production.

In practice: A use case with Silke Holmqvist, PhD student, AU

What the platform offers simply can’t be found anywhere else

Silke Holmqvist, PhD student at the Department of History and Classical Studies at Aarhus University, is putting the finishing touches on her PhD project about perceptions of guest workers’ emotional lives on television from the 1960s to the 1980s. During the project, the platform has enabled her to map metadata for one of the project’s key groups of sources, TV programmes.

Silke Holmqvist’s PhD project is about perceptions of guest workers’ emotional lives, as they are shown and interpreted through images circulating in the public sphere, for example through media, from the 1960s to the 1980s.

“‘The Turk’ in particular, as he was called back then in the early 1970s, was portrayed as virtuous, hard-working and loyal, for example. In the early 1970s, the stereotype is that he does not drink, goes to bed early and saves money for his family. At the time, Danes had begun to buy televisions, replace the wallpaper, and buy new curtains and designer sofas for their detached houses. An idealisation and a notion emerge of the dedicated, thrifty and admirable brown man who arrives at the railway station to dedicate a period of his life to work far away from his family. Later, in the late 1970s and the 1980s, we see guest workers in Denmark being problematised. Now he was seen on television in social housing estates, for example. He was shown as work-shy, patriarchal and living on welfare benefits. But it is the same man we are talking about, and throughout the period all kinds of guest workers have of course been just as different from one another as ethnic Danes are,” Silke Holmqvist explains.

Overview | PhD project, Silke Holmqvist


Inventing an immigrant – An emotional geography of guest worker images in Denmark c. 1960-1989


Excerpt from the project: “I ask how media, including mass media but also the urban environment as a medium in itself, influenced the changing contours of the figure of the guest worker. The research design pays attention to the interaction between the emotional repertoires associated with or identified by visible minority workers and the urban places (material and fictional geographies) in which the guest worker was installed”.


  • Cultural history
  • Immigration and minority research
  • Emotional geography


  • Television, radio and the printed press on the immigration of the period
  • Collections of memories from guest workers
  • Turkish/Danish literature, autobiographies by guest workers

What have you used the platform for?

“I started my PhD by sitting and leafing through all the television programme listings from three decades down at the Royal Danish Library (Det Kgl. Bibliotek). Back then I didn’t know about the platform’s digitisation of the programme listings – and it would have been great to be able to search digitally instead of leafing through pages,” says Silke.

Silke has used the platform to check programme listings: partly the programmes she had already found herself in the printed listings at the Royal Danish Library, partly the DR programmes she gained access to. With the platform, she has been able to double-check metadata on the programmes: when they were broadcast, whether they aired in prime time, when they were rebroadcast, how the programmes are described, and so on.

Silke Holmqvist says that she has also used the platform in her teaching, where she has encouraged her students to use it because it is an obvious place to find source material.

“Media in general, both radio and TV, are heavily under-prioritised as historical sources, even though they are central for our modern time. In a historical perspective, we know very little about what actually went on in prime time. Danes sat and soaked up 1 hour and twenty-odd minutes on average every day over a twenty-year period, but we do not know very much about what they were watching. And the platform can help provide really good insight into that,” Silke explains.

How Silke has used the platform – an example

“I knew which decades I wanted to search in. I delimited them in 5-year intervals to make sure I didn’t get an unmanageable number of results. Then I created my own projects, where I first searched for gæstearbejder (guest worker), fremmedarbejder (foreign worker), arbejdsindvandring (labour immigration), tyrker (Turk), pakistaner (Pakistani), jugoslaver (Yugoslav), etc. Then I ran the same searches across decades.

In the big picture, I knew what ought to be there from the physical programme listings at the Royal Danish Library, but sometimes I was surprised to find a programme I didn’t know about, or a radio broadcast I could listen to. I have been positively surprised by how many radio programmes I have been able to access.”
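The search routine Silke describes can be sketched as a small planning script: split the period into 5-year windows and run the same set of search terms in each window. This is an illustration only. `five_year_windows` and `search_plan` are hypothetical helpers that do not call the platform’s actual search interface, and the 1960 to 1989 range is an assumption taken from the project title.

```python
from itertools import product

# Search terms quoted in the interview, in Danish as they would be typed.
TERMS = [
    "gæstearbejder",
    "fremmedarbejder",
    "arbejdsindvandring",
    "tyrker",
    "pakistaner",
    "jugoslaver",
]


def five_year_windows(start_year, end_year):
    """Split an inclusive range of years into consecutive 5-year windows."""
    windows = []
    year = start_year
    while year <= end_year:
        windows.append((year, min(year + 4, end_year)))
        year += 5
    return windows


def search_plan(terms, start_year, end_year):
    """Pair every search term with every 5-year window."""
    windows = five_year_windows(start_year, end_year)
    return [
        {"term": term, "from": first, "to": last}
        for (first, last), term in product(windows, terms)
    ]


if __name__ == "__main__":
    plan = search_plan(TERMS, 1960, 1989)
    print(len(plan), "searches planned")
    print(plan[0])
```

For 1960 to 1989 this gives six windows and, with the six terms, 36 searches. Keeping the term list fixed across windows is what makes the results comparable from one period to the next.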

Three advantages of the platform

Useful features

Creating your own projects, annotating and skipping forward 10 seconds in the audio all work really well.

Intuitive user interface

The platform is easy to access and quick to get started with, and it has a user-friendly interface.

A good place to find historical source material

Radio and TV are under-prioritised as sources, but they are central to the modern period.

It is really rare not to find something in a search that puts your work into perspective

According to Silke Holmqvist, the platform is an obvious place for colleagues and students to find source material, because it has a lot to offer:

“I can easily recommend the platform to others, and I do so all the time. If a colleague is working on a specific historical topic, it is really rare not to find something in a search – a radio programme that puts what you are working on into perspective. The platform is a really good place to get a broad orientation about all sorts of things that have happened in our past.”

Silke is looking forward to finishing her thesis and defending it. She hopes to be able to continue working with the television medium and minority themes in a postdoc position.

“I will continue to need the platform to be available, because what it has simply cannot be found anywhere else,” Silke concludes.

Behind the researcher

Silke Holmqvist is a PhD student at History and Classical Studies at Aarhus University. She holds a BA in the history of ideas and an MA in cultural history from Aarhus University.

Her research interests include modern history, visual history, cultural history, minority studies and emotional geography.

Tour de CLARIN: Denmark

In April and May 2019, CLARIN-DK was featured in the CLARIN series “Tour de CLARIN” designed to highlight the many national consortia under CLARIN-EU. 

Since 2016, the Tour de CLARIN initiative has periodically highlighted prominent user involvement activities in the CLARIN network in order to

  • increase the visibility of its members,
  • reveal the richness of the CLARIN landscape,
  • and display the full range of activities that show what CLARIN has to offer to researchers, teachers, students, professionals and the general public interested in using and processing language data in various forms.

Read the blog posts about CLARIN-DK below or download the full second volume of the “Tour de CLARIN” series.

Introduction | Tour de CLARIN: Denmark

Written by Costanza Navarretta, edited by Darja Fišer and Jakob Lenardič

Denmark has been a member of CLARIN ERIC since February 2012 and is one of its founding members. The Danish infrastructure CLARIN-DK was funded through two projects, the DK-CLARIN (2008-2010), and the DIGHUMLAB project (2011-2017). Since 2018, CLARIN-DK has been funded by the Faculty of Humanities and the Department of Nordic Studies and Linguistics, University of Copenhagen. The Danish national coordinator is Costanza Navarretta and the leading institution is the Centre for Language Technology, which is part of the Department of Nordic Studies and Linguistics.

CLARIN-DK involves the following institutions:

CLARIN-DK is a stable national research infrastructure where researchers can securely deposit, share and download language resources such as domain-specific corpora (e.g., The Danish Parliament Corpus 2009 – 2017 and the Johannes V. Jensen Corpus, which is a literary corpus collecting the works of the famous modernist poet Johannes V. Jensen from the early 20th century), as well as lexicons, word lists, speech transcriptions, and audio/video files. CLARIN-DK also offers online language technology tools, including a tokeniser, a PoS tagger, a lemmatiser for Danish and English, a named entity recogniser for Danish, a keyword extractor, a TEI-to-text converter and a pipeline for linguistic annotation. Tools for performing basic frequency counts of words in textual data are also included, as well as visualisation and corpus linguistics tools developed by other research groups, such as Korp and Voyant. Aside from being a certified B Centre, CLARIN-DK also runs a Knowledge Centre called DANSK together with the Danish Language Council, which provides expertise and help with using the language resources and technologies offered by the Danish consortium.

CLARIN-DK is involved in various Danish research projects and networks. For example, it is part of the Danish collaboration initiative DIGHUMLAB that involves various research communities, such as NetLAB, which is aimed at the cross-disciplinary study of internet materials, and an online platform used for automatically locating missing metadata of broadcast radio programmes. CLARIN-DK is also a partner in an externally funded research project, Infrastrukturalisme, with PI Henrik Jørgensen, Aarhus University. The consortium is also involved in a research network, Multimodal Child Language Acquisition, with the University of Hong Kong and the Chinese University of Hong Kong (PI Costanza Navarretta), and contributes tools and guidance to a number of research activities, including the linguistic annotation of medieval documents and TEI encoding of literary corpora, mainly at the University of Copenhagen. CLARIN-DK is also involved in research data management and the promotion of FAIR data in the Humanities.

The CLARIN-DK team participates in the following CLARIN committees: Standing Committee for CLARIN Technical Centres (Lene Offersgaard, Bart Jongejan), Legal and Ethical Issues Committee (Sussi Olsen), Assessment Committee (Lene Offersgaard as Chair).

Tool | CLARIN-DK presents the CST lemmatizer

Written by Bart Jongejan and Costanza Navarretta, edited by Darja Fišer and Jakob Lenardič

Lemmatizers generalize over the different forms of a word used in free text and provide its lemma, which is the base or dictionary look-up form. They are therefore one of the basic NLP tools which are not only important for NLP, but also for lexicographic work and all text-based studies. They are especially indispensable in morphologically rich languages that have a large number of word forms for the same lemma, which severely hinders querying or processing all of them in running text.

The CST lemmatizer has been developed over many years and as part of various projects, especially the Danish STO (Jongejan and Haltrup 2005) and the Nordic Tvärsök (Jongejan and Dalianis 2009). While it was initially used as a tool to support Danish lexicographic work, it has gradually been extended with a dynamic self-learning algorithm which learns new lemmatization rules from morphological lexica that contain the relations between word forms and their corresponding lemmas. The lemmatization rules are organized in a decision tree.
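To make the idea of learning lemmatization rules from a morphological lexicon more concrete, here is a minimal sketch in Python. It is not the CST algorithm: it learns suffix rewrites only, keeps them in a flat dictionary rather than a decision tree, and uses a three-entry toy lexicon (`LEXICON`) chosen for illustration. All function names are hypothetical.

```python
from collections import Counter, defaultdict

# Toy lexicon of (word form, lemma) pairs; an assumption for illustration only.
LEXICON = [
    ("husene", "hus"),      # "the houses" -> "house"
    ("bilerne", "bil"),     # "the cars" -> "car"
    ("stunder", "stund"),   # "moments" -> "moment"
]


def suffix_rule(form, lemma):
    """Strip the longest common prefix and return (form_suffix, lemma_suffix)."""
    i = 0
    while i < min(len(form), len(lemma)) and form[i] == lemma[i]:
        i += 1
    return form[i:], lemma[i:]


def train(pairs):
    """Count how often each suffix rewrite occurs in the lexicon."""
    rules = defaultdict(Counter)
    for form, lemma in pairs:
        old, new = suffix_rule(form, lemma)
        rules[old][new] += 1
    return rules


def lemmatize(word, rules):
    """Apply the rule for the longest matching word ending, else return the word."""
    for start in range(len(word)):  # longest candidate suffix first
        suffix = word[start:]
        if suffix in rules:
            replacement = rules[suffix].most_common(1)[0][0]
            return word[:start] + replacement
    return word


RULES = train(LEXICON)
print(lemmatize("bordene", RULES))  # unseen word form, handled by the "ene" rule
```

A suffix-only approach like this cannot handle changes at the start or in the middle of a word, which is the limitation that rules covering prefixes and infixes are meant to address.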

In comparison to other state-of-the-art stemmers and rule-based lemmatizers, the current version of the CST lemmatizer does not learn lemmatization rules from word endings alone: it recognizes a wide variety of derivational patterns, e.g. prefixation, infixation and suffixation. Therefore, it can deal with languages with different morphological systems. Currently, the CST lemmatizer has been trained on 25 languages. Figure 1 lists the language-trained versions of the CST lemmatizer that are available from the Center for Language Technology.

Figure 1: The languages for which the trained CST-lemmatiser is available.

Danish and English texts can be lemmatized online with the CST lemmatizer, which is also available for download via GitHub. Figure 2 shows the CLARIN-DK web service for the CST lemmatizer, while Figure 3 shows a Danish example sentence that was lemmatized with the tool.

Figure 2: The online CST lemmatiser on CLARIN-DK.

Figure 3: Lemmatization of the Danish sentence Dog, året der er gået, kan også have budt på tunge stunder — ikke alt er glæde for os alle (“However, the past year can also have provided sad moments – not everything can give happiness to all of us”), which is taken from the 2017 New Year’s Eve speech by the Danish Queen.

The CST lemmatizer trained for Danish has been used in many NLP projects, but also outside the NLP community. Frederik Hjorth, a political science researcher at the Department of Political Science, University of Copenhagen, applied the CST lemmatizer to political speeches as one of the preprocessing steps in an investigation of how members of the existing political parties have addressed right-wing populists who challenge the order of the established political system (Hjorth 2018). The results of the study indicate that young politicians are often willing to engage with the populists, as well as with other politicians across the political spectrum, in the name of democratic freedom (which Hjorth calls the strategy of engagement), while older politicians often describe the populist challengers as morally illegitimate (which Hjorth calls the strategy of disparagement) and refuse to enter into discussion with them.

The CST lemmatizer was also used for many other languages in different linguistic projects. For example, it was trained on Russian (Sharoff and Nivre 2011) and then used e.g. for event identification (Solovyev and Ivanov 2016), and for anaphora and co-reference resolution (Toldova et al. 2014).


Jongejan, Bart and Dorte Haltrup. 2005. The CST Lemmatiser. Center for Sprogteknologi, University of Copenhagen, version 2.7.

Jongejan, Bart and Hercules Dalianis. 2009. Automatic Training of Lemmatization Rules That Handle Morphological Changes in Pre-, in- and Suffixes Alike. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 – ACL-IJCNLP ’09. Suntec, Singapore: Association for Computational Linguistics, p. 145.

Hjorth, Frederik. 2018. Establishment Responses to Populist Challenges: Evidence from Legislative Speech. 2018 Annual Meeting of the Danish Political Science Association.

Sharoff, Serge and Joakim Nivre. 2011. The proper place of men and machines in language technology: Processing Russian without any linguistic knowledge. In Proc. Computational Linguistics and Intelligent Technologies DIALOGUE2011, Bekasovo, pp. 591–604.

Solovyev, Valery and Vladimir Ivanov. 2016. Knowledge-driven event extraction in Russian: corpus-based linguistic resources. Computational Intelligence and Neuroscience, 11 pages.

Toldova, Svetlana et al. 2014. RU-EVAL-2014: Evaluating Anaphora and Coreference Resolution for Russian. Computational Linguistics and Intellectual Technologies, Vol. 13 (20), pp. 681-694.

Resource | CLARIN-DK presents the Grundtvig’s Work Corpus


Written by Dorte H. Hansen and Costanza Navarretta, edited by Darja Fišer and Jakob Lenardič

Nikolai Frederik Severin Grundtvig was a theologian, a priest, a philosopher, a poet, a writer, a teacher and a politician (member of the Rigsdagen, one of the two parts of the Parliament), who lived in Denmark between 1783 and 1872. He was a contemporary of Hans Christian Andersen and Søren Kierkegaard. Grundtvig’s ideas have had a lasting impact on many areas of Danish culture, such as education, politics and the church. For example, Grundtvig advocated a reform of the school system, which also included educating adults to participate actively in society and in cultural life. Grundtvig is therefore considered to be the mind behind the folk high school. He was part of the national romantic movement and contributed to the development of Danish national awareness. Grundtvig’s written works are thus an important key to understanding Danish culture and mentality.

Figure 1: N.F.S. Grundtvig

The collection Grundtvig’s Works is published by the Grundtvig Center at the University of Aarhus and will contain 1000 text-critical and commented editions of the printed works of N.F.S. Grundtvig when finalized in 2030. The works are available to the public through a searchable interface that includes registers of persons, places and Bible citations.

The researchers at the Grundtvig Center wanted a reliable and consistent way to cite the publication, as well as a sustainable and interoperable environment in which they could share the work with other scholars and the general public. Since the Grundtvig Center itself does not offer the possibility of downloading the underlying files, CLARIN-DK was approached as a repository provider.

Figure 2: The corpus in the CLARIN-DK repository

The corpus, now deposited in CLARIN-DK’s DSpace repository, consists of approximately 1,300 TEI-encoded XML files, of which approximately 450 are critical editions manually annotated with person names, place names, mythological names, Bible citations and comments. When new versions of the works are released, they will be uploaded as new versions of the corpus in the CLARIN-DK repository.
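As a rough illustration of how such TEI-encoded annotations could be processed after download, the following Python sketch counts annotated names in a TEI document using only the standard library. The sample document is invented and is not taken from the corpus, and the assumption that names are marked with the standard TEI elements `persName` and `placeName` in the TEI namespace should be checked against the actual files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard TEI namespace; whether the corpus files declare it is an assumption.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample, not an excerpt from Grundtvig's Works.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><persName>Joseph</persName>, <placeName>Midgaard</placeName>,
       <placeName>Europa</placeName>, <persName>Joseph</persName></p>
  </body></text>
</TEI>"""


def count_entities(xml_text, tag):
    """Count the text content of every TEI element with the given tag name."""
    root = ET.fromstring(xml_text)
    names = (
        "".join(element.itertext()).strip()
        for element in root.iterfind(f".//tei:{tag}", TEI_NS)
    )
    return Counter(name for name in names if name)


print(count_entities(SAMPLE, "persName"))
print(count_entities(SAMPLE, "placeName"))
```

The same function could be run over every file in a downloaded version of the corpus to build simple registers of persons and places.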

Figure 3: A look into Haandbog i Verdens-Historien (Handbook in World History) from 1833.

The language excerpt in Figure 3 shows the old orthography from before the Danish orthographic reform of 1948, e.g.:

Original: … som Man i det attende Aarhundrede troede, at Solen, efter Sigende, staaer stille istedenfor at staae op …
Normalised Danish: … som man troede i det 18. århundrede, at solen efter sigende står stille i stedet for at stå op …
Literal English translation: … as people believed in the 18th century, that the sun, reportedly, stands still instead of rising …

Furthermore, the excerpt shows the manual mark-up of the corpus, done by philologists at the Grundtvig Center. There are references to, for example, person names (Joseph), mythological places (Midgaard) and actual places (Europe), as well as comments on parts of the text (Overhuggelse af Knuden, literal English translation: the cut of the knot). The actual comment is not shown in the text.

The corpus is an excellent resource for researchers who wish to apply digital methods to investigate various aspects of Grundtvig and his epoch. For example, researchers might want to investigate Grundtvig as a historical person, address the literary language or orthography of the 19th century, or dig into his work when studying the theoretical background of the Danish folk high school tradition. The corpus is also important for scholars who apply Linked Data to investigate the 19th century, since it contains annotations of people, places and events.

Event | CLARIN-DK presents the event “Teaching the teachers – an interactive workshop for the Voyant Tools”


Written by Lene Offersgaard and Dorte H. Hansen, edited by Darja Fišer and Jakob Lenardič

Digital methods are only slowly gaining ground in the teaching of literary studies in Denmark. While many lecturers are interested in introducing digital methods to their students, they often lack the knowledge of existing tools. From previous workshops, CLARIN-DK learned that neither traditional NLP tools like lemmatizers, POS-taggers, and named entity recognizers, nor simple command line scripting, were suitable in such teaching scenarios. This is why CLARIN-DK started to explore other technologies, such as data visualization tools that could serve as a better and easier entry point to the use of digital methodologies for non-computational researchers and teachers.

We opted for Voyant Tools, introduced to us by information specialists from HUMlab – a datalab at the Copenhagen University Library. Voyant Tools is an online environment that performs automatic text analysis with functionalities such as word frequency lists, frequency distribution plots, and KWIC displays (Figure 1). CLARIN-DK and HUMlab have organized several interactive workshops presenting the use of this environment to lecturers and researchers at the Faculty of Humanities at the University of Copenhagen. CLARIN-DK hosted a dedicated event at the Department of Nordic Studies and Linguistics on 21 November 2018, which was attended by 12 teachers and researchers.

Figure 1: the Voyant Tools
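For readers who want to see what a word frequency list and a KWIC (keyword in context) display amount to, here is a minimal Python sketch. It does not use or reproduce Voyant Tools; the function names are hypothetical, and the sample text is the Danish example sentence quoted earlier on this page, with its punctuation simplified.

```python
import re
from collections import Counter

# Example sentence from the lemmatizer section above (punctuation simplified).
TEXT = (
    "Dog, året der er gået, kan også have budt på tunge stunder. "
    "Ikke alt er glæde for os alle."
)


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every occurrence."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


TOKENS = tokenize(TEXT)
print(Counter(TOKENS).most_common(3))   # word frequency list
for left, word, right in kwic(TOKENS, "er", width=2):
    print(f"{left:>12} [{word}] {right}")
```

Both functions operate on raw word forms, so inflected variants of the same lemma are counted separately.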

In order to tailor the events to the needs of the participants, CLARIN-DK asked some of them in advance which literary works were most relevant to showcase and which research questions could be investigated and discussed during the events. They opted for novels written around the Modern Breakthrough period, an era in Scandinavian literature which started at the end of the 19th century and in which Naturalism replaced Romanticism. The Archive of Danish Literature provided a collection of 54 novels. The novels were preprocessed and uploaded to a local instance of Voyant Tools by the CLARIN-DK team and information specialists from HUMlab.

One research question addressed the use of terms before and after the Modern Breakthrough (1870–1890): is it possible to visualize changes in the use of, for example, terms for emotions (like love), which are typical of the Romantic period, compared to more concrete terms (like work), which should be more common in the Naturalist novels? Using the Trends tool in Voyant (Figure 2), it was found that the term for love is used relatively more often before 1875 than after 1888. Moreover, the term for work is not used in the novels before 1875, while it is used after that. The use of these terms therefore indicates a shift in common themes around the Modern Breakthrough. However, with this simplistic method it is impossible to single out the individual novels that represent the Modern Breakthrough.

Figure 2: The chronological distribution of the terms love vs. work for the period between 1826 and 1899 with regard to 54 novels
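The kind of comparison described above can be sketched as a relative frequency per publication year. The Python example below is only an illustration under stated assumptions: the two "novels" in `TOY_CORPUS` are invented placeholder strings, not texts from the collection of 54 novels, and the function names are hypothetical. The Danish words kærlighed (love) and arbejde (work) stand in for the terms discussed.

```python
import re

# Invented placeholder texts keyed by publication year; not real novels.
TOY_CORPUS = [
    (1850, "kærlighed og længsel og kærlighed"),
    (1890, "arbejde og løn og arbejde i byen"),
]


def relative_frequency(text, term):
    """Share of tokens in the text that are exactly the given term."""
    tokens = re.findall(r"\w+", text.lower())
    return tokens.count(term) / len(tokens) if tokens else 0.0


def trend(corpus, term):
    """Return (year, relative frequency) pairs sorted by year."""
    return sorted((year, relative_frequency(text, term)) for year, text in corpus)


print(trend(TOY_CORPUS, "kærlighed"))
print(trend(TOY_CORPUS, "arbejde"))
```

Dividing by the number of tokens makes novels of different lengths comparable, which is why relative rather than raw counts are used here.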

We therefore investigated whether other tools in Voyant could also confirm the differences between the two literary periods. In the ScatterPlot tool it is, among other things, possible to visualize the results of document similarity analysis. Figure 3 shows the document similarity using the TF-IDF frequency count for all novels in the corpus. In the figure, the novels by Herman Bang and a few novels by Sophus Schandorph are clearly separated from the other works. The late 19th-century novels by these two writers are considered representative of the Modern Breakthrough. It was now up to the researchers to interpret the similarities in the other groups of the scatter plot and from there to pose more research questions.

Figure 3: Novel similarity based on TF-IDF counts.
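To show how TF-IDF weights can be turned into a document similarity score, here is a small self-contained Python sketch. It compares documents pairwise with cosine similarity and does not attempt to reproduce Voyant's ScatterPlot; the three documents in `TOY_DOCS` are invented word lists, and the weighting (term frequency times the logarithm of inverse document frequency) is one common TF-IDF variant chosen for illustration.

```python
import math
import re
from collections import Counter

# Invented mini-documents; A and B share vocabulary, C does not.
TOY_DOCS = {
    "A": "arbejde fabrik løn arbejde",
    "B": "fabrik arbejde by løn",
    "C": "kærlighed længsel hjerte kærlighed",
}


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    """Map each document name to a {term: tf-idf weight} dictionary."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    # Number of documents in which each term occurs at least once.
    doc_freq = Counter(term for c in counts.values() for term in c)
    vectors = {}
    for name, c in counts.items():
        total = sum(c.values())
        vectors[name] = {
            term: (freq / total) * math.log(n_docs / doc_freq[term])
            for term, freq in c.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dictionaries."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


VECTORS = tfidf_vectors(TOY_DOCS)
print(cosine(VECTORS["A"], VECTORS["B"]), cosine(VECTORS["A"], VECTORS["C"]))
```

Documents that share distinctive vocabulary score closer to 1, while documents with no terms in common score 0, which is the intuition behind groups of novels separating in a similarity plot.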

In this and other workshops, the participants soon realized that studying texts through isolated words (word forms) was limiting, and there was a clear need for lemmatization. Moreover, the need for PoS-tagged texts became evident since some researchers were interested in investigating adjectives showing emotions, while others were interested in analysing events, requiring the automatic extraction of verbs. Despite this, Voyant Tools has proved to be very illustrative and useful to get a first quantitative overview of a collection of novels, and it allowed the comparison of two or more novels.

As a follow-up to this event, the CLARIN-DK team will organize a workshop introducing corpus tools and corpus querying techniques in linguistically annotated texts for Literary Studies. The event will also showcase how automatic linguistic annotations are performed on texts from before and after the Danish orthographic reform of 1948, and discuss how to circumvent the problems encountered when NLP tools developed for contemporary texts are applied to older texts.

Interview | Interview with Klaus Nielsen, the chief editor at the Grundtvig Study Centre


The interview was conducted via Skype by Jakob Lenardič.

1. What is your scholarly background and your current academic position?

I obtained my PhD from the University of Copenhagen in 2012 and my thesis was a combination of traditional literary theory and book history, a philological field that focuses on a more mechanical-analytical study of the publication process of literary works. I focused on Gittes monologer, a famous collection of satirical poems by the Danish poet Per Højholt published in different versions between 1980 and 1984. I was able to observe crucial textual differences between their various published versions, which allowed me to arrive at a much richer interpretation of the poems that wouldn’t be possible with the final, best-known 1984 version alone. This showed me how important it is to combine traditional qualitative literary analysis with analytical methods that also take into consideration non-textual information such as publication history.

I now work as chief editor at Grundtvig Study Centre, where we are preparing a critical edition of the collected works of N.F.S. Grundtvig, a very prolific and multidisciplinary Danish author who published around 37,000 pages of text from 1804 to his death in 1872. We are making this corpus available in an online environment, with manual annotations that follow the scholarly standards of textual criticism. In a sense, my PhD was an important methodological steppingstone for my current work related to the Grundtvig’s Works Corpus, which also involves a close study of the differences between the various published editions.

2. The Grundtvig’s work corpus has been published through the CLARIN-DK repository. How did this collaboration start? How do you benefit from this collaboration?

We released the first version of our corpus through the CLARIN-DK repository in 2018 at the suggestion of Lene Offersgaard, with whom we were collaborating on a related project at the time. This was a great opportunity for us because we had been receiving feedback from some of our more devoted users who said they wanted the corpus in a downloadable format. We’ve also made an agreement with CLARIN-DK that as soon as we publish a new version of the corpus through our online environment, we’ll also update the version deposited in the repository with the newest, more richly annotated one.

3. How is Grundtvig’s corpus structured? What are some of the challenges you come across when annotating the corpus?

The corpus is extremely varied in terms of content, since Grundtvig was a polymath who wrote on a variety of different subjects. Perhaps most prominently, he wrote books on Danish history and Nordic mythology, carried out linguistic studies of Old Icelandic and Old English, translated from Latin, wrote political and philosophical texts, and composed around 1,500 hymns, many of which are still sung today in Denmark. For this reason, Grundtvig’s views are representative of the intellectual and cultural zeitgeist of Denmark in the 19th century.

There’s a downside to his varied repertoire, in that annotation is still manually intensive. We do use a database for place and person names that we feed into a named-entity recognizer, but even in this case, we often have to manually verify the results. For example, Grundtvig often refers to the philosopher Søren Kierkegaard, who was a contemporary of his, and our software is generally successful in identifying this particular named entity. However, Grundtvig often refers to him by his last name only, but since Søren Kierkegaard had a brother who was also a published author in the same period, we have to manually check the automatic recognition to make sure that the software made a link to the correct referent. In addition to this, we often come across obsolete words, in which case we manually add their possible historical meaning. This can only be done by closely reading and interpreting the surrounding text. Nevertheless, we will use the parts that have already been annotated as a baseline for a semi-automated processing of the remaining two-thirds of the corpus in the future.

One of the greatest challenges in terms of mark-up pertains to identifying Biblical references, especially in cases where Grundtvig doesn’t use direct quotes taken from the Bible but his own modified variants, or where he makes indirect references to the more obscure motifs and quotes. Although we have theologians, both internal and external, who closely read the texts and manually identify such references, it would be invaluable if we could also make use of a language tool that would help automate this process of identification. I don’t think that such a tool exists yet, but it would be a very welcome addition to the CLARIN infrastructure in my opinion. Similarly, it would be great to have a tool that can automatically recognize proverbs and sayings, which abound in Grundtvig’s works, given that his work is a major part of the Danish cultural heritage. Although I’m not an expert in digital technologies, it seems that developing such a tool wouldn’t be too hard a task, as there already exist ready-made digital collections of Danish proverbs that could be used as a baseline for training the tool.

4. Has the corpus been successfully used by an external research project?

Yes, Baunvig and Nielbo (2017) have used our corpus in a case study to determine how digital methods can benefit the analysis of very large collections of written text, and uncover new perspectives and interpretations. Grundtvig Studies is a popular subfield in literary history in Denmark, and many studies on Grundtvig have been published in the past fifty years. However, previous researchers weren’t able to use digital methods and tools, which means that their claims were influenced by the limitations inherent to a purely manual approach to analysis. As I’ve said, Grundtvig produced around 37,000 pages in his lifetime, which is simply too much text for an individual researcher to read and then be able to recollect the finer details. For instance, there is an older study in which it is claimed that Grundtvig started suffering from a series of psychological problems in the 1830s, which was reflected in the texts he wrote in this decade. However, Baunvig and Nielbo (2017) were able to show, by using quantitative methods such as measuring the amount of information entropy in the corpus, that his psychological turmoil actually started earlier than was previously claimed, which is of course an important finding from a purely historical viewpoint. There has also been a follow-up study of our corpus conducted by Nielbo et al. (2018).
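Editorial note: as a minimal sketch of what measuring information entropy in a text can mean, the Python function below computes the Shannon entropy of a text's word distribution, in bits. This is an assumption made for illustration and not a reconstruction of the method used by Baunvig and Nielbo (2017); the function name and the sample strings are hypothetical.

```python
import math
import re
from collections import Counter


def word_entropy(text):
    """Shannon entropy (in bits) of the word distribution of a text."""
    tokens = re.findall(r"\w+", text.lower())
    total = len(tokens)
    counts = Counter(tokens)
    # H = -sum(p * log2(p)) over the relative frequency p of each distinct word.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(word_entropy("a a a a"))   # one repeated word: no uncertainty
print(word_entropy("a b c d"))   # four equally frequent words
```

Computed over texts grouped by year, a measure of this kind yields a time series in which changes in how varied or repetitive the writing is can be examined.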

5. What makes this corpus particularly valuable for the CLARIN infrastructure?

I think that our rather thorough manual approach to the corpus is an important contribution for a more accurate understanding of the historical developments of the Danish language, especially its orthography. What is important in this respect is that there were no orthographic rules in Grundtvig’s time, only tendencies, which means that spelling was quite liberal in comparison to contemporary Danish. Consequently, we’re often in doubt whether the way Grundtvig spelled a certain word is an instance of spelling variation that was attested at the time or if it is just a spelling mistake on his part. This is particularly problematic in cases where Grundtvig’s idiosyncratic spelling can’t be found in the historical dictionaries of 19th century Danish, since this intuitively makes you think that the spelling variant was a mistake. However, such dictionaries weren’t compiled on the basis of the original edition but often used later published editions that had gone through the editing process, where spelling variation was normalized. This means that if a researcher wanted to study the vocabulary of 19th century Danish just on the basis of such dictionaries, he or she would miss the attested variations and consequently get a warped view of how people actually wrote at the time. By contrast, we spend a lot of time closely analysing and proofreading the materials, so we are able to present a resource that serves as a much more complex, as well as accurate, presentation of the linguistic situation at the time.

6. Could you give an example of such orthographic variation? How did you resolve it?

I actually came across a fairly interesting orthographic problem just recently when I was annotating Grundtvig’s History of the Northmen, which is one of the few texts he wrote in English. In this text, Grundtvig used the word kempion in the sense of “champion” or “hero”; however, this spelling variant isn’t listed in the Oxford English Dictionary, which only includes the variant campion with an a instead of an e. Because my colleagues and I weren’t sure how to solve this issue, we consulted a Professor of Middle English, and he believed it to be a spelling mistake that should be corrected in the edited corpus, given that the Oxford English Dictionary is extremely comprehensive and thorough in its account of English etymology. However, when I searched for the variant kempion on Google, I found out that it was actually attested at the time, and it was for instance used by Sir Walter Scott in his 1822 novel The Pirate, which Grundtvig was alluding to.

7. Are there any other aspects of the CLARIN-DK infrastructure that are important for your work at the centre?

Yes, especially in relation to how proactively they reach out as part of their user-involvement initiative. Last year, CLARIN-DK organized a tutorial for the philologists at our centre where they demonstrated how Voyant tools can simplify our annotation process. Using Voyant has turned out to be extremely helpful when we come across obsolete phrases the meaning of which we don’t know and can’t find in the historical dictionaries. By using Voyant’s extended search capabilities and visualisation tools, we are now able to easily chart the occurrences of this unknown phrase in the entire corpus, and then extract only those texts where this phrase seems to occur in a similar context, which then helps us determine its actual meaning.

I am also pleased to say that CLARIN-DK has already made the first version of our corpus available through their installation of Voyant Tools. We plan to update this test version regularly with newer ones. In the long run, I believe the availability of the corpus through CLARIN-DK’s Voyant Tools will significantly streamline user assistance.

8. Your professional website says that you’re also interested in audio literature. Is this something that you’re still actively researching?

No, my research on audio literature was mostly confined to my PhD project, because Per Højholt, who is the author of the poems that I was analysing, had read them aloud on Danish radio in the 1980s. By using an audio-analysis software called PRAAT, I measured prosodic features such as the author’s pitch and reading speed, and I was able to see how he deliberately changed his voice in accordance with the way the point-of-view character developed through the course of the poems’ narrative. This was a rather small but important finding since it hadn’t been previously acknowledged in the relevant literature on Gittes monologer how the author’s spoken performance of his own work added new dimensions to the understanding of the poems themselves.

9. What kind of new research questions does audio literature offer in the context of Digital Humanities? Do you think that CLARIN could contribute to this field?

When I was writing my thesis, research on audio literature was still a very new field, but nowadays it is more readily agreed upon that audio recordings can serve as crucial material for textual analysis. Literary theorists are now conducting important research on the link between the reader of the audio text and the content of the text itself, and this opens up many interesting questions. Let’s say, for instance, that we are dealing with a novel written in the first person, and that the narrator is a woman. Should the reader of the audio version then also be a woman, or conversely, what interpretative repercussions would arise if the reader were actually a man? That is, the person’s voice crucially affects the way people perceive the text, much in the same way that the sort of typography of an old book can evoke various pre-conceptions in the reader about the book’s content.

Given how audio literature opens up interesting questions relevant for the emerging digital humanities, I think that new digital tools for analysing recorded literary works would be very welcome additions to the CLARIN infrastructure.

10. What are your hopes for CLARIN-DK in the future?

I think that one of the future challenges for Digital Humanities in Denmark is to find a common platform where our whole research community can have a more unified and interoperable access to as many carefully annotated resources as possible. I believe that CLARIN-DK is an excellent candidate in the country for this, because our experience with releasing the Grundtvig’s Work corpus has proven to us that their repository is a stable environment through which corpora can be released in a sustainable fashion and with well-presented metadata. On top of that, the repository also allows us to integrate our corpora with other services in the consortium. For this reason, it can only be a good thing if more digital humanities scholars in Denmark decide to deposit their resources in the CLARIN-DK repository.

Tour de CLARIN Volume II publication now available

This second volume of Tour de CLARIN is organized into two parts. In Part 1, we present the seven CLARIN countries which have been featured since November 2018, when the first volume was published: Estonia, Latvia, Denmark, Italy, Slovenia, Hungary, and Bulgaria.

In Part 2, we present the work of the four Knowledge Centres that have been visited thus far: the Knowledge Centre for treebanking, the Knowledge Centre for the Languages of Sweden, the TalkBank Knowledge Centre, and the Czech Knowledge Centre for Corpus Linguistics.


Get in touch with CLARIN-DK

Questions or comments? Do you wish to join the community? Or do you want to get started with a workshop or a meeting?

Please get in touch with Costanza Navarretta, community lead in CLARIN-DK.


DIGITAL JOURNEYS: Helle’s case from the Digital Literacy course

New Paths, Old Sources: Cityscapes in the Danish Press, 1905-2005

Helle Strandgaard Jensen and Mikkel Thelle, both Associate Professors at the Department of History and Classical Studies, Aarhus University, have studied the changing representation of cities in historical newspapers. By studying cityscapes in the Danish press, 1905-2005, they created a workflow which enables historians to do distant readings of newspapers. Participating in the Digital Literacy course has taught Helle more about the possibilities and limitations of digital methods.


Helle’s motivation for participating in the Digital Literacy course was twofold. First, she had previously worked with digital literacy from a historical perspective, looking at how concepts of ‘media literacy’ had changed in the second half of the twentieth century, and now wanted to explore the phenomenon in relation to her own discipline. Second, Helle and her colleague Mikkel Thelle wanted to provide fellow historians with an approach to digital methods they could adapt to their own subfield:

“We wanted to show other historians some of the advantages of using digital methods. We wanted to prove how explorative approaches can complement the research methods we traditionally use in our field.”

In History, digital methods can be used for overcoming several challenges. One is to provide an explorative approach to large data sets. Using distant reading methods, researchers can explore connections in large amounts of data without committing to specific research questions from the beginning. These connections can generate new types of questions for later close reading:

“Historians have always worked with a lot of different types of data which can be combined in different ways. Historians have used statistics, big data and computers in their work since the 1950s. But to have an explorative approach is important. Being able to examine connections in a large set of data generates questions that can complement other types of data.”

About the project

New Paths, Old Sources: Cityscapes in the Danish Press, 1905-2005

Helle Strandgaard Jensen is an Associate Professor of Contemporary Cultural History. Her research focuses on two areas: Media history and historians’ use of digital and analogue archives. The Digital Literacy project is made in collaboration with her colleague Mikkel Thelle and aims to show other historians how digital methods can be used:

“The question was if we were able to produce a workflow that could be adopted by historians fairly easily, provide them with an understanding of what digital methods can do, and finally allow them to work with the research questions they are used to working with.”


  • An IT supporter from the Digital Literacy course, Ross Deans Kristensen-McLachlan, developed a script that automatically aggregates the data Helle and Mikkel wanted to collect for their project. The script allowed the researchers to ask many kinds of questions about the representation of Danish cities in three big newspapers from 1905 to 2005. The results, in the form of changing cityscapes, can then be visualised on a heat map.


  • Query results from SMURF which collects data from Mediestream, the media collections of the Royal Danish Library.

New personal competences

Experience with digital methods

Knowing the limits and possibilities of digital methods.

Working with large data sets

Better understanding of which research questions can be asked with a large set of data.

Communication and teaching

Communicating and teaching digital methods to students and colleagues.

Next steps

There is a growing interest in digital methods at the Department of History and Classical Studies. A BA course has been established in which students are introduced to basic digital approaches. Participating in the Digital Literacy course has also helped in this regard:

“I’ve gained more confidence when I now go back and teach my students in digital methods. What entry level do we need to have, and how do we make the challenges smaller for novices?”

The workflow that Helle and Mikkel created in the Digital Literacy project will be incorporated in teaching and will also be used for asking other types of research questions.

Together with her colleague Adela Sobotkova, Helle leads the Center for Digital History Aarhus (CEDHAR). In this role, she will continue to work with digital methods in research and teaching.

Behind the researcher

Helle Strandgaard Jensen is an Associate Professor of Scandinavian cultural history at the Department of History and Classical Studies, School of Culture and Society, Aarhus University. 

She is also co-director of Center for Digital History Aarhus (CEDHAR).

Her research interests include contemporary media history in Scandinavia, Western Europe and the US after 1945.

Behind the Digital Literacy course

The Digital Literacy project is a competence development project organised by the Digital Arts Initiative at Aarhus University. It is a unique opportunity for researchers to qualify themselves in the digital area – with their own research questions as a point of departure.

Read more

Mapping the Danish web and Danish digital history

How the mapping of the Danish web happens

Using the supercomputer at the Royal Danish Library and newly developed algorithms, Professor Niels Brügger dives into the Danish part of the World Wide Web to map our digital history. Here he explains how.

DIGITAL JOURNEYS: Vladimir’s case from the Digital Literacy course

Powering large-scale reviews of energy security vs. social impact literature with topic modelling to locate cross-referencing between them

Vladimir Douglas Pacheco Cueva, Associate Professor of International Studies at Aarhus University, has embarked on a digital quest to expand his data sets to test if his analyses and hypotheses hold once scaled up. This case gives an insight into his digital journey through (and beyond) his participation in the Digital Literacy course at Arts, Aarhus University.

DIGITAL JOURNEYS: Kirstine and Anne’s case from the Digital Literacy course

Unveiling the character gallery of sermons: Labelling and social network analysis of 11,955 contemporary Danish sermons

Kirstine Helboe Johansen, Associate Professor in Practical Theology, and Anne Agersnap, PhD student in The Study of Religion, both Aarhus University, are interested in questions of how religion is actualised in contemporary society – and how such questions can be addressed digitally. This case gives an insight into their digital journey through (and beyond) their participation in the Digital Literacy course.

DIGITAL JOURNEYS: Janne’s case from the Digital Literacy course

Investigating the historical development of tracking and e-commerce technologies on the Danish Web

Janne Nielsen, Assistant Professor at the Department of Media and Journalism Studies at Aarhus University, is widening her digital horizon to face the concrete challenges of her everyday research. Her participation in the Digital Literacy course has whetted her digital appetite, and this case provides an insight into her digital journey through (and beyond) the course.

DIGITAL JOURNEYS: Anne’s case from the Digital Literacy course

Tracing Cold War perceptions of nuclear weapons in Denmark through distant (and close) reading

Anne Sørensen, history researcher at the School of Communication and Culture, Aarhus University, has embarked on a journey to expand her digital horizon – most recently by participating in the Digital Literacy course. This case gives insight into her digital journey through (and beyond) the course.

Experiments with Big Video

New technologies give enhanced methods for video ethnography

Researchers at Aalborg University have been experimenting with new technologies and enhanced methods for ethnomethodology and conversation analysis (EMCA) and video ethnography. One key focus has been to collect richer video and sound recordings in a variety of settings.

Gesta Danorum

Language technology, a shortcut to scientific evidence

This case is an example of how language technology can be exploited in research within the humanities. It is based on Gesta Danorum, written around 1200 by the Danish historian Saxo.

Locating missing metadata for radio programmes by using the programme schedules

Programmes that are part of a series often have the name of the series as their title. For such programmes, a search for the series title returns a list of programmes that all share the same title.


Research publications

Find examples of relevant publications from the DIGHUMLAB community.


Digital Humanities Lab Denmark

Aarhus University
Jens Chr. Skous Vej 4
DK-8000 Aarhus C