Silke Holmqvist, PhD student in History and Classical Studies at Aarhus University, has used LARM.fm to map metadata for TV programmes.
Get inspired by examples of research performed by using the DIGHUMLAB infrastructure.
New technologies give enhanced methods for video ethnography The DIGHUMLAB community VILA supports research into embodied human interaction in a wide range of environments and with a focus on social...
Find examples of relevant publications from the DIGHUMLAB community.
Articles Jong, Franciska de, Maegaard, Bente, De Smedt, Koenraad, Fiser, Darja, Van Uytvanck, Dieter (2018). CLARIN: towards FAIR and responsible data science using language resources. LREC 2018, In: Proceedings of Eleventh...
Have, I. & Nielsen, J. (2020). LARM.fm user manual: A digital radio and TV research infrastructure for researchers, teachers and students. 3rd version. Aarhus: DIGHUMLAB. Michelsen, M., Have, I., Krogh, M., & Nielsen, S. K. (eds.). (2018). Tunes...
Scientific articles Brügger, N.: Digital Humanities in the 21st Century: Digital Material as a Driving Force. Digital Humanities Quarterly, 10(3), 2016. Brügger, N.: Humanities, Digital Humanities, Media studies, Internet studies : An...
Bernhard, J., Davidsen, J., Ryberg, T., Carstensen, A-K., & Abildgaard, J. R. (2018). Engineering students’ shared experiences and joint problem solving in collaborative learning. In SEFI 2018 conference proceedings. Davidsen,...
“Media in general, both radio and TV, are heavily under-prioritised as historical sources, even though they are central for our modern time. In a historical perspective, we know very little...
Silke Holmqvist’s PhD project examines perceptions of guest workers’ emotional lives as they are shown and interpreted through images circulating in the public sphere, for example through the media, from the 1960s to the 1980s.
“‘The Turk’ in particular, as he was called back in the early 1970s, was portrayed as virtuous, hard-working and loyal. In the early 1970s the stereotype is that he does not drink, goes to bed early and saves his money for his family. At the same time, Danes have begun to buy television sets, replace the wallpaper, and buy new curtains and designer sofas for their detached houses. An idealised image emerges of the dedicated, thrifty and admirable brown man who arrives at the railway station to dedicate a period of his life to work far away from his family. Later, in the late 1970s and the 1980s, we see guest workers in Denmark being framed as a problem. Now he was seen on television in, for example, social housing estates. He was shown as work-shy, patriarchal and living on welfare benefits. But it is the same man, and throughout the whole period all sorts of guest workers have of course been just as different from one another as ethnic Danes are,” explains Silke Holmqvist.
Inventing an immigrant – An emotional geography of guest worker images in Denmark c. 1960-1989
Excerpt from the project: “I ask how media, including mass media but also the urban environment as a medium in itself, influenced the changing contours of the figure of the guest worker. The research design pays attention to the interaction between the emotional repertoires associated with or identified by visible minority workers and the urban places (material and fictional geographies) in which the guest worker was installed.”
EMPIRICAL BASIS (a selection)
“I started my PhD by sitting and leafing through all the television programme listings from three decades at the Royal Danish Library (Det Kgl. Bibliotek). At that time I did not know about LARM.fm’s digitisation of the programme listings, and it would have been great to be able to search digitally instead of leafing through them,” says Silke.
Silke has used LARM.fm to check programme listings: partly the programmes she had already found in the printed listings at the Royal Danish Library, and partly the DR programmes she was given access to. With LARM.fm she has been able to double-check the programmes’ metadata: when they were broadcast, whether they aired in primetime, when they were rebroadcast, how the programmes are described, and so on.
Silke Holmqvist says that she has also used LARM.fm in her teaching, where she has encouraged her students to use it because it is an obvious place to find source material.
“Media in general, both radio and TV, are heavily under-prioritised as historical sources, even though they are central for our modern time. In a historical perspective, we know very little about what actually went on in primetime. On average, Danes sat and took in one hour and twenty-odd minutes every day over a twenty-year period, but we do not know very much about what they were watching. And LARM.fm can help give a really good insight into that,” explains Silke.
“I knew which decades I wanted to search in. I delimited them in five-year intervals to make sure I did not get an unmanageable number of results. Then I created my own projects, where I first searched for gæstearbejder (guest worker), fremmedarbejder (foreign worker), arbejdsindvandring (labour immigration), tyrker (Turks), pakistaner (Pakistanis), jugoslaver (Yugoslavs) and so on. Then I ran the same searches across the decades.
In the big picture I knew what ought to be there from the physical programme listings at the Royal Danish Library, but sometimes I was surprised to find a programme I did not know about, or a radio broadcast I could listen to. I have been positively surprised by how many radio programmes I have been able to access.”
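The search strategy described in the quote (a fixed list of terms, repeated over five-year windows) can be sketched as a small search plan. This is only an illustration: the helper names `five_year_windows` and `search_plan` are hypothetical, the year range 1960–1989 is taken from the project title, and nothing here calls LARM.fm itself, whose interface is not described in this article.

```python
from itertools import product

# Search terms quoted in the article (kept in Danish, as they were entered).
SEARCH_TERMS = [
    "gæstearbejder", "fremmedarbejder", "arbejdsindvandring",
    "tyrker", "pakistaner", "jugoslaver",
]


def five_year_windows(start_year, end_year, step=5):
    """Split an inclusive year range into consecutive windows of `step` years."""
    return [
        (year, min(year + step - 1, end_year))
        for year in range(start_year, end_year + 1, step)
    ]


def search_plan(terms, start_year, end_year):
    """Pair every window with every term so the same searches run in each period."""
    windows = five_year_windows(start_year, end_year)
    return [
        {"term": term, "from": first, "to": last}
        for (first, last), term in product(windows, terms)
    ]


plan = search_plan(SEARCH_TERMS, 1960, 1989)
```

Each entry in `plan` is one search to run by hand in the platform, which keeps the result sets small and makes the decades comparable.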
Creating your own projects, annotating, and skipping forward in the audio by 10 seconds all work really well.
The platform is easy to access, quick to get started with, and has a user-friendly interface.
Radio and TV are under-prioritised as sources, but are central to the modern period.
According to Silke Holmqvist, LARM.fm is an obvious place for colleagues and students to find source material, because it has a lot to offer:
“I can easily recommend LARM.fm to others, and I do so all the time. If a colleague is working on a specific historical topic, it is really rare not to find something in a LARM.fm search, such as a radio programme that puts what you are working on into perspective. LARM.fm is a really good place to orient yourself broadly, about all sorts of things that have happened in our past.”
Silke looks forward to finishing and defending her thesis. She hopes to be able to continue working on the television medium and minority themes in a postdoc position.
“I will continue to need the LARM.fm platform to be available, because what it has simply cannot be found anywhere else,” Silke concludes.
Silke Holmqvist is a PhD student in History and Classical Studies at Aarhus University. She holds a BA in the history of ideas and an MA in cultural history from Aarhus University.
Her research interests include modern history, visual history, cultural history, minority studies and emotional geography.
In April and May 2019, CLARIN-DK was featured in the CLARIN series “Tour de CLARIN” designed to highlight the many national consortia under CLARIN-EU.
Since 2016, the Tour de CLARIN initiative has been periodically highlighting prominent user involvement activities in the CLARIN network in order to
Read the blog posts about CLARIN-DK below or download the full second volume of the “Tour de CLARIN” series.
Written by Costanza Navarretta, edited by Darja Fišer and Jakob Lenardič
Denmark has been a member of CLARIN ERIC since February 2012 and is one of its founding members. The Danish infrastructure CLARIN-DK was funded through two projects, the DK-CLARIN (2008-2010), and the DIGHUMLAB project (2011-2017). Since 2018, CLARIN-DK has been funded by the Faculty of Humanities and the Department of Nordic Studies and Linguistics, University of Copenhagen. The Danish national coordinator is Costanza Navarretta and the leading institution is the Centre for Language Technology, which is part of the Department of Nordic Studies and Linguistics.
CLARIN-DK involves the following institutions:
CLARIN-DK is a stable national research infrastructure where researchers can securely deposit, share and download language resources such as domain-specific corpora (e.g., The Danish Parliament Corpus 2009 – 2017 and the Johannes V. Jensen Corpus, a literary corpus collecting the works of the famous early 20th-century modernist poet Johannes V. Jensen), as well as lexicons, word lists, speech transcriptions, and audio/video files. CLARIN-DK also offers online language technology tools, including a tokeniser, a PoS tagger, a lemmatiser for Danish and English, a named entity recogniser for Danish, a keyword extractor, a TEI-to-text converter and a pipeline for linguistic annotation. Tools for performing basic frequency counts of words in textual data are also included, as are visualisation and corpus linguistics tools developed by other research groups, such as Korp and Voyant. Aside from being a certified B Centre, CLARIN-DK also runs a Knowledge Centre called DANSK, which provides expertise and help with using the language resources and technologies offered by the Danish consortium together with the Danish Language Council.
CLARIN-DK is involved in various Danish research projects and networks. For example, it is part of the Danish collaboration initiative DIGHUMLAB, which involves various research communities such as NetLAB, which is aimed at the cross-disciplinary study of internet materials, and LARM.fm, which is an online platform used for automatically locating missing metadata of broadcast radio programmes. CLARIN-DK is also a partner in the externally funded research project Infrastrukturalisme, whose PI is Henrik Jørgensen, Aarhus University. The consortium is also involved in a research network, Multimodal Child Language Acquisition, with the University of Hong Kong and the Chinese University of Hong Kong (PI Costanza Navarretta), and contributes tools and guidance to a number of research activities, including the linguistic annotation of medieval documents and the TEI encoding of literary corpora, mainly at the University of Copenhagen. CLARIN-DK is also involved in research data management and the promotion of FAIR data in the Humanities.
The CLARIN-DK team participates in the following CLARIN committees: the Standing Committee for CLARIN Technical Centres (Lene Offersgaard, Bart Jongejan), the Legal and Ethical Issues Committee (Sussi Olsen), and the Assessment Committee (Lene Offersgaard as Chair).
Written by Bart Jongejan and Costanza Navarretta, edited by Darja Fišer and Jakob Lenardič
Lemmatizers generalize over the different forms of a word used in free text and provide its lemma, which is the base or dictionary look-up form. They are therefore one of the basic NLP tools which are not only important for NLP, but also for lexicographic work and all text-based studies. They are especially indispensable in morphologically rich languages that have a large number of word forms for the same lemma, which severely hinders querying or processing all of them in running text.
The CST lemmatizer has been developed over many years and as part of various projects, especially the Danish STO (Jongejan and Haltrup 2005) and the Nordic Tvärsök (Jongejan and Dalianis 2009). While it was initially used as a tool to support Danish lexicographic work, it has gradually been extended with a dynamic self-learning algorithm which learns new lemmatization rules from morphological lexica that contain the relations between word forms and their corresponding lemmas. The lemmatization rules are organized in a decision tree.
In contrast to other state-of-the-art stemmers and rule-based lemmatizers, the current version of the CST lemmatizer does not learn lemmatization rules from word endings alone: it recognizes a wide variety of derivational patterns, e.g. prefixation, infixation and suffixation. It can therefore deal with languages with different morphological systems. The CST lemmatizer has currently been trained on 25 languages. Figure 1 lists the language-specific versions of the CST lemmatizer available from the Centre for Language Technology.
Danish and English texts can be lemmatized online with the CST lemmatizer. The lemmatizer is also available for download via GitHub. Figure 2 shows the CLARIN-DK web service for the CST lemmatizer, while Figure 3 shows a Danish example sentence that was lemmatized with the tool.
Figure 3: Lemmatization of the Danish sentence Dog, året der er gået, kan også have budt på tunge stunder – ikke alt er glæde for os alle (“However, the past year can also have provided sad moments – not everything can give happiness to all of us”), which is taken from the 2017 New Year’s Eve speech by the Danish Queen.
The CST lemmatizer trained for Danish has been used in many NLP projects, but also outside the NLP community. Frederik Hjorth, a political science researcher at the Department of Political Science, University of Copenhagen, has applied the CST lemmatizer to political speeches as one of the preprocessing steps in a study of how members of the existing political parties have addressed right-wing populists who have been challenging the order of the established political system (Hjorth 2018). The results of the study indicate that young politicians are often willing to engage with the populists, as well as with other politicians across the political spectrum, in the name of democratic freedom (which Hjorth calls the strategy of engagement), while older politicians often describe the populist challengers as morally illegitimate (which Hjorth calls the strategy of disparagement) and refuse to enter into discussion with them.
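To make the idea of learning affix rules from a lexicon of word form and lemma pairs concrete, here is a deliberately simplified sketch. It is not the CST lemmatizer's algorithm: it learns one prefix-and-suffix rewrite rule per training pair, stores the rules in a flat dictionary instead of a decision tree, and ignores infixes. All function names and the tiny lexicon are illustrative assumptions.

```python
from difflib import SequenceMatcher


def learn_rule(form, lemma):
    """Derive a ((prefix, suffix) -> (prefix, suffix)) rewrite rule from one pair.

    The longest substring shared by form and lemma is treated as the stem;
    whatever surrounds it is the affix material to strip or add.
    """
    matcher = SequenceMatcher(None, form, lemma, autojunk=False)
    m = matcher.find_longest_match(0, len(form), 0, len(lemma))
    pattern = (form[:m.a], form[m.a + m.size:])
    rewrite = (lemma[:m.b], lemma[m.b + m.size:])
    return pattern, rewrite


def train(lexicon):
    """Collect rules from (word form, lemma) pairs; the first rule per pattern wins."""
    rules = {}
    for form, lemma in lexicon:
        pattern, rewrite = learn_rule(form, lemma)
        rules.setdefault(pattern, rewrite)
    return rules


def lemmatize(word, rules):
    """Apply the matching rule with the longest prefix + suffix; else return the word."""
    best_score, best_lemma = -1, word
    for (prefix, suffix), (new_prefix, new_suffix) in rules.items():
        score = len(prefix) + len(suffix)
        if score >= len(word) or score <= best_score:
            continue  # rule would consume the whole word, or is less specific
        if word.startswith(prefix) and word.endswith(suffix):
            stem = word[len(prefix):len(word) - len(suffix)]
            best_score, best_lemma = score, new_prefix + stem + new_suffix
    return best_lemma


# Tiny illustrative lexicon: Danish definite plural, German participle, English past.
rules = train([("husene", "hus"), ("gemacht", "machen"), ("walked", "walk")])
```

With these three pairs, `lemmatize("stolene", rules)` gives `stol` and `lemmatize("gesagt", rules)` gives `sagen`, which shows why rules that look at both ends of a word generalise to languages that mark inflection with prefixes as well as suffixes.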
The CST lemmatizer was also used for many other languages in different linguistic projects. For example, it was trained on Russian (Sharoff and Nivre 2011) and then used e.g. for event identification (Solovyev and Ivanov 2016), and for anaphora and co-reference resolution (Toldova et al. 2014).
Jongejan, Bart and Dorte Haltrup. 2005. The CST Lemmatiser. Center for Sprogteknologi, University of Copenhagen, version 2.7. http://cst.dk/online/lemmatiser/cstlemma.pdf
Jongejan, Bart and Hercules Dalianis. 2009. Automatic Training of Lemmatization Rules That Handle Morphological Changes in Pre-, in- and Suffixes Alike. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 – ACL-IJCNLP ’09, Vol. 1. Suntec, Singapore: Association for Computational Linguistics, p. 145.
Hjorth, Frederik. 2018. Establishment Responses to Populist Challenges: Evidence from Legislative Speech. 2018 Annual Meeting of the Danish Political Science Association. http://fghjorth.github.io/papers/responses.pdf
Sharoff, Serge and Joachim Nivre. 2011. The proper place of men and machines in language technology: Processing Russian without any linguistic knowledge. In Proc. Computational Linguistics and Intelligent Technologies DIALOGUE2011, Bekasovo, 591–604. https://pdfs.semanticscholar.org/36df/5fbe04f425e9b089437e979581d1f5375a94.pdf
Solovyev, Valery and Vladimir Ivanov. 2016. Knowledge-driven event extraction in Russian: corpus-based linguistic resources. Computational Intelligence and Neuroscience, 11 pages. https://doi.org/10.1155/2016/4183760
Written by Dorte H. Hansen and Costanza Navarretta, edited by Darja Fišer and Jakob Lenardič
Nikolai Frederik Severin Grundtvig was a theologian, a priest, a philosopher, a poet, a writer, a teacher and a politician (member of the Rigsdagen, one of the two parts of the Parliament), who lived in Denmark between 1783 and 1872. He was a contemporary of Hans Christian Andersen and Søren Kierkegaard. Grundtvig’s ideas have had a lasting impact on many areas of Danish culture, such as education, politics and the church. For example, Grundtvig advocated a reform of the school system, which also included educating adults to participate actively in society and in cultural life. Grundtvig is therefore considered to be the mind behind the folk high school. He was part of the national romantic movement and contributed to the development of Danish national awareness. Grundtvig’s written works are thus an important key to understanding Danish culture and mentality.
The collection Grundtvig’s Works is published by the Grundtvig Center at the University of Aarhus and will contain 1000 text-critical and commented editions of the printed authorship of N.F.S. Grundtvig when finalized in 2030. The works are available to the public through a searchable interface including registers of persons, places and Bible citations.
The researchers at the Grundtvig Center wanted a reliable and consistent way to cite the publication and a sustainable and interoperable environment in which they could share the work with other scholars and the general public. Since the Grundtvig Center itself does not offer the possibility of downloading the underlying files, CLARIN-DK was approached as a repository provider.
The corpus, now deposited in CLARIN-DK’s D-Space repository (http://hdl.handle.net/20.500.12115/31), consists of approximately 1,300 TEI-encoded XML files, of which approximately 450 are critical editions manually annotated with person names, place names, mythological names, Bible citations and comments. When new versions of the works are released, they will be uploaded as new versions of the corpus in the CLARIN-DK repository.
The language excerpt in Figure 3 shows the old orthography from before the Danish orthographic reform of 1948, e.g.:
| Original | … som Man i det attende Aarhundrede troede, at Solen, efter Sigende, staaer stille istedenfor at staae op … |
| Normalised Danish | … som man troede i det 18. århundrede, at solen efter sigende står stille i stedet for at stå op … |
| Literal English translation | … as people believed in the 18th century, that the sun, reportedly, stands still instead of rising … |
Furthermore, the excerpt shows the manual mark-up of the corpus, done by philologists at the Grundtvig Center. There are references to e.g. person names (Joseph), mythological places (Midgaard) and actual places (Europe), and comments on parts of the text (Overhuggelse af Knuden, literal English translation: the cutting of the knot). The actual comment is not shown in the text.
The corpus is an excellent resource for researchers who wish to apply digital methods to investigate various aspects of Grundtvig and his epoch. For example, researchers might want to investigate Grundtvig as a historical person, address 19th-century literary language or orthography, or dig into his work when studying the theoretical background of the Danish folk high school tradition. The corpus is also important for scholars applying Linked Data to investigate the 19th century, since the corpus contains annotations of people, places and events.
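As a rough illustration of how such mark-up can be read programmatically, the sketch below collects annotated names from a TEI document with Python's standard library. It assumes the standard TEI elements `persName` and `placeName` in the TEI namespace; the element names and attributes actually used in the Grundtvig corpus may differ, and the sample document is invented for the example (only the three names come from the text above).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Standard TEI namespace; assumed, not verified against the corpus files.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented miniature document, not an excerpt from the corpus.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><persName>Joseph</persName> ... <placeName>Midgaard</placeName>
    ... <placeName>Europa</placeName></p>
  </body></text>
</TEI>"""


def collect_entities(xml_text, tags=("persName", "placeName")):
    """Return {tag: [annotated strings]} in document order."""
    root = ET.fromstring(xml_text)
    found = defaultdict(list)
    for tag in tags:
        for element in root.iterfind(f".//tei:{tag}", TEI_NS):
            # itertext() also picks up text nested inside child elements.
            found[tag].append("".join(element.itertext()).strip())
    return dict(found)


entities = collect_entities(SAMPLE)
```

The same loop over a directory of downloaded files would yield a simple register of persons and places per work, which is a natural starting point for the Linked Data use cases mentioned in this post.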
Written by Lene Offersgaard and Dorte H. Hansen, edited by Darja Fišer and Jakob Lenardič
Digital methods are only slowly gaining ground in the teaching of literary studies in Denmark. While many lecturers are interested in introducing digital methods to their students, they often lack the knowledge of existing tools. From previous workshops, CLARIN-DK learned that neither traditional NLP tools like lemmatizers, POS-taggers, and named entity recognizers, nor simple command line scripting, were suitable in such teaching scenarios. This is why CLARIN-DK started to explore other technologies, such as data visualization tools that could serve as a better and easier entry point to the use of digital methodologies for non-computational researchers and teachers.
We opted for Voyant Tools, introduced to us by information specialists from HUMlab – a datalab at the Copenhagen University Library. Voyant Tools is an online environment that performs automatic text analysis with functionalities such as word frequency lists, frequency distribution plots, and KWIC displays (Figure 15). CLARIN-DK and HUMlab have organized several interactive workshops presenting the use of this environment to lecturers and researchers at the Faculty of Humanities at the University of Copenhagen. CLARIN-DK hosted a dedicated event at the Department of Nordic Studies and Linguistics on 21 November 2018, which was attended by 12 teachers and researchers.
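For readers unfamiliar with KWIC (keyword in context) displays, the following minimal sketch shows the underlying idea: every occurrence of a keyword is listed with a few words of left and right context. It is an independent illustration, not Voyant Tools code, and the tokenisation (lower-cased runs of word characters) is a simplifying assumption.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit of `keyword`."""
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


example = "Love is patient. Work is hard, but love of work endures."
concordance = kwic(example, "love", width=2)
```

Here `concordance` holds two hits, one at the start of the text with an empty left context and one in the second sentence.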
In order to tailor the events to the needs of the participants, CLARIN-DK asked some of them in advance which literary works were most relevant to showcase and which research questions could be investigated and discussed during the events. They opted for novels written around the Modern Breakthrough period, an era in Scandinavian literature which started at the end of the 19th century and in which Naturalism replaced Romanticism. The Archive of Danish Literature (http://adl.dk) provided a collection of 54 novels. The novels were preprocessed and uploaded to a local instance of Voyant Tools by the CLARIN-DK team and information specialists from HUMlab.
One research question addressed the use of terms before and after the Modern Breakthrough (1870 – 1890): was it possible to visualize changes in the use of, for example, terms for emotions (like love), which are typical of the Romantic period, compared to more concrete terms (like work), which should be more common in the Naturalist novels? Using the Trends tool in Voyant (Figure 16), it was found that the term for love is used relatively more often before 1875 than after 1888. Moreover, the term for work is not used in the novels before 1875, while it is used after that. The use of these terms therefore indicates a shift in common themes around the Modern Breakthrough. However, this simplistic method cannot identify which individual novels represent the Modern Breakthrough.
We therefore investigated whether other tools in Voyant could also confirm the differences between the two literary periods. The ScatterPlot tool makes it possible, among other things, to visualize the results of a document similarity analysis. Figure 17 shows the document similarity using the TF-IDF frequency count for all novels in the corpus. In the figure, the novels by Herman Bang and a few novels by Sophus Schandorph are clearly separated from the other works. The late 19th-century novels of these two writers are considered representative of the Modern Breakthrough. It was then up to the researchers to interpret the similarities in the other groups of the scatter plot and from there to pose more research questions.
In this and other workshops, the participants soon realized that studying texts through isolated words (word forms) was limiting, and there was a clear need for lemmatization. Moreover, the need for PoS-tagged texts became evident, since some researchers were interested in investigating adjectives showing emotions, while others were interested in analysing events, which requires the automatic extraction of verbs. Despite this, Voyant Tools proved to be very illustrative and useful for getting a first quantitative overview of a collection of novels, and it allowed two or more novels to be compared.
As a follow-up to this event, the CLARIN-DK team will organize a workshop introducing corpus tools and corpus querying techniques in linguistically annotated texts for Literary Studies. The event will also showcase how automatic linguistic annotations are performed on texts from before and after the Danish orthographic reform of 1948, and discuss how it is possible to circumvent problems encountered when applying NLP tools developed for contemporary texts to older texts.
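The kind of comparison made with the Trends tool can be approximated by computing a term's relative frequency in each dated text. The sketch below is a simplified stand-in, not Voyant's implementation; the function names and the two toy "novels" are invented, and English placeholder words are used instead of the Danish terms.

```python
import re
from collections import Counter


def relative_frequency(text, term):
    """Share of all tokens in `text` that equal `term` (case-insensitive)."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return Counter(tokens)[term.lower()] / len(tokens)


def trend(dated_texts, term):
    """Return [(year, relative frequency)] sorted by year."""
    ordered = sorted(dated_texts, key=lambda item: item[0])
    return [(year, relative_frequency(text, term)) for year, text in ordered]


# Invented toy corpus: (publication year, text).
toy_corpus = [(1890, "work work and love"), (1860, "love and love")]
love_trend = trend(toy_corpus, "love")
work_trend = trend(toy_corpus, "work")
```

Relative rather than absolute counts are used so that long and short novels can be compared on the same scale.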
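Document similarity based on TF-IDF can be illustrated in a few lines of plain Python. This sketch uses one common TF-IDF variant (term frequency times the logarithm of inverse document frequency) and cosine similarity; Voyant's exact weighting and the dimensionality reduction behind its scatter plot are not reproduced here, and the three toy documents are invented.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [re.findall(r"\w+", doc.lower()) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero length."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


toy_docs = ["love heart love", "love heart soul", "work factory wage"]
vectors = tfidf_vectors(toy_docs)
```

In this toy example the first two documents share vocabulary and therefore score higher with each other than with the third, which is the effect that separates groups of novels in a similarity plot.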
The interview was conducted via Skype by Jakob Lenardič.
1. What is your scholarly background and your current academic position?
I obtained my PhD from the University of Copenhagen in 2012 and my thesis was a combination of traditional literary theory and book history, a philological field that focuses on a more mechanical-analytical study of the publication process of literary works. I focused on Gittes monologer, a famous collection of satirical poems by the Danish poet Per Højholt published in different versions between 1980 and 1984. I was able to observe crucial textual differences between their various published versions, which allowed me to arrive at a much richer interpretation of the poems that wouldn’t be possible with the final, best-known 1984 version alone. This showed me how important it is to combine traditional qualitative literary analysis with analytical methods that also take into consideration non-textual information such as publication history.
I now work as chief editor at Grundtvig Study Centre, where we are preparing a critical edition of the collected works of N.F.S. Grundtvig, a very prolific and multidisciplinary Danish author who published around 37,000 pages of text from 1804 to his death in 1872. We are making this corpus available in an online environment, with manual annotations that follow the scholarly standards of textual criticism. In a sense, my PhD was an important methodological steppingstone for my current work related to the Grundtvig’s Works Corpus, which also involves a close study of the differences between the various published editions.
2. The Grundtvig’s Works corpus has been published through the CLARIN-DK repository. How did this collaboration start? How do you benefit from this collaboration?
We released the first version of our corpus through the CLARIN-DK repository in 2018 at the suggestion of Lene Offersgaard, with whom we were collaborating on a related project at the time. This was a great opportunity for us because we had been receiving feedback from some of our more devoted users who said they wanted the corpus in a downloadable format. We’ve also made an agreement with CLARIN-DK that as soon as we publish a new version of the corpus through our online environment, we’ll also update the version deposited in the repository with the newest, more richly annotated one.
3. How is Grundtvig’s corpus structured? What are some of the challenges you come across when annotating the corpus?
The corpus is extremely varied in terms of content, since Grundtvig was a polymath who wrote on a variety of different subjects. Perhaps most prominently, he wrote books on Danish history and Nordic mythology, carried out linguistic studies of Old Icelandic and Old English, translated from Latin, wrote political and philosophical texts, and composed around 1,500 hymns, many of which are still sung today in Denmark. For this reason, Grundtvig’s views are representative of the intellectual and cultural zeitgeist of Denmark in the 19th century.
There’s a downside to his varied repertoire, in that annotation is still manually intensive. We do use a database for place and person names that we feed into a named-entity recognizer, but even in this case, we often have to manually verify the results. For example, Grundtvig often refers to the philosopher Søren Kierkegaard, who was a contemporary of his, and our software is generally successful in identifying this particular named entity. However, Grundtvig often refers to him by his last name only, but since Søren Kierkegaard had a brother who was also a published author in the same period, we have to manually check the automatic recognition to make sure that the software made a link to the correct referent. In addition to this, we often come across obsolete words, in which case we manually add their possible historical meaning. This can only be done by closely reading and interpreting the surrounding text. Nevertheless, we will use the parts that have already been annotated as a baseline for a semi-automated processing of the remaining two-thirds of the corpus in the future.
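The workflow described here, a name database feeding a recogniser whose ambiguous hits are checked by hand, can be sketched as a simple gazetteer lookup that flags mentions with more than one candidate referent. This is only an illustration of the principle, not the software used at the centre; the identifiers in `GAZETTEER` and the example sentence are invented.

```python
import re

# Hypothetical name database: surface form -> candidate person identifiers.
GAZETTEER = {
    "Søren Kierkegaard": ["pers:kierkegaard_soeren"],
    "Kierkegaard": ["pers:kierkegaard_soeren", "pers:kierkegaard_brother"],
}


def tag_persons(text, gazetteer=GAZETTEER):
    """Find gazetteer names in `text` and flag ambiguous ones for manual review."""
    # Longest names first, so a full name wins over the bare surname inside it.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))
    mentions = []
    for match in pattern.finditer(text):
        candidates = gazetteer[match.group()]
        mentions.append({
            "mention": match.group(),
            "start": match.start(),
            "candidates": candidates,
            "needs_review": len(candidates) > 1,
        })
    return mentions


mentions = tag_persons("Søren Kierkegaard wrote a book. Later, Kierkegaard replied.")
```

Only the second mention, the bare surname, is marked `needs_review`, which mirrors the manual check described in the answer.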
One of the greatest challenges in terms of mark-up pertains to identifying Biblical references, especially in cases where Grundtvig doesn’t use direct quotes taken from the Bible but his own modified variants, or where he makes indirect references to the more obscure motifs and quotes. Although we have theologians both internal and external who closely read the texts and manually identify such references, it would be invaluable if we could also make use of a language tool that would help automatize this process of identification. I don’t think that such a tool exists yet, but it would be a very welcome addition to the CLARIN infrastructure in my opinion. Similarly, it would be great to have a tool that can automatically recognize proverbs and sayings, which abound in Grundtvig’s works, given that his work is a major part of the Danish cultural heritage. Although I’m not an expert in digital technologies, it seems that developing such a tool wouldn’t be too hard a task, as there already exist readymade digital collections of Danish proverbs that could be used as a baseline for training the tool.
4. Has the corpus been successfully used by an external research project?
Yes, Baunvig and Nielbo (2017) have used our corpus in a case study to determine how digital methods can benefit the analysis of very large collections of written text, and uncover new perspectives and interpretations. Grundtvig Studies is a popular subfield in literary history in Denmark, and many studies on Grundtvig have been published in the past fifty years. However, previous researchers weren’t able to use digital methods and tools, which means that their claims were influenced by the limitations inherent to a purely manual approach to analysis. As I’ve said, Grundtvig produced around 37,000 pages in his lifetime, which is simply too much text for an individual researcher to read and then be able to recollect the finer details. For instance, there is an older study in which it is claimed that Grundtvig started suffering from a series of psychological problems in the 1830s, which was reflected in the texts he wrote in this decade. However, Baunvig and Nielbo (2017) were able to show, by using quantitative methods such as measuring the amount of information entropy in the corpus, that his psychological turmoil actually started earlier than was previously claimed, which is of course an important finding from a purely historical viewpoint. There has also been a follow-up study of our corpus conducted by Nielbo et al. (2018).
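As background to the mention of information entropy, the sketch below computes the Shannon entropy of a text's word distribution, a basic measure of how evenly its vocabulary is used. It is a textbook illustration only and is not claimed to be the measure or pipeline used by Baunvig and Nielbo (2017); the function name and tokenisation are assumptions.

```python
import math
import re
from collections import Counter


def word_entropy(text):
    """Shannon entropy (in bits) of the word frequency distribution of `text`."""
    tokens = re.findall(r"\w+", text.lower())
    total = len(tokens)
    counts = Counter(tokens)
    # H = -sum(p * log2(p)) over the relative frequency p of each distinct word.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


repetitive = word_entropy("a a a a")
varied = word_entropy("a b c d")
```

A text that repeats one word has entropy 0 bits, while four equally frequent words give 2 bits; computed per text or per year, such values can be compared across an author's output.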
5. What makes this corpus particularly valuable for the CLARIN infrastructure?
I think that our rather thorough manual approach to the corpus is an important contribution for a more accurate understanding of the historical developments of the Danish language, especially its orthography. What is important in this respect is that there were no orthographic rules in Grundtvig’s time, only tendencies, which means that spelling was quite liberal in comparison to contemporary Danish. Consequently, we’re often in doubt whether the way Grundtvig spelled a certain word is an instance of spelling variation that was attested at the time or if it is just a spelling mistake on his part. This is particularly problematic in cases where Grundtvig’s idiosyncratic spelling can’t be found in the historical dictionaries of 19th century Danish, since this intuitively makes you think that the spelling variant was a mistake. However, such dictionaries weren’t compiled on the basis of the original edition but often used later published editions that had gone through the editing process, where spelling variation was normalized. This means that if a researcher wanted to study the vocabulary of 19th century Danish just on the basis of such dictionaries, he or she would miss the attested variations and consequently get a warped view of how people actually wrote at the time. By contrast, we spend a lot of time closely analysing and proofreading the materials, so we are able to present a resource that serves as a much more complex, as well as accurate, presentation of the linguistic situation at the time.
6. Could you give an example of such orthographic variation? How did you resolve it?
I actually came across a fairly interesting orthographic problem just recently when I was annotating Grundtvig’s History of the Northmen, which is one of the few texts he had written in English. In this text, Grundtvig used the word kempion in the sense of “champion” or “hero”; however, this spelling variant isn’t listed in the Oxford English Dictionary, which only includes the variant campion with an a instead of an e. Because my colleagues and I weren’t sure how to solve this issue, we consulted a Professor of Middle English, and he believed it to be a spelling mistake that should be corrected in the edited corpus, given that the Oxford English Dictionary is extremely comprehensive and thorough in its account of English etymology. However, when I searched for the variant kempion on Google, I found out that it was actually attested at the time, and it was for instance used by Sir Walter Scott in his 1822 novel The Pirate, which Grundtvig was alluding to.
7. Are there any other aspects of the CLARIN-DK infrastructure that are important for your work at the centre?
Yes, especially in relation to how proactively they reach out as part of their user-involvement initiative. Last year, CLARIN-DK organized a tutorial for the philologists at our centre where they demonstrated how Voyant tools can simplify our annotation process. Using Voyant has turned out to be extremely helpful when we come across obsolete phrases whose meaning we don’t know and can’t find in the historical dictionaries. By using Voyant’s extended search capabilities and visualisation tools, we are now able to easily chart the occurrences of such a phrase in the entire corpus, and then extract only those texts where the phrase seems to occur in a similar context, which then helps us determine its actual meaning.
I am also pleased to say that CLARIN-DK has already made the first version of our corpus available through their installation of the Voyant tools. We plan to update this test version regularly. In the long run, I believe the availability of the corpus through CLARIN-DK’s Voyant tools will significantly streamline user assistance.
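The phrase lookup described in this answer can be illustrated with a minimal keyword-in-context sketch in Python. It does not show how Voyant works internally. The function name `kwic`, the miniature sample corpus and the window size are all hypothetical, chosen only to show the idea of listing each occurrence of a phrase together with its surrounding words.

```python
import re


def kwic(texts, phrase, window=3):
    """List (text_id, left_context, match, right_context) for each occurrence of phrase."""
    phrase_tokens = phrase.lower().split()
    n = len(phrase_tokens)
    hits = []
    for text_id, text in texts.items():
        # Lowercase and tokenize; \w also matches letters such as æ, ø, å.
        tokens = re.findall(r"\w+", text.lower())
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == phrase_tokens:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + n:i + n + window])
                hits.append((text_id, left, " ".join(phrase_tokens), right))
    return hits


# Hypothetical miniature corpus, not taken from Grundtvig's works.
sample = {
    "text_a": "The old kempion rode north. A kempion of great renown fell.",
    "text_b": "No hero is named here.",
}

results = kwic(sample, "kempion", window=2)
for hit in results:
    print(hit)
```

Reading the left and right contexts side by side makes it easier to group occurrences that share a similar context, and the text identifiers show which texts to pull out for closer reading.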
8. Your professional website says that you’re also interested in audio literature. Is this something that you’re still actively researching?
No, my research on audio literature was mostly confined to my PhD project, because Per Højholt, who is the author of the poems that I was analysing, had read them aloud on Danish radio in the 1980s. By using audio-analysis software called PRAAT, I measured prosodic features such as the author’s pitch and reading speed, and I was able to see how he deliberately changed his voice in accordance with the way the point-of-view character developed through the course of the poems’ narrative. This was a rather small but important finding since it hadn’t been previously acknowledged in the relevant literature on Gittes Monologer how the author’s spoken performance of his own work added new dimensions to the understanding of the poems themselves.
9. What kind of new research questions does audio literature offer in the context of Digital Humanities? Do you think that CLARIN could contribute to this field?
When I was writing my thesis, research on audio literature was still a very new field, but nowadays it is more readily agreed upon that audio recordings can serve as crucial material for textual analysis. Literary theorists are now conducting important research on the link between the reader of the audio text and the content of the text itself, and this opens up many interesting questions. Let’s say, for instance, that we are dealing with a novel written in the first person, and that the narrator is a woman. Should the reader of the audio version then also be a woman, or conversely, what interpretative repercussions would arise if the reader were actually a man? That is, the person’s voice crucially affects the way people perceive the text, much in the same way that the sort of typography of an old book can evoke various pre-conceptions in the reader about the book’s content.
Given how audio literature opens up interesting questions relevant for the emerging digital humanities, I think that new digital tools for analysing recorded literary works would serve as very welcome additions to the CLARIN infrastructure.
10. What are your hopes for CLARIN-DK in the future?
I think that one of the future challenges for Digital Humanities in Denmark is to find a common platform where our whole research community can have more unified and interoperable access to as many carefully annotated resources as possible. I believe that CLARIN-DK is an excellent candidate in the country for this, because our experience with releasing the Grundtvig’s Work corpus has proven to us that their repository is a stable environment through which corpora can be released in a sustainable fashion and with well-presented metadata. On top of that, the repository also allows us to integrate our corpora with other services in the consortium. For this reason, it can only be a good thing if more digital humanities scholars in Denmark decide to deposit their resources in the CLARIN-DK repository.
This second volume of Tour de CLARIN is organized into two parts. In Part 1, we present the seven CLARIN countries which have been featured since November 2018, when the first volume was published: Estonia, Latvia, Denmark, Italy, Slovenia, Hungary, and Bulgaria.
In Part 2, we present the work of the four Knowledge Centres that have been visited thus far: the Knowledge Centre for treebanking, the Knowledge Centre for the Languages of Sweden, the TalkBank Knowledge Centre, and the Czech Knowledge Centre for Corpus Linguistics.
Helle’s motivation for participating in the Digital Literacy course was twofold. First, she had previously worked with digital literacy from a historical perspective, looking at how concepts of ‘media literacy’ had changed in the second half of the twentieth century, and now wanted to explore the phenomenon in relation to her own discipline. Second, Helle and her colleague Mikkel Thelle wanted to provide fellow historians with an approach to digital methods they could adapt to their own subfield:
“We wanted to show other historians some of the advantages of using digital methods. We wanted to prove how explorative approaches can complement the research methods we traditionally use in our field.”
In History, digital methods can be used for overcoming several challenges. One is to provide an explorative approach to large data sets. Using distant reading methods, researchers can explore connections in large amounts of data without committing to specific research questions from the beginning. These connections can generate new types of questions for later close reading:
“Historians have always worked with a lot of different types of data which can be combined in different ways. Historians have used statistics, big data and computers in their work since the 1950s. But to have an explorative approach is important. Being able to examine connections in a large set of data generates questions that can complement other types of data.”
New Paths, Old Sources: Cityscapes in the Danish Press, 1905-2005
Helle Strandgaard Jensen is an Associate Professor of Contemporary Cultural History. Her research focuses on two areas: media history and historians’ use of digital and analogue archives. The Digital Literacy project was carried out in collaboration with her colleague Mikkel Thelle and aims to show other historians how digital methods can be used:
“The question was if we were able to produce a workflow that could be adopted by historians fairly easily, provide them with an understanding of what digital methods can do, and finally allow them to work with the research questions they are used to working with.”
Knowing the limits and possibilities of digital methods.
Better understanding of which research questions can be asked with a large set of data.
Communicating and teaching digital methods to students and colleagues.
There is a growing interest in digital methods at the Department of History and Classical Studies. A BA course has been established in which students are introduced to basic digital approaches. Participating in the Digital Literacy course has also been helpful in this regard:
“I’ve gained more confidence when I now go back and teach my students in digital methods. What entry level do we need to have, and how do we make the challenges smaller for novices?”
The workflow that Helle and Mikkel created in the Digital Literacy project will be incorporated in teaching and will also be used for asking other types of research questions.
Helle Strandgaard Jensen is an Associate Professor of Scandinavian cultural history at the Department of History and Classical Studies, School of Culture and Society, Aarhus University.
She is also co-director of Center for Digital History Aarhus (CEDHAR).
Her research interests include contemporary media history in Scandinavia, Western Europe and the US after 1945.
The Digital Literacy project is a competence development project organised by the Digital Arts Initiative at Aarhus University. It is a unique opportunity for researchers to qualify themselves in the digital area – with their own research questions as a point of departure.
How the mapping of the Danish web happens
Using the supercomputer at the Royal Danish Library and newly developed algorithms, Professor Niels Brügger dives into the Danish part of the World Wide Web to map our digital history. Here he explains how.
Powering large-scale reviews of energy security vs. social impact literature with topic modelling to locate cross-referencing between them
Vladimir Douglas Pacheco Cueva, Associate Professor of International Studies at Aarhus University, has embarked on a digital quest to expand his data sets to test if his analyses and hypotheses hold once scaled up. This case gives an insight into his digital journey through (and beyond) his participation in the Digital Literacy course at Arts, Aarhus University.
Unveiling the character gallery of sermons: Labelling and social network analysis of 11,955 contemporary Danish sermons
Kirstine Helboe Johansen, Associate Professor in Practical Theology, and Anne Agersnap, PhD student in The Study of Religion, both Aarhus University, are interested in questions of how religion is actualised in contemporary society – and how such questions can be addressed digitally. This case gives an insight into their digital journey through (and beyond) their participation in the Digital Literacy course.
Investigating the historical development of tracking and e-commerce technologies on the Danish Web
Janne Nielsen, Assistant Professor at the Department of Media and Journalism Studies at Aarhus University, is widening her digital horizon to face the concrete challenges of her everyday research. Her participation in the Digital Literacy course has whetted her digital appetite, and this case provides an insight into her digital journey through (and beyond) the course.
Tracing Cold War perceptions of nuclear weapons in Denmark through distant (and close) reading
Anne Sørensen, history researcher at the School of Communication and Culture, Aarhus University, has embarked on a journey to expand her digital horizon – most recently by participating in the Digital Literacy course. This case gives insight into her digital journey through (and beyond) the course.
New technologies give enhanced methods for video ethnography
Researchers at Aalborg University have been experimenting with new technologies and enhanced methods for EMCA and video ethnography. One key focus has been to collect richer video and sound recordings in a variety of settings.
Language technology, a shortcut to scientific evidence
This case is an example of how language technology can be exploited in research within the humanities. The resource that this case is based on is Gesta Danorum, written around 1200 by the Danish historian Saxo.
Find examples of relevant publications from the DIGHUMLAB community.
Andersen, J. S., Thøgersen, J., Larsen, B.: Larm Audio Research Archive – en infrastruktur til forskning og undervisning i radio og lyd 2010-2014
Granly, E., Stougaard, B. and Have, I. (eds.), Sound Archives, special issue of the journal SoundEffects, Vol. 5, No. 2.
Kreutzfeldt, Jacob: “State Controled Avantgarde?: Emil Bønnelyckes radiophonic city portrait of Copenhagen”. A Cultural History of the Avant-Garde in the Nordic Countries. ed. / Tania Ørum; Per Stounbjerg et al. Vol. 2 Edition Rodopi B.V, 2015.
Brügger, N., Schroeder, R (editors): The Web as History. UCL Press. 2017
Brügger, N.: Digital Humanities in the 21st Century: Digital Material as a Driving Force. Digital Humanities Quarterly, 10(3), 2016.
Brügger, N.: Humanities, Digital Humanities, Media studies, Internet studies: An inaugural lecture. Center for Internetforskning, Aarhus Universitet, 2015. 16 pp. (Skrifter fra Center for Internetforskning)
Davidsen, J. & Kjær, M. (ed.): Introduktion til videoanalyse, Samfundslitteratur, 2018. p. 13-35
Davidsen, J., & McIlvenny, P (2017). Research on Language and Social interaction – Blog
Davidsen, J. & Ryberg, T. (2017): “‘This is the size of one meter’: Children’s bodily-material collaboration”. Intern. J. Comput.-Support. Collab. Learn (2017) 12: 65. doi:10.1007/s11412-017-9248-8