Gesta Danorum

Language technology, a shortcut to scientific evidence

This case is an example of how language technology can be used in research within the humanities. The case is based on Gesta Danorum, written around 1200 by the Danish historian Saxo. Gesta Danorum is written in High Latin and describes, in 16 books, the period from King Dan to Canute VI of Denmark. Traditionally, the work is divided into two main parts: books 1-9, which deal with Norse mythology, and a historical second part, books 10-16, which describes the introduction of Christianity in Denmark. In 1969, a competing thesis was put forward, cf. Skovgaard-Petersen (1969).

In this analysis, the composition of Gesta Danorum is split into books 1-8 and books 9-16. The two competing interpretations can be paraphrased as a single question: is it book 9 or book 10 that marks the transition from the heathen to the Christian period in Gesta Danorum? To find evidence bearing on this question, a search platform with embedded linguistic information and advanced search facilities was used to identify subject-area-specific elements in the individual books of Gesta Danorum and to display the search results in a manageable way.
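The question of where the transition lies can be framed as a simple frequency comparison. The Python sketch below is an illustration only, not a description of how the original study worked. It assumes that each book has already been reduced to a list of lemmas and that a hand-picked set of subject-area lemmas (here a hypothetical Christian vocabulary) is available, and it compares the relative frequency of those lemmas on either side of a proposed split point. The books and lemmas are invented toy data, not counts from Gesta Danorum.

```python
# Toy data, invented for illustration: book number -> lemmas in that book.
BOOKS = {
    8: ["gud", "odin", "konge", "slag"],
    9: ["kirke", "biskop", "konge", "dåb"],
    10: ["kirke", "biskop", "konge", "slag"],
}

# Hypothetical subject-area vocabulary (church, bishop, baptism).
CHRISTIAN_LEMMAS = {"kirke", "biskop", "dåb"}


def relative_frequency(lemmas, targets):
    """Share of tokens whose lemma belongs to `targets` (0.0 for an empty list)."""
    if not lemmas:
        return 0.0
    return sum(1 for lemma in lemmas if lemma in targets) / len(lemmas)


def per_book(books, targets):
    """Relative frequency of the target lemmas in each book, ordered by book number."""
    return {n: relative_frequency(lemmas, targets) for n, lemmas in sorted(books.items())}


def split_profile(books, targets, first_book_of_part_two):
    """Relative frequency of the target lemmas before and after a proposed split point."""
    part_one = [lemma for n, lemmas in books.items()
                if n < first_book_of_part_two for lemma in lemmas]
    part_two = [lemma for n, lemmas in books.items()
                if n >= first_book_of_part_two for lemma in lemmas]
    return relative_frequency(part_one, targets), relative_frequency(part_two, targets)


if __name__ == "__main__":
    print(per_book(BOOKS, CHRISTIAN_LEMMAS))
    print("split before book 9: ", split_profile(BOOKS, CHRISTIAN_LEMMAS, 9))
    print("split before book 10:", split_profile(BOOKS, CHRISTIAN_LEMMAS, 10))
```

A sharper contrast between the two parts at one split point than at the other would be, at most, weak evidence for that division; a real analysis would need far larger vocabularies and a closer reading of the hits.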

The procedure

The procedure was to take a Danish translation of Gesta Danorum and compute part-of-speech (PoS) and lemma information for it automatically. As an example of the output of the automatic processing, the sentence “Kongen blev kronet på slottet” (“the king was crowned at the castle”) is represented as follows (the overall structure is word form/lemma/PoS tag):

Kongen/konge/NN_COM_SING_DEF blev/blive/V_INDIC_PAST kronet/krone/V_PARTC_PAST på/på/PREP slottet/slot/NN_NEUT_SING_DEF
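The word form/lemma/PoS tag format is easy to process further. A minimal Python sketch, assuming whitespace-separated tokens and lemmas and tags that never contain a slash, splits each token from the right so that a word form containing "/" would still survive. The `word_class` helper is a hypothetical convenience that simply takes the part of the tag before the first underscore (for example NN, V or PREP).

```python
from typing import NamedTuple


class Token(NamedTuple):
    form: str
    lemma: str
    pos: str

    @property
    def word_class(self):
        """Main part of speech: the tag up to the first underscore (e.g. 'NN')."""
        return self.pos.split("_", 1)[0]


def parse_annotated(line):
    """Turn 'form/lemma/TAG form/lemma/TAG ...' into a list of Token records."""
    tokens = []
    for item in line.split():
        # Split from the right: assumes the lemma and the tag never contain '/'.
        form, lemma, pos = item.rsplit("/", 2)
        tokens.append(Token(form, lemma, pos))
    return tokens


EXAMPLE = ("Kongen/konge/NN_COM_SING_DEF blev/blive/V_INDIC_PAST "
           "kronet/krone/V_PARTC_PAST på/på/PREP slottet/slot/NN_NEUT_SING_DEF")

if __name__ == "__main__":
    for token in parse_annotated(EXAMPLE):
        print(token.form, token.lemma, token.word_class)
```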

The next step was to upload the annotated version of Gesta Danorum to a search platform. This platform made it possible to formulate queries that exploit both the linguistic annotation and the search facilities of the Corpus Query Processor (CQP).
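To illustrate what a query combining linguistic information with CQP-style search can express, the Python sketch below imitates a very small subset of CQP: a query is a sequence of token constraints, each mapping an attribute to a regular expression that must match the whole attribute value. It is not CQP itself, and the attribute names word, lemma and pos are assumptions about how the corpus was encoded. In CQP syntax the example query would be written roughly as `[lemma="krone" & pos="V_.*"] [pos="PREP"]`, i.e. a form of the verb krone ("crown") followed by a preposition.

```python
import re

# The annotated example sentence, one dict per token (attribute names are assumed).
SENTENCE = [
    {"word": "Kongen", "lemma": "konge", "pos": "NN_COM_SING_DEF"},
    {"word": "blev", "lemma": "blive", "pos": "V_INDIC_PAST"},
    {"word": "kronet", "lemma": "krone", "pos": "V_PARTC_PAST"},
    {"word": "på", "lemma": "på", "pos": "PREP"},
    {"word": "slottet", "lemma": "slot", "pos": "NN_NEUT_SING_DEF"},
]


def find_matches(tokens, query):
    """Return the start positions where consecutive tokens satisfy every step of `query`.

    Each step maps an attribute name to a regex; as in CQP, the regex must
    match the entire attribute value, so fullmatch is used.
    """
    steps = [{attr: re.compile(rx) for attr, rx in step.items()} for step in query]
    hits = []
    for start in range(len(tokens) - len(steps) + 1):
        if all(
            rx.fullmatch(tokens[start + i].get(attr, ""))
            for i, step in enumerate(steps)
            for attr, rx in step.items()
        ):
            hits.append(start)
    return hits


# Roughly the CQP query: [lemma="krone" & pos="V_.*"] [pos="PREP"]
QUERY = [{"lemma": "krone", "pos": "V_.*"}, {"pos": "PREP"}]

if __name__ == "__main__":
    for start in find_matches(SENTENCE, QUERY):
        print(" ".join(t["word"] for t in SENTENCE[start:start + len(QUERY)]))
```

Because the query works on lemmas and tags rather than raw word forms, a single query finds kronet, kronede, krones and so on, which is what makes the annotation useful for locating subject-area vocabulary across the books.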

Keywords: compositional and literary analysis, language technology, digital humanities, advanced search platform, PoS tagging.


Locating missing metadata for radio programmes by using the programme schedules

Programmes that are part of a series often have the name of the series as their title. In this case, a search for the series title returns a list of programmes with the same title. Depending on the number of programmes in the series, this list can be very long, and it can be difficult to distinguish between the individual programmes in the results list. This is especially true when the information about date and time is incorrect or missing. In such cases, adding metadata to the materials can help you keep track of them. It also enriches the materials, making it easier for other users to find and work with them. One way of finding additional metadata is to use the programme schedules. They are a very useful source of metadata, including when and where a programme was broadcast and possibly also its subject, its participants and the like. This also means that it is possible to find metadata, and thus information, about a programme even if the programme itself is not in the archive.

The following example describes the steps taken to enrich a programme from the series “Søndagsuniversitetet” (The Sunday University) from 1959. All names and descriptions are given in Danish, as they appear in the archive. The method described here may not be applicable in all cases, because it depends on the availability of the programme schedule. All programme schedules from 1925 up to and including 1983 should be available, but individual schedules may be missing from the archive.

1) Search for Søndagsuniversitetet without using any of the filters. This returns 60 results, of which 28 are radio programmes and 32 are programme schedules. Narrow the results to radio programmes only by clicking the filter Radioprogrammer under Type. As you can see, 13 of the radio programmes are called SØNDAGSUNIVERSITETET followed only by a season (sæson) number, while 15 of them have an additional title, for instance SØNDAGSUNIVERSITETET. 1. SÆSON: Udenrigshandel og produktion. The ones with an additional title are materials to which a user has added metadata in order to enrich them and to be able to tell the programmes apart.

2) Choose the radio programme SØNDAGSUNIVERSITETET. 1. SÆSON: Udenrigshandel og produktion by clicking on it in the list of results. In the user interface (if you are logged in), you will now see a timeline where you can listen to the radio programme, with the Radio Archive metadata below it. The metadata includes the title and the date and time of the broadcast, but the date and time given, 01/01/1900 00:00, is obviously wrong. This date is assigned to materials when the correct information about the time of the broadcast is missing.

3) Next to the Radio Archive metadata is another tab called Radio LARM metadata. Open this tab. This is where users can add metadata to the material and edit the existing LARM metadata (but not the Archive metadata, which is fixed). You have to log in to be able to edit metadata (the login is in the upper right corner). When you are logged in, a pencil symbol is shown in the rows where you can add or edit metadata. As you can see, information has already been added.

4) Listen to the radio programme. At the very beginning, the title, subject and host of the programme are introduced. This information can be added to the metadata, and the words mentioned in the introduction can then be used as search terms to locate the programme schedule that mentions the broadcast. Note that not all programmes have such a thorough introduction, so in some cases more listening is required to work out which programme it is and what to search for in the programme schedules. In this case, try for instance searching for udenrigshandel AND ølgaard (part of the subtitle of the individual programme and the name of the host); AND is a Boolean operator, cf. the manual. This search returns two programme schedules that can be studied for information about the programme. You can view a PDF with a digital version of the original document, and because the document has been scanned and processed with optical character recognition (OCR) software, the text of the programme schedule is also available below the PDF as searchable archive metadata. Both files contain metadata about the series, and the programme schedule from 01/03/1959 mentions the specific programme Danmarks økonomi og udlandet. Udenrigshandel og produktion.

5) Now that the programme has been located in the programme schedule, the date and time of the broadcast can be added to the LARM metadata. It is of course also possible to add other relevant metadata to the material. Another option is to add annotations to the material; this will be described in xxx.
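The five steps are carried out in the archive's web interface, but the underlying logic can be sketched in Python. Everything below is hypothetical: the record fields (type, broadcast, ocr_text, larm_metadata), the toy schedule texts and the case-insensitive substring matching are assumptions made for illustration, not the archive's actual data model or search engine. Only the placeholder date 01/01/1900 00:00 and the search terms come from the steps above. The broadcast date itself is not computed; it still has to be read from the schedule by a person.

```python
from datetime import datetime

# From step 2: the archive assigns 01/01/1900 00:00 when the broadcast time is unknown.
PLACEHOLDER_DATE = datetime(1900, 1, 1, 0, 0)

# Hypothetical toy records; field names and texts are invented for illustration.
RESULTS = [
    {"id": "p1", "type": "radio", "title": "SØNDAGSUNIVERSITETET. 1. SÆSON",
     "broadcast": PLACEHOLDER_DATE},
    {"id": "s1", "type": "schedule", "title": "programme schedule"},
]
SCHEDULES = [
    {"id": "s1", "ocr_text": "Søndagsuniversitetet. Udenrigshandel og produktion. Ølgaard"},
    {"id": "s2", "ocr_text": "Søndagsuniversitetet. Ølgaard"},
]


def radio_only(results):
    """Step 1: keep only radio programmes (the Radioprogrammer filter under Type)."""
    return [r for r in results if r["type"] == "radio"]


def has_placeholder_date(record):
    """Step 2: True if the broadcast date is the placeholder, i.e. really unknown."""
    return record.get("broadcast") == PLACEHOLDER_DATE


def and_search(schedules, *terms):
    """Step 4: schedules whose OCR text contains every term (terms joined by AND).

    Case-insensitive substring matching is an assumption, not the archive's behaviour.
    """
    wanted = [t.lower() for t in terms]
    return [s for s in schedules if all(t in s["ocr_text"].lower() for t in wanted)]


def add_larm_metadata(record, **fields):
    """Step 5: store user-added metadata separately from the fixed archive metadata."""
    record.setdefault("larm_metadata", {}).update(fields)
    return record


if __name__ == "__main__":
    for programme in radio_only(RESULTS):
        if has_placeholder_date(programme):
            candidates = and_search(SCHEDULES, "udenrigshandel", "ølgaard")
            # The actual date and time must still be read from the schedule by a person.
            add_larm_metadata(programme, candidate_schedules=[s["id"] for s in candidates])
            print(programme)
```

Keeping user-added fields in a separate `larm_metadata` entry mirrors the split between the fixed Radio Archive metadata and the editable Radio LARM metadata described in step 3.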

Publication: LARM Audio Research Archive

The publication is about the research, the activities and the technology behind the archives, which are available online in an improved and updated version.

The platform underwent extensive changes during 2015, and a new manual, tutorials and workshops are being developed and offered in 2016.

The publication (in Danish only) can be found here:


