Commit 54de1b2e authored by Niels-Oliver Walkowski
parents 17cb37a5 83e292c3
@@ -15,6 +15,20 @@
```
# Abstract
```bibtex
@incollection{liebl2022embed,
title = {“{{Embed}}, Embed! {{There}}’s Knocking at the Gate.” - {{Detecting Intertextuality}} with {{Embeddings}} and the {{Vectorian}}},
booktitle = {Fabrikation von {{Erkenntnis}}. {{Experimente}} in Den {{Digital Humanities}}},
author = {Liebl, Bernhard and Burghardt, Manuel},
date = {2022-01-15},
publisher = {{Melusina Press}},
location = {{Esch-sur-Alzette}},
url = {https://doi.org/10.26298/bj6r-td46},
isbn = {978-2-919815-25-8},
langid = {english}
}
```
*Bernhard Liebl & Manuel Burghardt, Computational Humanities Group, Leipzig University*
The detection of intertextual references in text corpora is a digital humanities topic that has gained considerable attention in recent years. From a literary studies perspective, intertextuality describes the phenomenon of one text being present in another. The corresponding computational problem is text similarity detection, and more specifically semantic similarity detection. In this notebook, we introduce the Vectorian as a framework for building queries with word embeddings such as fastText and GloVe. We evaluate the influence of computing document similarity through alignment methods such as Waterman-Smith-Beyer and through two variants of Word Mover’s Distance. We also investigate the performance of state-of-the-art sentence embeddings, such as Siamese BERT networks, for this task, both as document embeddings and as contextual token embeddings. Overall, we find that Waterman-Smith-Beyer with fastText offers highly competitive performance. The notebook can also be used to upload new data and run custom search queries.
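The core idea of aligning a query against a passage with Waterman-Smith-Beyer (WSB), while scoring token pairs by embedding similarity, can be sketched as follows. This is a minimal, hypothetical illustration and not the Vectorian's own API. It assumes three things: plain cosine similarity of word vectors as the substitution score, a caller-supplied gap-cost function, and toy 5-dimensional vectors standing in for real fastText or GloVe embeddings.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def wsb_local_align(
    query: List[str],
    doc: List[str],
    emb: Dict[str, Vector],
    gap_cost: Callable[[int], float],
) -> float:
    """Best local alignment score between two token lists.

    Substitution score: cosine similarity of the tokens' embeddings (an
    assumption for this sketch). A gap of length k costs gap_cost(k), an
    arbitrary function as in Waterman-Smith-Beyer, rather than an affine one.
    """
    n, m = len(query), len(doc)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Align query token i with document token j.
            match = H[i - 1][j - 1] + cosine(emb[query[i - 1]], emb[doc[j - 1]])
            # Leave the last k query tokens unaligned (gap in the document).
            gap_q = max(H[i - k][j] - gap_cost(k) for k in range(1, i + 1))
            # Leave the last k document tokens unaligned (gap in the query).
            gap_d = max(H[i][j - k] - gap_cost(k) for k in range(1, j + 1))
            # Clamping at 0 makes the alignment local (free start and end).
            H[i][j] = max(0.0, match, gap_q, gap_d)
            best = max(best, H[i][j])
    return best


# Hypothetical toy vectors; "door" is made similar to "gate" (cosine 0.8).
emb = {
    "knocking": (1.0, 0.0, 0.0, 0.0, 0.0),
    "at": (0.0, 1.0, 0.0, 0.0, 0.0),
    "the": (0.0, 0.0, 1.0, 0.0, 0.0),
    "gate": (0.0, 0.0, 0.0, 1.0, 0.0),
    "door": (0.0, 0.0, 0.0, 0.8, 0.6),
    "old": (0.0, 0.0, 0.0, 0.0, 1.0),
}


def gap_cost(k: int) -> float:
    # Hypothetical concave gap cost: long gaps cost less than k single gaps.
    return 0.5 * math.sqrt(k)


query = ["knocking", "at", "the", "gate"]
print(wsb_local_align(query, ["knocking", "at", "the", "door"], emb, gap_cost))  # roughly 3.8
```

The inner maximum over every gap length `k` is what separates WSB from Smith-Waterman with linear or affine gaps. It allows any gap-cost function, at the price of O(nm(n+m)) time instead of O(nm). Because the substitution score is graded rather than a binary match/mismatch, a paraphrase such as "door" for "gate" still contributes to the alignment score.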