Commit 0eecb8a5 authored by Niels-Oliver Walkowski

upd(vdhd): Add bibtex citation

parent f26904f4
# A Sentiment Analysis Tool Chain for 18<sup>th</sup> Century Periodicals
# Abstract
```bibtex
@incollection{koncar2022sentiment,
title = {A {{Sentiment Analysis Tool Chain}} for 18th {{Century Periodicals}}},
booktitle = {Fabrikation von {{Erkenntnis}}. {{Experimente}} in Den {{Digital Humanities}}},
author = {Koncar, Philipp and Geiger, Bernhard C. and Glatz, Christina and Hobisch, Elisabeth and Sarić, Sanja and Scholger, Martina and Völkl, Yvonne and Helic, Denis},
date = {2022-01-15},
publisher = {{Melusina Press}},
location = {{Esch-sur-Alzette}},
url = {https://doi.org/10.26298/ezpg-wk34},
isbn = {978-2-919815-25-8},
langid = {english}
}
```
## Abstract
Sentiment analysis is a common task in natural language processing (NLP) and aims for the automatic and computational identification of emotions, attitudes and opinions expressed in textual data.
While sentiment analysis is typically tailored for and widely used in the context of Web data, applying it to literary texts remains challenging because few methods are dedicated to languages other than English or to texts from earlier periods.
With the work we present here, we not only introduce new sentiment dictionaries for French, Italian and Spanish periodicals of the 18<sup>th</sup> century, but also build a freely and publicly available tool chain based on Jupyter Notebooks, enabling researchers to apply our dictionary creation process and sentiment analysis methods to their own material and projects.
The proposed tool chain comprises two different parts: (i) the optional creation of sentiment dictionaries and (ii) the actual sentiment analysis.
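To make part (ii) concrete, the following is a minimal sketch of dictionary-based sentiment scoring in general. It is not the method implemented in the notebooks of this repository; the function name `score_text`, the averaging rule, and the toy dictionary are all assumptions made for illustration.

```python
import re
from typing import Dict


def score_text(text: str, dictionary: Dict[str, float]) -> float:
    """Return the mean score of all tokens that occur in the dictionary.

    Assumption: the dictionary maps lowercase words to numeric polarity
    values. Texts without any dictionary word get a neutral score of 0.0.
    """
    # Simple Unicode-aware tokenization; real pipelines may need more care.
    tokens = re.findall(r"\w+", text.lower())
    scores = [dictionary[token] for token in tokens if token in dictionary]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy dictionary, not taken from the published dictionaries.
toy_dictionary = {"bonheur": 1.0, "joie": 1.0, "malheur": -1.0}

print(score_text("Le bonheur et la joie", toy_dictionary))
```

Averaging over matched tokens keeps scores comparable across texts of different length, which is one common design choice among several.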
# Contents
## Contents
This repository includes the following data:
......@@ -26,11 +40,11 @@ This repository includes the following data:
* _publication.ipynb:_ This Jupyter Notebook represents the main publication in which we explain the proposed method in detail and provide illustrative examples.
* _requirements.txt:_ This file lists all Python packages required to run the tool chain (typically used in conjunction with `pip`).
# Configuration
## Configuration
We recommend installing [Anaconda](https://www.anaconda.com/), as it comes pre-bundled with most required packages.
## Python
### Python
The Jupyter Notebooks presented in this repository require Python 3.8.5.
If you simply want to use our dictionaries and Jupyter Notebooks to analyze sentiment of your texts, you need to have the following additional Python packages installed:
......@@ -55,7 +69,7 @@ In order to create dictionaries yourself, you need to have the following additio
Note that we tested our Jupyter Notebooks with the versions stated above.
While older or newer versions may work, results may differ or degrade.
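One way to check whether an environment matches the tested versions is to compare the installed packages against pinned entries. The sketch below assumes that `requirements.txt` uses `name==version` pins; the helper names `parse_pins` and `find_mismatches` are hypothetical and not part of this repository.

```python
from importlib import metadata
from typing import Dict, List, Optional, Tuple


def parse_pins(lines: List[str]) -> Dict[str, str]:
    """Collect `name==version` pins, ignoring comments and unpinned lines."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins: Dict[str, str]) -> List[Tuple[str, str, Optional[str]]]:
    """Return (package, wanted, installed) for every pin that is not met.

    `installed` is None when the package is not installed at all.
    """
    mismatches = []
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches.append((name, wanted, installed))
    return mismatches


# Usage, assuming requirements.txt is in the current directory:
#     with open("requirements.txt", encoding="utf-8") as handle:
#         print(find_mismatches(parse_pins(handle.readlines())))
```

`importlib.metadata` is part of the standard library from Python 3.8 onward, so the check itself needs no additional packages.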
## Dataset
### Dataset
If you want to create dictionaries based on your own data, make sure that you have a decent amount of text.
The more text, the better the output.
Also, make sure that you have cleaned your data and that each document is contained in a single *.txt* file with UTF-8 encoding.
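The encoding requirement can be verified before starting a long computation. The following sketch lists all *.txt* files in a directory that cannot be decoded as UTF-8; the function name `find_non_utf8_files` and the flat directory layout are assumptions for illustration.

```python
from pathlib import Path
from typing import List


def find_non_utf8_files(corpus_dir: str) -> List[Path]:
    """Return all .txt files directly inside corpus_dir that are not valid UTF-8."""
    invalid = []
    for path in sorted(Path(corpus_dir).glob("*.txt")):
        try:
            # Decoding the raw bytes fails on any invalid UTF-8 sequence.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            invalid.append(path)
    return invalid


# Usage, with a hypothetical corpus directory:
#     for path in find_non_utf8_files("my_corpus"):
#         print("Not UTF-8:", path)
```

Note that this check only detects invalid byte sequences; a file saved in another encoding can occasionally still decode as UTF-8 without error.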
......@@ -66,12 +80,12 @@ The annotated periodicals follow the XML-based Text Encoding Initiative (TEI) st
This dataset contains multiple languages, but we set our focus on French, Italian and Spanish, as these three languages have the largest collections.
For this purpose, we extracted texts from TEI-encoded files into plain *.txt* files.
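As an illustration of this kind of extraction, the sketch below pulls the plain text out of the `<body>` element of a TEI document using only the Python standard library. It is not the extraction code used for this dataset; the function name `tei_body_text` and the choice to keep all text inside `<body>` are assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace of TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_body_text(tei_xml: str) -> str:
    """Return the whitespace-normalized text inside the first TEI <body>."""
    root = ET.fromstring(tei_xml)
    body = root.find(".//tei:body", TEI_NS)
    if body is None:
        return ""
    # itertext() yields the text of all nested elements in document order.
    return " ".join("".join(body.itertext()).split())


# Hypothetical minimal TEI document for demonstration.
sample = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    "<teiHeader><fileDesc/></teiHeader>"
    "<text><body><p>Premier <hi>discours</hi>.</p>\n<p>Suite</p></body></text>"
    "</TEI>"
)
print(tei_body_text(sample))
```

For files on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`, and the result can be written with `encoding="utf-8"` to satisfy the format described above.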
## Hardware
### Hardware
Please keep in mind that your machine needs adequate hardware depending on the amount of text you want to consider.
This is especially important for the dictionary creation tool chain (e.g., we used a machine with 24 cores and 750 GB RAM and computations still took up to three days).
If you just want to analyze sentiment using existing dictionaries, a computer with common hardware should suffice.
# Contributors
## Contributors
name: Philipp Koncar
orcid: 0000-0001-5492-0644
......@@ -121,7 +135,7 @@ institution: Institute of Interactive Systems and Data Science, Graz University
e-mail: dhelic@tugraz.at
address: Inffeldgasse 16c, 8010, Graz, Austria
# References
## References
TEI Consortium (2021-04-09), TEI P5: Guidelines for Electronic Text Encoding and Interchange. Version 4.2.2. http://www.tei-c.org/Guidelines/P5/. Accessed 7 May 2021.
......