Julian Schröter, Keli Du, Julia Dudar, Cora Rok and Christof Schöch

From Keyness to Distinctiveness – Triangulation and Evaluation in Computational Literary Studies


Full-length article in: JLT 15/1-2 (2021), 81–108.

There is a set of statistical measures, developed mostly in corpus linguistics, computational linguistics, and information retrieval and known as keyness measures, which are generally expected to detect textual features that account for differences between two texts or groups of texts. These measures are based on the frequency, distribution, or dispersion of words (or other features). Searching for relevant differences or similarities between two groups of texts is also characteristic of traditional literary studies, whenever two authors, two periods in the work of one author, two historical periods, or two literary genres are to be compared. Applying quantitative procedures to search for such differences therefore seems promising for computational literary studies, as it allows scholars to analyze large corpora and to base historical hypotheses about differences between authors, genres, and periods on broader empirical evidence. However, applying quantitative procedures to answer questions relevant to literary studies in many cases raises methodological problems, which have been discussed on a more general level in the context of integrating or triangulating quantitative and qualitative methods in mixed methods research in the social sciences. This paper aims to solve these methodological issues concretely for the concept of distinctiveness and thus to lay the methodological foundation for operationalizing quantitative procedures so that they can be used not only as rough exploratory tools but also in a hermeneutically meaningful way for research in literary studies.
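To make the idea of a frequency-based keyness measure concrete, the following is a minimal sketch of one widely used candidate, the log-likelihood ratio (G²). The abstract does not single out any particular measure, so the choice of G², the function name `log_likelihood_keyness`, and the sign convention (positive for words relatively more frequent in the target corpus) are assumptions made purely for illustration. The sketch considers frequency only and ignores dispersion.

```python
import math
from collections import Counter


def log_likelihood_keyness(target_tokens, reference_tokens):
    """Return {word: signed G2 score} for a target vs. a reference corpus.

    Illustrative sketch only: both token lists are assumed to be non-empty.
    Positive scores mark words relatively more frequent in the target corpus,
    negative scores mark words relatively more frequent in the reference.
    """
    target_counts = Counter(target_tokens)
    reference_counts = Counter(reference_tokens)
    n_target = len(target_tokens)
    n_reference = len(reference_tokens)

    scores = {}
    for word in set(target_counts) | set(reference_counts):
        observed_target = target_counts[word]
        observed_reference = reference_counts[word]
        total = observed_target + observed_reference

        # Expected counts if the word were equally frequent in both corpora.
        expected_target = n_target * total / (n_target + n_reference)
        expected_reference = n_reference * total / (n_target + n_reference)

        g2 = 0.0
        if observed_target > 0:
            g2 += observed_target * math.log(observed_target / expected_target)
        if observed_reference > 0:
            g2 += observed_reference * math.log(observed_reference / expected_reference)
        g2 *= 2

        # Attach a sign so that the direction of the difference is visible.
        if observed_target / n_target < observed_reference / n_reference:
            g2 = -g2
        scores[word] = g2
    return scores


# Hypothetical toy corpora, invented for illustration only.
toy_target = ["ghost"] * 3 + ["the"] * 7
toy_reference = ["the"] * 10
toy_scores = log_likelihood_keyness(toy_target, toy_reference)
```

Ranking words by such a score yields a list of candidate distinctive features. Whether that ranking corresponds to any qualitative notion of distinctiveness is exactly the question the abstract argues must be settled by evaluation.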

Based on a structural definition of potential candidate measures for analyzing distinctiveness in the first section, we offer a systematic description of the issue of integrating quantitative procedures into a hermeneutically meaningful understanding of distinctiveness by distinguishing its epistemological from its methodological perspective. The second section develops a systematic strategy to solve the methodological side of this issue based on a critical reconstruction of the widespread non-integrative strategy in research on keyness measures that can be traced back to Rudolf Carnap’s model of explication. We demonstrate that it is, in the first instance, mandatory to gain a comprehensive qualitative understanding of the actual task. We show that Carnap’s model of explication suffers from a shortcoming: it ignores the need for a systematic comparison of what he calls the explicatum and the explicandum. Only if there is a method of systematic comparison can the next task, namely that of evaluation, be addressed, which verifies whether the output of a quantitative procedure corresponds to the qualitative expectation that must be clarified in advance. We claim that evaluation is necessary for integrating quantitative procedures into a qualitative understanding of distinctiveness. Our reconstruction shows that both steps are usually skipped in empirical research on keyness measures, which are the most important point of reference for the development of a measure of distinctiveness. Evaluation, which in turn requires thorough explication and conceptual clarification, needs to be employed to verify the relation between the output of a measure and the qualitative expectation.

In the third section we offer a qualitative clarification of the concept of distinctiveness by spanning a three-dimensional conceptual space. This flexible framework takes into account that there is no single and proper concept of distinctiveness but rather a field of possible meanings depending on research interest, theoretical framework, and access to the perceptibility or salience of textual features. Therefore, we shall, instead of stipulating any narrow and strict definition, take into account that each of these aspects – interest, theoretical framework, and access to perceptibility – represents one dimension of the heuristic space of possible uses of the concept of distinctiveness.

The fourth section discusses two possible strategies of operationalization and evaluation that we consider to be complementary to the previously provided clarification, and that complete the task of successfully establishing a candidate measure as a measure of distinctiveness in a qualitatively ambitious sense. We demonstrate that two different general strategies are worth considering, depending on the respective notion of distinctiveness and the interest as elaborated in the third section. If the interest is merely taxonomic, classification tasks based on multi-class supervised machine learning are sufficient. If the interest is aesthetic, more complex and intricate evaluation strategies are required, which have to rely on a thorough conceptual clarification of the concept of distinctiveness, in particular on the idea of salience or perceptibility. The challenge here is to correlate perceivable complex features of texts such as plot, theme (aboutness), style, form, or roles and constellation of fictional characters with the unperceived frequency and distribution of word features that are calculated by candidate measures of distinctiveness. Existing research has not yet clarified how to correlate such complex features with individual word features.
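For the taxonomic case, evaluation by classification can be sketched as follows: the words a candidate measure ranks highest are used as features, a classifier is trained on labeled texts, and accuracy on held-out texts indicates how well those words separate the classes. The nearest-centroid classifier, the function names, the vocabulary, and the toy texts below are all assumptions made for illustration. The abstract does not prescribe a particular learning algorithm, and a real study would use far larger corpora and cross-validation.

```python
from collections import Counter


def feature_vector(tokens, vocabulary):
    """Relative frequency of each vocabulary word in one text."""
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty texts
    return [counts[word] / total for word in vocabulary]


def train_centroids(labeled_texts, vocabulary):
    """Average the feature vectors of all training texts per class label."""
    sums, sizes = {}, Counter()
    for label, tokens in labeled_texts:
        vector = feature_vector(tokens, vocabulary)
        previous = sums.get(label, [0.0] * len(vocabulary))
        sums[label] = [p + v for p, v in zip(previous, vector)]
        sizes[label] += 1
    return {label: [s / sizes[label] for s in total]
            for label, total in sums.items()}


def predict(tokens, centroids, vocabulary):
    """Assign the label whose centroid is closest (squared Euclidean distance)."""
    vector = feature_vector(tokens, vocabulary)

    def squared_distance(centroid):
        return sum((x - c) ** 2 for x, c in zip(vector, centroid))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def accuracy(labeled_texts, centroids, vocabulary):
    """Share of texts whose predicted label matches the given label."""
    hits = sum(predict(tokens, centroids, vocabulary) == label
               for label, tokens in labeled_texts)
    return hits / len(labeled_texts)


# Hypothetical vocabulary, standing in for the top-ranked words of a
# candidate measure, and invented toy texts for three classes.
vocabulary = ["ghost", "castle", "shepherd", "meadow", "detective", "murder"]
training_texts = [
    ("gothic", "the ghost haunted the castle".split()),
    ("gothic", "a ghost in the castle".split()),
    ("pastoral", "the shepherd sang in the meadow".split()),
    ("pastoral", "a shepherd and the meadow".split()),
    ("crime", "the detective found a murder weapon".split()),
    ("crime", "murder puzzled the detective".split()),
]
held_out_texts = [
    ("gothic", "the castle ghost".split()),
    ("pastoral", "meadow and shepherd".split()),
    ("crime", "the detective solved the murder".split()),
]

centroids = train_centroids(training_texts, vocabulary)
held_out_accuracy = accuracy(held_out_texts, centroids, vocabulary)
```

Comparing the held-out accuracy obtained with the word lists of different candidate measures gives one way to rank them for a taxonomic interest. It says nothing about the aesthetic case, where salience for readers is at stake.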

The paper concludes with a general reflection on the possibility of mixed methods research for computational literary studies in terms of explanatory power and exploratory use. As our strategy of combining explication and evaluation shows, integration should be understood as a strategy of combining two different perspectives on the object area: in our evaluation scenarios, that of empirical reader response and that of a specific quantitative procedure. This does not imply that measures of distinctiveness that have proved to have explanatory power in one qualitative respect can be expected to succeed in all fields of research. As long as evaluation is omitted, candidate measures of distinctiveness lack explanatory power and are limited to exploratory use. In contrast to the skepticism that literary scholars have sometimes expressed about the relevance of computational literary studies to genuine issues of the humanities, we believe that integrating computational methods into hermeneutic literary studies can be achieved in a way that reaches higher explanatory power than the usual exploratory use of keyness measures. It can, however, only be achieved individually for concrete tasks, not once and for all on the basis of a general theoretical demonstration.


Baker, Paul, Querying keywords. Questions of Difference, Frequency and Sense in Keyword Analysis, Journal of English Linguistics 32:4 (2004), 346–359.

Blei, David M., Probabilistic Topic Models, Communications of the ACM 55:4 (2012), 77–84.

Bondi, Marina, Perspectives on Keywords and Keyness. An Introduction, in: Marina Bondi/Mike Scott (eds.), Keyness in Texts, Amsterdam/Philadelphia 2010, 1–18.

Bruza, P.D. et al., Aboutness from a Commonsense Perspective, Journal of the American Society for Information Science 51:12 (2000), 1090–1105.

Burrows, John, ›Delta‹: a Measure of Stylistic Difference and a Guide to Likely Authorship, Literary and Linguistic Computing 17:3 (2002), 267–287.

Burrows, John, All the Way Through: Testing for Authorship in Different Frequency Strata, Literary and Linguistic Computing 22:1 (2007), 27–47.

Carnap, Rudolf, Logical Foundations of Probability, Chicago/London/Toronto 1950.

Da, Nan Z., The Computational Case against Computational Literary Studies, Critical Inquiry 45 (2019), 601–639.

Duncker, Axel, Gattungssystematiken, in: Rüdiger Zymner (ed.), Handbuch Gattungstheorie, Stuttgart/Weimar 2010, 12–15.

Egbert, Jesse/Doug Biber, Incorporating Text Dispersion into Keyword Analyses, Corpora 14:1 (2019), 77–104.

Firth, John Rupert, The Technique of Semantics, Transactions of the Philological Society (1935), 36–72.

Fish, Steven, Are Muslims Distinctive? A look at the evidence, Oxford 2011.

Fishelov, David, Genre Theory and Family Resemblance – Revisited, Poetics 20:2 (1991), 123–138.

Føllesdal, Dagfinn, Hermeneutics and the hypothetico-deductive method, Dialectica 33:3–4 (1979), 319–336.

Fricke, Harald, Norm und Abweichung, München 1981.

Fricke, Harald, Definitionen und Begriffsformen, in: Rüdiger Zymner (ed.), Handbuch Gattungstheorie, Stuttgart/Weimar 2010, 7–10.

Gabrielatos, Costas, Keyness Analysis: Nature, Metrics and Techniques, in: Charlotte Taylor/Anna Marchi (eds.), Corpus Approaches to Discourse. A critical review, Oxford 2018, 225–258.

Greene, Jennifer C., Is Mixed Methods Social Inquiry a Distinctive Methodology?, Journal of Mixed Methods Research 2:1 (2008), 7–22.

Gries, Stephan Th., Dispersions and Adjusted Frequencies in Corpora, International Journal of Corpus Linguistics 13:4 (2008), 403–437.

Gymnich, Marion/Birgit Neumann/Ansgar Nünning (eds.), Gattungstheorie und Gattungsgeschichte, Trier 2007.

Hempfer, Klaus W., Zum begrifflichen Status der Gattungsbegriffe: Von ›Klassen‹ zu ›Familienähnlichkeiten‹ und ›Prototypen‹, Zeitschrift für französische Sprache und Literatur 120:1 (2010), 14–32.

Herrmann, Berenike J./Karina van Dalen-Oskam/Christof Schöch, Revisiting Style, a Key Concept in Literary Studies, Journal of Literary Theory 9:1 (2015), 25–52.

Jauß, Hans Robert, Literaturgeschichte als Provokation der Literaturwissenschaft, Konstanz 1967.

Kelle, Udo, Die Integration qualitativer und quantitativer Forschung – theoretische Grundlagen von »Mixed Methods«, Kölner Zeitschrift für Soziologie und Sozialpsychologie 69:2 (2017), 39–61.

Kilgarriff, Adam, Comparing Corpora, International Journal of Corpus Linguistics 6:1 (2001), 97–133.

Klimek, Sonja/Ralph Müller, Vergleich als Methode? Zur Empirisierung eines philologischen Verfahrens im Zeitalter der Digital Humanities, Journal of Literary Theory 9:1 (2015), 53–78.

Lamping, Dieter, Handbuch der literarischen Gattungen, Stuttgart 2009.

Lijffijt, Jefrey et al., Significance Testing of Word Frequencies in Corpora, Digital Scholarship in the Humanities (2014), 1–24.

Lincoln, Yvonna S./Egon G. Guba, Paradigmatic Controversies, Contradictions, and Emerging Confluences, Revisited, in: Norman Denzin/Yvonna S. Lincoln (eds.), Handbook of Qualitative Research, Thousand Oaks, CA ⁵2018, 108–150.

Maron, M.E., On Indexing, Retrieval and the Meaning of About, Journal of the American Society for Information Science 28:1 (1977), 38–43.

Müller, Ralph, Kategorisieren, in: Rüdiger Zymner (ed.), Handbuch Gattungstheorie, Stuttgart/Weimar 2010, 21–23.

Paquot, Magali/Yves Bestgen, Distinctive Words in Academic Writing: A Comparison of three Statistical Tests for Keyword Extraction, DIAL – Digital Access to Libraries, https://dial.uclouvain.be/pr/boreal/object/boreal:76052 (17.09.2021), 1–23 (originally published in Language and Computers 68 [2009], 247–269).

Rácz, Péter, Salience in Sociolinguistics. A Quantitative Approach, Berlin/Boston 2013.

Ryan, Marie L., Introduction: On the Why, What, and How of Generic Taxonomy, Poetics 10:2–3 (1981), 109–126.

Ryle, Gilbert, About, Analysis 1:1 (1933), 10–12.

Schmidt-Hidding, Wolfgang, Zur Methode wortvergleichender und wortgeschichtlicher Studien, in: Europäische Schlüsselwörter, Vol. I: Humor und Witz, ed. by Sprachwissenschaftlichen Colloquium (Bonn), München 1963, 18–33.

Schröter, Julian, Gattungsgeschichte und ihr Gattungsbegriff am Beispiel der Novellen, Journal of Literary Theory 13:2 (2019), 227–257.

Scott, Mike, PC Analysis of Key Words – and Key Key Words, System 25:1 (1997), 1–13.

Scott, Mike, WordSmith Tools Manual. Version 3.0, Oxford 1998.

Šklovskij, Viktor, Die Kunst als Verfahren [1917], in: Jurij Striedter (ed.), Russischer Formalismus, München 1969, 5–35.

Stamatatos, Efstathios, A Survey of Modern Authorship Attribution Methods, Journal of the American Society for Information Science and Technology 60:3 (2009), 538–556.

Strube, Werner, Sprachanalytisch-philosophische Typologie literaturwissenschaftlicher Begriffe, in: Christian Wagenknecht (ed.), Zur Terminologie der Literaturwissenschaft, Stuttgart 1989, 35–49.

Stubbs, Michael, Three Concepts of Keywords, in: Marina Bondi/Mike Scott (eds.), Keyness in Texts. Corpus Linguistic Investigations, Amsterdam/Philadelphia 2010, 21–42.

Swales, John, Genre Analysis. English in Academic and Research Settings, Cambridge 1990.

Toolan, Michael, The Theory and Philosophy of Stylistics, in: Peter Stockwell/Sara Whiteley (eds.), Handbook of Stylistics, Cambridge 2014, 13–31.

Tukey, John W., Exploratory Data Analysis, London et al. 1977.

Underwood, Ted, Distant Horizons. Digital Evidence and Literary Change, Chicago 2019.

Voßkamp, Wilhelm, Gattungen als literarisch-soziale Institutionen, in: Walter Hinck/Alexander von Bormann (eds.), Textsortenlehre – Gattungsgeschichte, Heidelberg 1977, 27–44.

Walton, Kendall L., Categories of Art, Philosophical Review 79:3 (1970), 334–367.

Warren, Martin, Identifying Aboutgrams in Engineering Texts, in: Marina Bondi/Mike Scott (eds.), Keyness in Texts, Amsterdam/Philadelphia 2010, 113–126.

Williams, Raymond, Keywords. A Vocabulary of Culture and Society [1976], revised edition, New York 1983.


JLTonline ISSN 1862-8990

Copyright © by the author. All rights reserved.
This work may be copied for non-profit educational use if proper credit is given to the author and JLTonline.
For other permission, please contact JLTonline.

How to cite this item:

Abstract of: Julian Schröter, Keli Du, Julia Dudar, Cora Rok and Christof Schöch, From Keyness to Distinctiveness – Triangulation and Evaluation in Computational Literary Studies.

In: JLTonline (14.01.2022)

URL: http://www.jltonline.de/index.php/articles/article/view/1127/2589

A persistent identifier can be found in the PDF version of this article.