A beech grove in Serbian. A derivative of the Russian “bukva”, meaning a letter, and a rather obvious pun on the English word "book".
It is also a way to better see through the forests of letters while keeping an eye on the trees they consist of. The result of a collaboration between a team of literary scholars and computer scientists, Bukvik is an online collaborative system and a multilingual tool for textual analysis designed with a focus on style and an adaptive approach to a researcher’s (or a group of researchers’) specific needs.
Figure 1: Bukvik Infographic
Here you can see the top-level architecture of the Bukvik infrastructure:
Figure 2: Bukvik Architecture
Bukvik works with any corpus in several languages, and its modular structure allows it to integrate almost any new or improved textual analysis tool, whatever programming language the tool is written in. Using the existing system and its current features requires no programming skills. The user has full control over the course of each experiment (research flow) (Figure 3 and Figure 6), from the selection of the corpus or corpora, through the methods of analysis, to the visualization of results, all without leaving the system.
Figure 3: Conceptual diagram of a research flow in Bukvik
In addition, Bukvik supports collaboration and may be used for teaching purposes through original features such as commenting on research results and an automatic visualization of the steps of each experiment flow that ensures transparency and reproducibility. Not only the results, but also the procedure may be shared and commented on by several users online in real time.
Bukvik’s purpose is to explore the growing potential of computational tools to not only determine the author of a given text, or to explore patterns in enormous corpora through distant reading methods, but also to complement traditional methods of literary analysis. Thus, it provides verifiable information that will guide and enhance close reading and comparison of large texts by focusing the scholar on salient patterns and providing a smooth transition between distant and close reading. Bukvik allows, further, for the comparison of texts written in different languages, with reference to balanced corpora in each language. Balanced corpora are currently available in English, Russian and French (subject to copyright holder's permission). We welcome contributions of balanced corpora in other languages as, in principle, any other language may be supported.
Figure 4.a: Example of SoW (Society of Words) diagram
Figure 4.b: Example (detailed, color 'blue') of SoW diagram
The basic notions behind the system are:
Figure 5.a: Example of POS analysis diagram
Figure 5.b: Example of etymology analysis diagram
The capacity of Bukvik to be used by people who are strangers to programming is ingrained into the system’s structure. The internal mini-language (Bukvik is an example of data-driven programming, and uses a domain-specific language (DSL)) is designed to make tasks faster and easier to accomplish. Users with programming experience will find that the system's modular structure makes it easy and intuitive to extend the system to additional pre-existing tools or develop and integrate new ones.
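To make the idea of data-driven programming concrete, here is a minimal Python sketch, not Bukvik's actual mini-language or API. It assumes a flow can be written down as plain data (a list of named steps with parameters) and handed to a small interpreter. The step names (`tokenize`, `count`, `top`), the `STEPS` registry and the `run_flow` function are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical registry of analysis steps; the names are illustrative
# only and do not correspond to Bukvik's actual modules.
STEPS = {}

def step(name):
    """Register a function under a step name so flows can refer to it as data."""
    def decorator(func):
        STEPS[name] = func
        return func
    return decorator

@step("tokenize")
def tokenize(text, lowercase=True):
    # Naive whitespace tokenizer, standing in for a real one.
    words = text.split()
    return [w.lower() for w in words] if lowercase else words

@step("count")
def count(tokens):
    return Counter(tokens)

@step("top")
def top(counts, n=3):
    return counts.most_common(n)

def run_flow(flow, data):
    """Interpret a flow description: apply each named step in order."""
    for spec in flow:
        func = STEPS[spec["step"]]
        data = func(data, **spec.get("params", {}))
    return data

# The flow itself is plain data, so it can be stored, shared and displayed.
flow = [
    {"step": "tokenize", "params": {"lowercase": True}},
    {"step": "count"},
    {"step": "top", "params": {"n": 2}},
]

result = run_flow(flow, "the cat and the dog and the bird")
print(result)  # [('the', 3), ('and', 2)]
```

Because the flow is data rather than code, a user without programming experience could in principle assemble it through a visual interface, while a programmer extends the system by registering one more step.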
Developed as an online infrastructure for researchers and translators, Bukvik is a tool not only for individual research, but also for online collaboration. Users may, for example, upload texts and corpora, cache data, share experiments (research flows) and comment on results and on each other's work both privately and publicly. A visual data flow interface tracks the progress of data through each experiment and visualizes the results.
Figure 6.a: Bukvik sub-flow diagram of a research experiment
Figure 6.b: Bukvik flow diagram of a research experiment (using the Figure 6.a sub-flow to analyse each of Nabokov's novels)
Possible uses of Bukvik are numerous. While it is primarily a system for literary analysis, it has also been applied, for example, to comparative stylistic analysis of Wikipedia articles in different languages.
Bukvik is also used to support textual and cross-textual analysis and enrich the virtual world of books at LitTerra, a sister project.
Similarly, it has been used as part of DemocracyFramework, as an analysis tool for research on democracy patterns in Wikipedia editing practices (especially in edit wars, 1, 2, and Wikipedia:List of controversial issues).
Another application is as a tool for translators, who will be able to compare their ongoing work with previous translations and trace the faithfulness of their translation in ways that have not been available before: through semantic relations, part-of-speech ratios, the etymological provenance of words, and (with the help of the network analysis part of our work) the preservation of links between “key words” that observably carry meaning beyond that contained within a given sentence.
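As an illustration of one of these measures, the following Python sketch compares part-of-speech ratios between an original and two draft translations. It does not reproduce Bukvik's own method: it assumes the texts have already been tagged with a shared tagset, uses toy hand-tagged data, and the functions `pos_ratios` and `ratio_distance` are hypothetical names introduced here.

```python
from collections import Counter

def pos_ratios(tagged_tokens):
    """Return the share of each part-of-speech tag in a tagged text."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

def ratio_distance(a, b):
    """Sum of absolute differences between two ratio profiles (0 = identical)."""
    tags = set(a) | set(b)
    return sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in tags)

# Toy, hand-tagged data standing in for the output of a real tagger.
original = [("dark", "ADJ"), ("forest", "NOUN"), ("sleeps", "VERB"), ("quietly", "ADV")]
draft_a = [("the", "DET"), ("dark", "ADJ"), ("forest", "NOUN"), ("sleeps", "VERB")]
draft_b = [("dark", "ADJ"), ("wood", "NOUN"), ("slumbers", "VERB"), ("softly", "ADV")]

profile = pos_ratios(original)
dist_a = ratio_distance(profile, pos_ratios(draft_a))
dist_b = ratio_distance(profile, pos_ratios(draft_b))
print(dist_a, dist_b)  # 0.5 0.0
```

Under this simple measure, a smaller distance means the draft's grammatical profile is closer to the original's. It says nothing about meaning, so it would complement the semantic and etymological comparisons rather than replace them.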
The system will become more and more powerful as technologies develop. At the moment, it already has the basic framework necessary to:
Tools already integrated into the system include:
Saša Mile Rudan (Oslo University) is completing his Ph.D. in Computer Science at Oslo University. He specializes in complex collaborative socio-technical systems (architecting and conducting research on them, leading transdisciplinary production teams, or taking an entrepreneurial role), social processes, and knowledge management. His interest in literary analysis lies in bridging the socio-technical gap for scholarly research, in the Qualitatively Augmented Quantitative Analysis (QaQa) methodology, in cross-lingual and cross-cultural comparison, and in supporting under-resourced languages. These interests led to his involvement in two projects: LitTerra (augmenting books and providing a literary ecosystem with deep in-book and inter-book insights) and Bukvik, a research infrastructure for literary scholars. He is also a founding member of ChaOS (Cultural Humane Art of Science) Organization and CEO of HeadsWare.
Eugenia Kelbert (Higher School of Economics, Moscow) is an Assistant Professor of Comparative Literature and Philology at the Higher School of Economics in Moscow. Before that, she taught Russian literature at the University of Passau in Germany and was invited as guest researcher at Uppsala University and Stockholm University in 2017. Her dissertation, completed at Yale University, won the 2016 Charles Bernheimer Prize for best dissertation in Comparative Literature, awarded by the American Comparative Literature Association. Her current book project is based on her dissertation, and focuses on the phenomenon of literary translingualism in the 20th and 21st century; it includes extensive quantitative results on bilingual writers’ work in different languages. She has published on Joseph Brodsky, Rainer Maria Rilke and Eugene Jolas, among others, and is currently involved in collaborations in literary multilingualism, transnational creative writing and digital humanities; she is affiliated with the Multiling Center at Oslo University. Her other interests include translation and self-translation, comparative stylistics, poetry, and quantitative literary analysis.
Lazar Kovačević (BScEE, INVERUDIO, CEO) is an independent researcher with a focus on the application of information technology to education, creativity, collaboration and social action. He has worked on many projects in the areas of (web) information retrieval systems, text analysis and natural language processing, machine learning, data mining and collaboration. He enjoys participating in multidisciplinary environments and working on interdisciplinary solutions to real-world problems. He has co-authored several papers discussing creative features in time series ranging from physical and biological to physiological and psychological processes (e.g. a healthy heart shows more creative features than an unhealthy one). He has also developed algorithms for increasing the diversity of perspectives in search results.
Siniša Rudan (ChaOS (Cultural Humane Art of Science) Organization, Magic Wand Solutions, CEO) is an international speaker, educator, researcher, IT developer, and poetry performer. His scientific research is in the domain of collaboration and creativity, applied to socio-technical systems and systems for social good/activism. By profession he is an IT M.Sc. He leads several art-science multidisciplinary projects. He has been a participant, a mentor, and an organizer of several performance workshops and poetry courses. He has performed his poetry across three continents and 15 countries. He pursues his interests through several regional and international positions: Co-founder of “ReMaking Tesla - Practices that make a Genius” - International Forum of Interactive and IT-Augmented Education; Co-founder of ChaOS - an NGO uniting artists and scientists on cultural and humane projects; Co-founder of Protopia Lab Serbia/Norway, and member of Protopia International Core Team; Project manager of “Poezin Slam Company”, in which he also performs as one of its members.
Dr William Teahan, Computer Science, University of Wales (Bangor)
Toma Tasovac, DH, National Coordinator, DARIAH-RS Belgrade Center for Digital Humanities
Bukvik is still under development and is currently scheduled to be published freely online in 2018. It is, however, possible to use the system already under the developers' supervision and in collaboration with them. Please send inquiries or descriptions of projects to email@example.com.