Open-source · MIT License · 5 languages

Topic modeling for the humanities

Discover themes in your text corpus using LDA. No coding, no setup, no compromise on transparency.

lemmata.app

Language: Italian (it)
Number of Topics: 5
Chunk Size: 1000

Analysis complete
5 topics · C_v: 0.48

Overview | Topics | Topic Map | Heatmap | Distribution | Preprocessing | Export

Coherence (C_v): 0.5220 (Good)
Coherence is solid. Review the topics qualitatively to confirm they make sense for your research question.

Topics: 5
Perplexity: 649.4
Log-likelihood: -143k
Documents: 144

Document-Term Matrix
Vocabulary (total): 5,900
Vocabulary (kept): 1,094
Terms removed: 4,806

Top Lemmas (pre-LDA)
guido: 634
ada: 512
augusta: 421
carla: 347
parola: 283
donna: 241
padre: 198
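The Document-Term Matrix figures above are internally consistent (5,900 - 1,094 = 4,806). One common way to arrive at a "kept" vocabulary is to drop terms that occur in too few or too many documents. The sketch below illustrates that idea in plain Python; the function name and the `min_df` and `max_df_ratio` thresholds are assumptions for illustration and are not taken from Lemmata.

```python
from collections import Counter


def prune_vocabulary(docs, min_df=2, max_df_ratio=0.8):
    """Split the vocabulary into kept and removed terms by document frequency.

    docs: list of token lists (already lemmatized).
    min_df and max_df_ratio are illustrative thresholds, not Lemmata's defaults.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    kept = {t for t, n in df.items() if n >= min_df and n / n_docs <= max_df_ratio}
    removed = set(df) - kept
    return kept, removed


# Toy corpus of four tiny "documents".
docs = [
    ["guido", "ada", "parola"],
    ["guido", "augusta", "padre"],
    ["guido", "ada", "donna"],
    ["guido", "carla", "parola"],
]
kept, removed = prune_vocabulary(docs)
print(f"Vocabulary (total): {len(kept) + len(removed)}")
print(f"Vocabulary (kept): {len(kept)}")
print(f"Terms removed: {len(removed)}")
```

In this toy corpus "guido" appears in every document and is dropped as too frequent, while terms that occur only once are dropped as too rare.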

Built for researchers who work with texts, not terminals

No code required

Upload your corpus, configure parameters with sliders, and run LDA topic modeling entirely in your browser. Designed for literary scholars, historians, and digital humanists who want rigorous analysis without writing a single line of code.

Transparent preprocessing

Full preprocessing trace: see exactly which tokens were kept, removed, or lemmatized. Nothing is a black box.
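As a rough illustration of what a preprocessing trace can record, the sketch below logs one entry per token with its lemma and the action taken. It is a minimal stand-in, not Lemmata's implementation: the `LEMMAS` and `STOPWORDS` tables, the `TraceEntry` class, and the `trace_preprocessing` function are hypothetical, and a real pipeline would take lemmas and stopwords from a spaCy language model.

```python
from dataclasses import dataclass

# Hypothetical toy resources; a real pipeline would take lemmas and
# stopwords from a language model such as spaCy's.
LEMMAS = {"parole": "parola", "donne": "donna", "padri": "padre"}
STOPWORDS = {"le", "e", "i", "di"}


@dataclass
class TraceEntry:
    token: str
    lemma: str
    action: str  # "kept", "lemmatized", or "removed"
    reason: str


def trace_preprocessing(text):
    """Return one TraceEntry per token, recording what happened to it."""
    entries = []
    for raw in text.lower().split():
        token = raw.strip(".,;:!?\"'()")
        if not token:
            continue
        if token in STOPWORDS:
            entries.append(TraceEntry(token, token, "removed", "stopword"))
            continue
        lemma = LEMMAS.get(token, token)
        action = "lemmatized" if lemma != token else "kept"
        entries.append(TraceEntry(token, lemma, action, ""))
    return entries


trace = trace_preprocessing("Le parole di Ada e le donne.")
for entry in trace:
    print(f"{entry.token:10} -> {entry.lemma:10} {entry.action} {entry.reason}")
```

Keeping removed tokens in the trace, together with the reason for removal, is what lets a reader audit each decision rather than only seeing the final vocabulary.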

5 languages

Built-in support for English, Italian, French, German, and Spanish with spaCy language models and per-language stopword lists.

Deterministic

A fixed random seed yields identical results on every run with the same corpus and settings, so analyses are fully reproducible.
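The effect of a fixed seed can be checked directly with scikit-learn, which the page lists as the LDA backend. The sketch below fits the same toy corpus twice and compares the topic-word matrices. The corpus, the `fit_topics` helper, and the seed value 42 are assumptions for illustration; the seed Lemmata actually uses is not stated here.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus; each string stands in for one preprocessed document.
docs = [
    "guido ada parola donna",
    "guido augusta padre parola",
    "ada carla donna parola",
    "padre guido augusta carla",
]


def fit_topics(texts, n_topics=2, seed=42):
    """Fit LDA and return the topic-word matrix; seed=42 is an arbitrary choice."""
    counts = CountVectorizer().fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    lda.fit(counts)
    return lda.components_


first = fit_topics(docs)
second = fit_topics(docs)
# With the same data, parameters, and random_state, both fits should agree.
print(np.allclose(first, second))
```

Reproducibility in this sense also depends on keeping the library versions, preprocessing settings, and input files unchanged between runs.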

Complete export

CSV matrices, PNG/SVG charts, PDF report, and a full ZIP archive.
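For readers curious how a matrix export bundled into an archive might look, here is a minimal sketch using only the Python standard library. The function name, the file name `doc_topic_matrix.csv`, and the column labels are hypothetical; Lemmata's actual archive layout is not described on this page.

```python
import csv
import io
import zipfile


def build_export_zip(doc_topic, topic_names):
    """Return ZIP bytes containing a document-topic matrix as CSV."""
    csv_buffer = io.StringIO()
    writer = csv.writer(csv_buffer)
    writer.writerow(["document"] + topic_names)
    for doc_id, weights in doc_topic.items():
        writer.writerow([doc_id] + [f"{w:.4f}" for w in weights])

    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Hypothetical file name; charts and a PDF report could be added the same way.
        archive.writestr("doc_topic_matrix.csv", csv_buffer.getvalue())
    return zip_buffer.getvalue()


doc_topic = {"chunk_001": [0.7, 0.3], "chunk_002": [0.2, 0.8]}
payload = build_export_zip(doc_topic, ["topic_1", "topic_2"])

# Read the archive back to confirm its contents.
with zipfile.ZipFile(io.BytesIO(payload)) as archive:
    names = archive.namelist()
    content = archive.read("doc_topic_matrix.csv").decode("utf-8")
print(names)
print(content)
```

Building the archive in memory keeps the sketch free of temporary files, which suits a web app that hands the bytes straight to a download button.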

Three steps from corpus to insight

1. Upload

Drag and drop your text files (TXT, PDF, DOCX, ODT, EPUB) or paste text directly.

2. Configure

Choose your language, number of topics, POS filters, and stopwords. Smart defaults get you started fast.

3. Analyse

Explore interactive topic charts, heatmaps, distributions, and word clouds. Export results in one click.
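Between upload and analysis, a long text usually has to be split into smaller documents, because LDA learns from word co-occurrence across many documents. The Chunk Size setting and the document count in the example panel point to a step of this kind. The sketch below shows one simple way to do it. It assumes the chunk size counts whitespace-separated tokens, which this page does not confirm, and `chunk_tokens` is a hypothetical helper.

```python
def chunk_tokens(tokens, chunk_size=1000):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


# 2,400 toy tokens standing in for one long uploaded text.
tokens = ("guido ada augusta carla " * 600).split()
chunks = chunk_tokens(tokens, chunk_size=1000)
print([len(c) for c in chunks])
```

The final chunk is shorter than the rest. Whether to keep, merge, or discard such a remainder is a design choice that can affect topic estimates on small corpora.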

Powered By

Built on trusted open-source tools

spaCy: NLP & lemmatization
scikit-learn: LDA modeling
Gensim: Coherence metrics
Streamlit: Interactive UI

Using Lemmata in your research?

@software{koran_lemmata_2026,
  author    = {Koran, Oğuz and Cangır, Hakan and Yücesan, Barış},
  title     = {Lemmata: A Multilingual {LDA} Topic Modeling Platform
               for the Humanities},
  year      = {2026},
  doi       = {10.5281/zenodo.19391730},
  url       = {https://lemmata.app},
  note      = {Software available at https://github.com/oguzkoran-max/lemmata}
}

DOI: 10.5281/zenodo.19391730

Ready to explore your corpus?

Free, open-source, no account required.

Launch Lemmata