Tibetan Text Metrics

Compare Tibetan texts to discover how similar they are. This tool helps scholars identify shared passages, textual variations, and relationships between different versions of Tibetan manuscripts. Part of the TTM project.

Step 1: Upload Your Texts

Upload two or more Tibetan text files (.txt format). If your texts have chapters, separate them with the ༈ marker so the tool can compare chapter-by-chapter.

Tip: Use UTF-8 encoded .txt files, and keep each file under 1 MB for best performance.
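For readers who want to check their files before uploading, here is a minimal sketch of how a text might be split into chapters on the ༈ marker. The function names are hypothetical and this is not the tool's own code. It assumes the marker is the single code point U+0F08 and that any text before the first marker counts as a chapter.

```python
from pathlib import Path

# ༈ (U+0F08, TIBETAN MARK SBRUL SHAD), assumed to be the chapter marker.
CHAPTER_MARKER = "\u0F08"


def split_chapters(text: str) -> list[str]:
    """Split a text on the chapter marker, dropping empty pieces."""
    return [part.strip() for part in text.split(CHAPTER_MARKER) if part.strip()]


def load_chapters(path: str) -> list[str]:
    """Read a UTF-8 .txt file and return its chapters (hypothetical helper)."""
    return split_chapters(Path(path).read_text(encoding="utf-8"))
```

A file with no ༈ marker comes back as a single chapter, so texts without chapter divisions can still be compared as a whole.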

Step 2: Choose Analysis Type

Pick a preset for quick results, or use Custom for full control.

What kind of analysis do you need?

Standard is recommended for most users. Deep analysis takes longer but finds texts with similar meaning even when words differ.

What each preset includes:

Preset Jaccard LCS Fuzzy Semantic AI
Standard
Deep
Quick

Results

Results Summary — Compare chapters across your texts

Get Expert Insights

Let AI help you understand what the numbers mean and what patterns they reveal about your texts.

Understanding Your Results

After running the analysis, click "Explain My Results" to get a plain-language interpretation of what the similarity scores mean for your texts.

Visual Comparison

Vocabulary Overlap (Jaccard Similarity)

What it measures: The share of unique words that appear in both texts.

How to read it: A score of 70% means 70% of all unique words found in either text appear in both. Higher scores mean more shared vocabulary.

What it tells you:

  • High scores (>70%): Texts use very similar vocabulary — possibly the same source or direct copying
  • Medium scores (40-70%): Texts share significant vocabulary — likely related topics or traditions
  • Low scores (<40%): Texts use different words — different sources or heavily edited versions

Good to know: This metric ignores word order and how often words repeat. It only asks "does this word appear in both texts?"
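The calculation described above can be sketched in a few lines. This is an illustration of the standard Jaccard formula (shared unique words divided by all unique words), not the tool's actual implementation. The function names are hypothetical, returning 0 for two empty texts is an assumption, and the thresholds simply mirror the ranges listed above.

```python
def jaccard_similarity(tokens_a, tokens_b) -> float:
    """Shared unique tokens divided by all unique tokens in either text."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty texts score 0
    return len(set_a & set_b) / len(union)


def describe_score(score: float) -> str:
    """Map a score (0.0 to 1.0) onto the ranges described above."""
    if score > 0.70:
        return "high"
    if score >= 0.40:
        return "medium"
    return "low"


# Toy word lists; a real text would first be split into words by a tokenizer.
lama, buddha, dharma, sangha = "བླ་མ", "སངས་རྒྱས", "ཆོས", "དགེ་འདུན"
text_a = [lama, buddha, dharma, dharma]  # the repeated word counts once
text_b = [buddha, dharma, sangha]

score = jaccard_similarity(text_a, text_b)
print(score, describe_score(score))  # 0.5 medium
```

Two of the four unique words appear in both lists, so the score is 50%. Repeating ཆོས in the first list does not change the result, which is why the metric says nothing about word frequency or order.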

Tips:

  • Use the "Filter common words" option to focus on meaningful content words rather than grammatical particles.
  • Word mode is recommended for Jaccard. Syllable mode may inflate scores because common syllables (like ས, ར, ན) appear in many different words.
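The toy example below sketches why both tips matter. It is not the tool's code: the common-word list is a made-up illustration rather than the tool's actual filter, the helper names are hypothetical, and syllables are assumed to be separated by the tsheg (་, U+0F0B).

```python
TSHEG = "\u0F0B"  # ་ the mark that separates Tibetan syllables


def word(*syllables: str) -> str:
    """Join syllables into one word with the tsheg."""
    return TSHEG.join(syllables)


def jaccard(a, b) -> float:
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def to_syllables(words):
    """Break every word into its syllables (syllable mode)."""
    return [s for w in words for s in w.split(TSHEG) if s]


rgyal, po, ba, chen, mo, dang = "རྒྱལ", "པོ", "བ", "ཆེན", "མོ", "དང"

# Illustrative only; not the tool's actual list of common words.
COMMON_WORDS = {dang}


def filter_common(words):
    return [w for w in words if w not in COMMON_WORDS]


words_a = [word(rgyal, po), dang, word(chen, po)]
words_b = [word(rgyal, ba), dang, word(chen, mo)]

word_score = jaccard(words_a, words_b)  # 0.2: only the particle is shared
filtered_score = jaccard(filter_common(words_a), filter_common(words_b))  # 0.0
syllable_score = jaccard(to_syllables(words_a), to_syllables(words_b))  # 0.5
```

In word mode the two lists share only the particle དང, and filtering it out shows that no content words overlap. In syllable mode the score rises to 50% because རྒྱལ and ཆེན occur inside different words.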
