Why do some languages feel weirdly familiar?
Maps spread across a table (Photo: andrewtneel, via Unsplash)
Many learners know the feeling: you start a new language and something about it is immediately familiar. When you are scaling that particular mountain, some languages seem to hand you a few footholds straight away. Words look familiar, sentence structure seems intuitive, and even the rhythm of a conversation feels natural. Other languages do the opposite: even the simplest phrases can feel like trying to grab onto something smooth.
Most language enthusiasts are aware of the concept of linguistic closeness and historical language families. Language pairs like Spanish and Portuguese, or Danish and Swedish, feel very close to one another, while a pair like Japanese and Arabic can feel much more distant to many English-speaking learners. But beyond the fact that languages from the same family naturally share more vocabulary and grammar, I wanted to know whether some languages that are not especially close on paper still feel surprisingly familiar in practice. What does it even mean to say one language feels closer than another?
Through the lens of historical linguistics, language families tell us where languages came from and how they developed: they are a map of how languages relate to each other historically. But ask a different question: which language will feel familiar when you start reading it, hearing it, or seeing it in subtitles? That is less a question about ancestry and more a question about present-day usage. Languages shift constantly. Shared media, borrowed slang, cultural contact, and the simple passage of time all push languages in directions their family tree could not predict.
I built a language similarity map from subtitle translations of films and television. The idea was to analyse how similarly languages behave when they translate the same meaning. What happens if we draw a different map, based not on ancestry but on the choices languages make when they express the same lines? One such choice is the phrasing a language naturally reaches for. Take "I miss you": in French, Italian and Spanish you can essentially say "you are missing from me", so the person being missed becomes the subject. In French: tu me manques; in Italian: mi manchi; in Spanish: me haces falta. This is the kind of recurring pattern that can feed into the map.
To build it, I used the OpenSubtitles corpus, a large open collection of subtitle translations distributed through OPUS, covering 20 languages. I looked at how each language tended to translate the same English words and phrases, then compared those patterns across languages. This produced two visualisations, both available as interactive tools on the SubSmith site: a heatmap, which arranges all 20 languages in a grid so you can see at a glance which pairs are most similar, and a network graph, which connects languages like a web and shows only the strongest links so that clusters are easier to spot. The heatmap is useful because it shows not just the familiar clusters but the full spread of distances across all 20 languages: where similarity is most concentrated, how tightly certain families pull together, and how far apart languages from different families tend to sit.
A heatmap of practical similarity across 20 languages. If language families give us one map of closeness, this is an attempt at another: one based on how similarly languages translate the same subtitle material.
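The article does not spell out the exact features or distance measure behind the map, so what follows is only a minimal sketch of the general idea under stated assumptions: each language gets a profile of counts recording which translation pattern it used for each English pivot phrase, and pairs of languages are compared with cosine similarity. The pattern labels ("subject_flip", "direct"), all counts, and the function names `cosine` and `similarity_matrix` are hypothetical, invented for illustration.

```python
import math

# Hypothetical toy profiles: for each language, how often its subtitles used
# each translation pattern for an English pivot phrase. Labels and counts are
# invented for illustration; they are not the data behind the SubSmith map.
profiles = {
    "French":  {("I miss you", "subject_flip"): 9, ("I miss you", "direct"): 1},
    "Italian": {("I miss you", "subject_flip"): 8, ("I miss you", "direct"): 2},
    "German":  {("I miss you", "subject_flip"): 2, ("I miss you", "direct"): 8},
}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(count * b.get(key, 0) for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(profiles: dict) -> tuple[list, list]:
    """Return sorted language names and a symmetric grid of pairwise scores,
    the kind of table a heatmap colours cell by cell."""
    langs = sorted(profiles)
    grid = [[cosine(profiles[a], profiles[b]) for b in langs] for a in langs]
    return langs, grid


langs, grid = similarity_matrix(profiles)
for lang, row in zip(langs, grid):
    print(f"{lang:8}", " ".join(f"{score:.2f}" for score in row))
```

Cosine similarity ignores overall volume, so a language with more subtitle lines is not automatically "closer" to everything. A fuller pipeline would probably also normalise counts per phrase so that very frequent phrases do not dominate the score.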
The map recovered a lot of the structure you would expect. Danish and Swedish sat very close together, and Spanish and Portuguese were strongly linked. French and Italian clustered closely. Across the network as a whole, links within the same language family were stronger on average than links across families. That suggests everyday translation still carries something of those deeper relationships.
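A claim like "links within the same family were stronger on average than links across families" comes down to splitting language pairs by family label and averaging each group. The sketch below assumes a dictionary of pairwise scores keyed by language pair; the scores are invented placeholders rather than values from the map, the branch names "Germanic" and "Romance" stand in as family labels for simplicity, and `family_averages` is a hypothetical helper.

```python
from itertools import combinations
from statistics import mean

# Invented placeholder scores keyed by unordered language pair.
similarity = {
    frozenset({"Danish", "Swedish"}): 0.82,
    frozenset({"Spanish", "Portuguese"}): 0.79,
    frozenset({"Danish", "Spanish"}): 0.41,
    frozenset({"Danish", "Portuguese"}): 0.40,
    frozenset({"Swedish", "Spanish"}): 0.39,
    frozenset({"Swedish", "Portuguese"}): 0.38,
}
# Family (here, branch) label for each language.
family = {
    "Danish": "Germanic",
    "Swedish": "Germanic",
    "Spanish": "Romance",
    "Portuguese": "Romance",
}


def family_averages(similarity: dict, family: dict) -> tuple[float, float]:
    """Mean score over same-family pairs and over cross-family pairs."""
    within, across = [], []
    for a, b in combinations(sorted(family), 2):
        score = similarity[frozenset({a, b})]
        (within if family[a] == family[b] else across).append(score)
    return mean(within), mean(across)


within_avg, across_avg = family_averages(similarity, family)
print(f"within-family: {within_avg:.3f}, across-family: {across_avg:.3f}")
```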
Some languages also appeared closer to a language outside their family than you might expect at first glance. Finnish and Swedish are one case: on a traditional family tree they sit in entirely different families, since Finnish is Uralic while Swedish is a Germanic language within Indo-European. Yet the two countries are neighbours, and centuries of Swedish as an administrative language in Finland have left real marks on everyday usage. The map picks that up even though the family tree does not. This is a good example of how a language can sit near another not because of shared ancestry, but because of something shared in practice: similar habits of expression, similar translation choices, or similar ways of handling the kinds of dialogue that recur in films and television. Others, like Hebrew and Swedish, are harder to explain: there is no obvious shared ancestry or history of contact.
A network view of the same data, where strong clusters are easier to spot and the places where this second map departs from the family tree stand out more clearly.
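The article does not say how the network decides which links count as the "strongest". One common choice, assumed here, is to keep each language's top k neighbours; a single global score threshold would be another. The scores below are invented placeholders, with a Finnish-Swedish link included only to echo the example above, and `strongest_links` is a hypothetical helper.

```python
# Invented placeholder scores keyed by unordered language pair.
similarity = {
    frozenset({"Danish", "Swedish"}): 0.82,
    frozenset({"Spanish", "Portuguese"}): 0.79,
    frozenset({"Finnish", "Swedish"}): 0.61,
    frozenset({"Danish", "Finnish"}): 0.48,
    frozenset({"Danish", "Spanish"}): 0.41,
    frozenset({"Finnish", "Spanish"}): 0.30,
}


def strongest_links(similarity: dict, k: int = 1) -> list[tuple[str, str, float]]:
    """Keep an undirected edge if it is among the top-k links of either endpoint."""
    neighbours: dict[str, list[tuple[float, str]]] = {}
    for pair, score in similarity.items():
        a, b = sorted(pair)
        neighbours.setdefault(a, []).append((score, b))
        neighbours.setdefault(b, []).append((score, a))

    edges = set()
    for lang, candidates in neighbours.items():
        # Highest scores first; ties fall back to reverse alphabetical order.
        for score, other in sorted(candidates, reverse=True)[:k]:
            first, second = sorted((lang, other))
            edges.add((first, second, score))
    return sorted(edges)


for a, b, score in strongest_links(similarity, k=1):
    print(f"{a} -- {b}: {score:.2f}")
```

Keeping the top k links per language, rather than applying one global cut-off, guarantees that every language stays connected to something. The trade-off is that a weakly connected language's best link can look as prominent as a link inside a tight cluster.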
This kind of analysis can only tell you so much. The uncertainty around pairs like Hebrew and Swedish points to a real limitation in the method. Every comparison uses English as a common reference point, so anything English does not express clearly can get flattened or missed altogether. A more direct approach would compare languages without that go-between, but consistent subtitle data across all 190 language pairs simply does not exist at the same scale. Take Japanese, where the verb for "eat" changes form depending on who you are speaking to: taberu (食べる) in casual conversation, tabemasu (食べます) in polite speech. English just says "eat", so that distinction drops out. Turkish creates a different problem. Evlerinizden ("from your houses") breaks down as ev (house) + ler (plural) + iniz (your) + den (from): four grammatical layers in a single word. A comparison at the level of whole words sees only one unit, so that structure is missed.
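Two quick checks on the numbers in this paragraph. With 20 languages, the number of unordered pairs is 20 × 19 / 2 = 190. And a comparison that splits text on spaces treats evlerinizden as a single unit, while a morpheme-level view sees four. The segmentation below is hand-written from the example above; a real pipeline would need a morphological analyser, which this sketch does not include.

```python
from math import comb

# Unordered pairs among 20 languages: 20 * 19 / 2.
pair_count = comb(20, 2)
print(pair_count)  # 190

# Hand-written segmentation of the Turkish example from the text.
word = "evlerinizden"
morphemes = [("ev", "house"), ("ler", "plural"), ("iniz", "your"), ("den", "from")]

# The pieces concatenate back into the original word.
assert "".join(piece for piece, _ in morphemes) == word

word_units = word.split()                      # what a word-level comparison sees
morpheme_units = [piece for piece, _ in morphemes]

print(len(word_units), word_units)             # 1 ['evlerinizden']
print(len(morpheme_units), morpheme_units)     # 4 ['ev', 'ler', 'iniz', 'den']
```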
It comes down to which lens you want to use. If you want to know where languages came from, family trees are the right tool. If you want to know which language might feel more familiar in subtitles, dialogue, or everyday media, this kind of map can show you those differences. That is the core answer to why some languages feel closer than others: closeness is not only historical but also practical, shaped by how languages are used and translated in everyday contexts.
That is what interested me most. I started with a feeling many language learners already have: some languages just seem closer. Those footholds do exist. But the way we notice them, distinguish them, and use them is not limited to traditional language families.
All images belong to Ibrahim Farah, unless otherwise stated.