What Languages Do Pakistanis Speak? (With Pakistan Language Map)

One of the many frustrations I have faced when trying to understand South Asia is the near total lack of recent data on which languages are spoken and where. The lack of interest in South Asian languages is stunning, especially given that South Asia is home to some of the most spoken languages in the world. The language everyone has heard of is Hindi/Urdu (essentially one language with two scripts), which is spoken by over 300 million people, even if the closely related Rajasthani and Bihari languages are excluded. In the West, though, awareness of the other South Asian languages is low. Just to give an idea of how large many of these languages are, here are some comparisons: as many people speak Punjabi as Japanese; roughly as many people speak Bengali as German, French, and Italian combined; as many people speak Oriya as Ukrainian; Pashto has as many speakers as Polish; Marathi, Telugu, and Tamil each have more than three times as many native speakers as Dutch. I have searched for up-to-date statistics on language in India, but haven’t been able to find anything more recent than the 1931 Census. I was able, however, to track down tehsil-level data for Pakistan from the 1998 Census. In Pakistan, tehsils are the third level of administrative divisions, after provinces and districts. The data set I found isn’t perfect (more on that later), but it has most of what I was looking for. The data can be downloaded here, and the site also has a link to a pretty cool interactive map.

Before I post the map, I’m going to give a quick rundown on language in Pakistan. English and Urdu are the national languages, and are widely understood, at least by the educated. English, obviously, is nobody’s first language in Pakistan, and Urdu is the first language of about 7% of the population, mostly descendants of immigrants from north India who arrived in 1947. The most widely spoken tongue by far is Punjabi, which is the first language of slightly less than half the population. When Saraiki and Hindko, two Punjabi dialects that are sometimes classified as separate languages, are included, well over half of Pakistanis speak Punjabi or a closely related language. As anyone who read my post on the partition of Punjab will know, a large population of Punjabis (about 35 million) live across the border in India. The second most widely spoken language is Pashto. Unlike Punjabi, an Indo-Aryan language related to Hindi, Pashto is an Iranian language. This makes it a relative of Farsi and Kurdish, although Pashto’s closest relatives are a cluster of minor languages known as the Pamir languages, which are spoken on the mountainous border between eastern Tajikistan and northeastern Afghanistan. Pashto, like Punjabi, is split between two countries. It is the dominant language in southern Afghanistan, but the majority of Pashtuns live in Pakistan. About 15% of Pakistanis speak Pashto as a first language. Right behind Pashto at 14% is Sindhi, which is a relative of Punjabi. There are a few million Sindhi speakers in India as well, some right on the opposite side of the border, and some Hindus who fled Sindh after Partition. The other major regional language is Balochi, spoken by about 4%. Balochi, like Pashto, is an Iranian language, though the two are not particularly closely related. Balochi is actually closer to Kurdish, leading to the theory that the Baloch may have migrated to their current location fairly recently from the Middle East. Balochi is also spoken in southern Afghanistan and eastern Iran. There are some other minor languages, which I’ll discuss later, but those are the major languages. Below is the Pakistan language map.

Pakistan Language Map

Note that I colored the Saraiki and Hindko speaking areas shades of blue because it remains undetermined whether they are separate languages or dialects of Punjabi. Since I don’t speak any of these languages, I can’t make a determination for myself, so I split the difference by making them different shades of the same dark blue. I should also mention two other problem areas. One is the central Balochistan area, which is traditionally considered the Brahui zone. Brahui is a fascinating language. It is Dravidian, which means that it is related to the major South Indian languages, such as Tamil and Telugu, but it is spoken far away from the other Dravidian languages. Brahui barely registers in these data. There are several possibilities. One is that Brahui has lost ground to Balochi. Another is that the Brahui learn both Balochi and Brahui and are equally comfortable in both, leading most to identify the dominant Balochi language as their native tongue. According to some sources, the Brahui have a complicated system of code-switching in which people use Brahui in some situations and Balochi in others. Apparently, even within families, there are times when Balochi is used (elder son addressing father), and other times when Brahui is used (younger son addressing father). The father speaks to the children in the language of the mother, and wives address their husbands in Balochi. This all seems crazy, but if true it could explain why many Brahui would feel comfortable calling Balochi their native language. In any case, it seems that almost all Brahui are fluent in Balochi. Just as a side note, Ethnologue (and Wikipedia) say Brahui is spoken by four million people. This is a ludicrous number, implying that Balochistan, which has 7 million people, is majority Brahui-speaking.

The other problem area was the far north, including northern Khyber-Pakhtunkhwa, Gilgit-Baltistan, and Azad Kashmir. I couldn’t find Census data on Gilgit-Baltistan and Azad Kashmir (which combined make up Pakistan’s part of Jammu and Kashmir). As a result I had to look around the internet for information on these areas. In Azad Kashmir, I had to distinguish between Hindko and Pothohari, another Punjabi dialect being pushed as a separate language (included with Punjabi in this map). It’s a bit difficult to figure out where one begins and the other ends, but it seems that Hindko is spoken in Muzaffarabad, and south of that it is Pothohari. In Gilgit-Baltistan, I was able to use this survey from the early 1990s, which goes into some detail about the northern languages. The other problem is that the languages of northern K-P (Hindko, Khowar, and Kohistani) are all grouped under “other” in my data set. Luckily the geographic ranges of these languages are fairly well known and distinct, so it was easy to figure out which “other”-speaking areas belonged to which language.

I have already mentioned Hindko, but I’ll quickly go through the other six languages that show up on the map in the north. Three of the languages, the aforementioned Khowar and Kohistani, as well as Shina, are Dardic languages, the most northwestern branch of the Indo-Aryan language family. The Dardic languages form an arc in the far north of South Asia. To the southeast in Indian Kashmir, Kashmiri is the most spoken Dardic language. On the other side, in Afghanistan, Pashayi is spoken by perhaps half a million people south of Nuristan province. The other languages in the north are Burushaski (in brown), a language isolate with no known relatives; Wakhi (light purple), which is related to Pashto; and Balti (orange), which is related to Tibetan and is also spoken in Indian Kashmir, though the dialect there is called Ladakhi. The Baltis are almost exclusively Shia; the Ladakhis are split between Shia and Buddhist.

Hopefully this map underscores how linguistically diverse Pakistan is, and possibly explains why the country is so fragmented. Two other features are worth noting. The first is the huge swath of northern Balochistan that is Pashto-speaking. The 1998 statistics put Pashto speakers at around 30% of Balochistan’s population, but with high birth rates and a surge of refugees from Afghanistan in the last decade, the Pashtun and Baloch populations in the province may be approaching parity. The second is the tiny presence of Urdu, the national language. While most educated people in Pakistan can speak Urdu, and almost everyone has at least a rudimentary knowledge of it, very few people speak it as a first language. Only the Sindhi cities of Hyderabad and Karachi are majority Urdu-speaking. Hyderabad and Karachi were among the only significant Hindu-majority areas of British India that went with Pakistan, and it is possible that the Urdu speakers leaving India went there simply due to the availability of real estate once the Hindus left. Punjab would have been a more logical destination given Lahore’s traditional position as the most important city in northwest India, but Punjab was already overrun with Muslim refugees from India. Sindh wasn’t partitioned, which means it had to absorb fewer refugees. That might explain why the powerful Urdu-speaking community chose the cities of this arid backwater province as their new home.

This map also highlights two large movements for new provinces. The southern Saraiki-speaking Punjab has long had advocates for severing it from the north and creating a separate province centered on Multan. It is unclear how popular this demand is with the average citizen, but the movement has been active since the 1960s and shows no sign of going away. The other potential province would be in the non-Pashto speaking north of K-P. This province would be called Hazara and would be majority Hindkowan (the ethnic group that speaks Hindko).

The final interesting aspect of Pakistan’s linguistic mix is that the border between the Indo-Aryan languages of north India and the Iranian languages runs right through it. This fact, plus the detailed data set I found, gives us the unusual opportunity to investigate the boundary between two major language families. The Indo-Iranian languages form the largest branch of the Indo-European language family. It is typically split into the Iranian branch (Pashto, Farsi, Kurdish and others) and the Indo-Aryan branch (Hindi, Punjabi, Bengali, Marathi and many others). The Iranian and Indo-Aryan languages diverged about 4000 years ago. While South Asia and Iran share many cultural similarities, they are markedly different civilizations. Most of the Iranian peoples share a basic history and culture as do most Indo-Aryans. Below is the map of the border between the Iranian languages and the Indo-Aryan ones.

Pakistan Indo-Iranian

To me, there are two notable features of this map. The first is the intrusion of Indo-Aryans into central Balochistan. These people are a mix of Sindhi, Saraiki, and Punjabi, which explains why they didn’t register on the first map, since Balochi speakers remain the plurality. Added up, though, several tehsils have an Indo-Aryan majority. That corridor between northern Sindh and Quetta is pretty important, because it connects Quetta, and ultimately Kandahar, to the Pakistani heartland. It is also a major gas-producing area for Pakistan. I wonder if the non-Baloch people there are workers who are employed in the gas fields and related industries. That area is also a hotspot for militancy. Perhaps Baloch militants strike there to get at the “foreign occupiers” who are stealing Balochistan’s resources (a common complaint of Balochistan’s active separatist movement).
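
For anyone who wants to see how a tehsil can be plurality Balochi on the first map and still majority Indo-Aryan on the second, here is a minimal Python sketch. The tehsil names and percentages are invented for illustration and are not taken from the 1998 Census data, and the function names are my own hypothetical helpers.

```python
# Hypothetical mother-tongue shares (percent) for two made-up tehsils.
# These numbers are illustrative only, NOT from the 1998 Census.
INDO_ARYAN = {"Sindhi", "Saraiki", "Punjabi", "Urdu"}
IRANIAN = {"Balochi", "Pashto"}

tehsils = {
    "Tehsil A": {"Balochi": 40.0, "Sindhi": 25.0, "Saraiki": 20.0,
                 "Punjabi": 10.0, "Pashto": 5.0},
    "Tehsil B": {"Balochi": 70.0, "Sindhi": 15.0, "Saraiki": 5.0,
                 "Pashto": 10.0},
}


def plurality_language(shares):
    """Return the single language with the largest share (first map)."""
    return max(shares, key=shares.get)


def branch_totals(shares):
    """Add up the shares by branch: Indo-Aryan versus Iranian (second map)."""
    indo_aryan = sum(v for lang, v in shares.items() if lang in INDO_ARYAN)
    iranian = sum(v for lang, v in shares.items() if lang in IRANIAN)
    return {"Indo-Aryan": indo_aryan, "Iranian": iranian}


for name, shares in tehsils.items():
    totals = branch_totals(shares)
    leading_branch = max(totals, key=totals.get)
    print(name, "| plurality:", plurality_language(shares),
          "| leading branch:", leading_branch, totals)
```

In the made-up "Tehsil A", Balochi is the largest single language at 40%, yet Sindhi, Saraiki, and Punjabi together reach 55%, so the tehsil would be colored Balochi on the first map and Indo-Aryan on the second.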

The second, more macro, feature is the sharp line between the Indo-Aryan languages and the Iranian ones. There are very few parts of Pakistan with mixed communities. This is not at all what I expected. Given that all of these languages, except Urdu, are poorly standardized, I expected the distinctions between them to be hazy. Instead, we see many instances where a 95% Pashto district borders a 95% Punjabi district. This is fairly similar to Western Europe, where the language boundaries tend to be sharp. One doesn’t find many mixed German and Polish towns, or French and Italian. In Europe, most languages are highly standardized and the national boundaries were made to coincide with language borders often through ethnic cleansing. Neither of these is the case in Pakistan. I expected Pakistan’s language map to look a bit more like Southeast Asia’s.

Pakistanis (and Indians) do have very strong ethnic identities. Sindhi speakers know that they are Sindhi and care about the distinction with Balochis. The same is true of Punjabis and Pashtuns. The lack of ethno-linguistic mixing could explain why Pakistan has had such a hard time constructing a national identity. It also could be one of the reasons Pakistan has been so slow to react to the threat of radical Islamic militancy. The vast majority of terrorist attacks in Pakistan happen in Pashtun dominated areas. Since there are few Punjabis or Sindhis living near Pashtuns, those attacks are out of sight and out of mind for the majority of Pakistanis.

17 thoughts on “What Languages Do Pakistanis Speak? (With Pakistan Language Map)”

    • Thanks for the comment. I used data from Pakistan’s 1998 Census for this map. I don’t have an agenda with languages, so if you could point me to better or more recent data, I would be happy to change the map. Until then, this is the best and most detailed data set I could find. Also, note that both Bahawalpur and Dera Ghazi Khan are in fact mostly Saraiki-speaking. In fact, according to the data I have, Dera Ghazi Khan district is about 80% Saraiki-speaking and Bahawalpur is 64% Saraiki-speaking; they just have pockets of non-Saraiki speakers. The non-Saraiki areas of these districts cover a large area, but they are sparsely populated compared to the Saraiki-speaking areas.

  1. “Sindhi, which is a relative of Punjabi” – not sure about this, will have to check with an expert on Sindhi.

    The blog fails to mention Persian, which is the primary language of Hazaras and other minority communities in KPK/Baluchistan.

    “Only the Sindhi cities of Hyderabad and Karachi are majority Urdu speaking.” – Again, this might have changed. Urdu is quite popular in urban capitals like Lahore and Islamabad, at least in the urban middle class, because of new private schools. I will not say it’s awami, but a good chunk of people use it as their primary language.

    I really like the second map – that’s how I have seen Pakistan after my trips to countries west of Pakistan. Apparently, both Baluchistan and Pakhtunkhwa have a very strong influence of Persian/Kurdish/Afghan (primarily Persian) culture and language for obvious reasons. Punjab and Sindh (primarily Punjab) have a very Indian feel to them. I always like to tell people how all neighboring countries enjoy black/green tea (without milk), and it is similarly the case in the western region of Pakistan, but the eastern vertical region enjoys the desi doodh wali chai. Colonial influence, maybe, as the English like it with milk; warna Iran, Afghanistan, China, Kashmir etc. all like it without milk. Sorry, got off track…

    • Thanks for bringing up the Hazaras, though the reason Persian isn’t on the map is that Hazaras apparently aren’t a majority anywhere in Pakistan. Unfortunately, the most recent data I could find on Pakistan was the 1998 Census; I would be very interested to see how things have changed since then. Recently, I have seen a trend in the Pakistani press of people fretting that Urdu is being squeezed by the local languages as the language of everyday life and by English as the language of the elite. I don’t know if that’s actually happening; it could just be the perception of a few newspaper editors. And of course you’re always welcome to opine about tea here…

  2. I would have to disagree with the notion that Urdu is the mother tongue of 7 percent of Pakistanis. In urban areas from Peshawar to Abbottabad to Islamabad to Multan, middle class families raise their children speaking Urdu (English if they are upper class). These children might claim to be Punjabi speakers or Hindko speakers, but they study and work and read and write in Urdu and often have a pretty marginal knowledge of their mother tongue.

    The other point I wanted to address was the idea that there is little language mixing across the language families. Here I’m afraid your data set is hiding the many different sites of linguistic overlap.

    1) In Sindh the entire province has a presence of historically Baloch Sindhi speakers. Two or three generations ago they actually maintained the language. Benazir Bhutto, of Sindhi origin but very much not a Sindhi speaker, was married to Asif Zardari a Balochi-Sindhi.
    2) Northern Sindh also has a massive group of people who were/are Seraiki speakers. Similarly, the entire Seraiki speaking belt has pockets of settled Baloch who now speak Seraiki, and also Punjabi speakers who settled the canal colonies.
    3) The entire Hindko speaking area should be thought of as an area of overlap between Punjabi and Pashto. The language base is all very similar to Punjabi (and basically identical to Seraiki), but there is both a large vocabulary and a cultural context similar to Pashto culture. This makes sense, as a fifth of Hindko speakers in the Hindko speaking areas self identify as Pathan. In many families Hindko and Pashto are both spoken.
    4) In the major city of the Pashto speaking area, Peshawar, Hindko was the language of trade and is still spoken in the old town as frequently as Pashto. Again mixed language use is the norm.
    5) As you mention in Balochistan the Dravidian Brahui and Iranian Balochi exist in a context of complete bilingualism.

    So in lived practice outside of Northern Punjab and the far North, Iranian and Indo-Aryan languages live in environments of bilingualism or at least biculturalism as a norm.

  3. My family is from Attock, Northern Pakistan I think. I grew up learning Hindko as a second language. I don’t know much about it or how prevalent it is among other Pakistanis, hopefully this article can shed some light.

  4. PERSIAN/FARSI should be the national language of Pakistan. Only then can we culturally separate from the Indian subcontinent. We are not the same; when will people understand that? If our national anthem can be Persian, why can’t our language be Persian? Persian was the language of Muslims in the subcontinent before Urdu was imposed on us.

  5. Farsi was the lingua franca of the vast Muslim world. For eight hundred years it was the official language of Indian Muslim states. Lord Macaulay banned Farsi to cut off relations of Indian Muslims with Afghanistan, Turkistan, Iran and Turkey, and imposed the Indian vernacular Hindi/Urdu, which was one and the same language before British occupation. Only Persian words make Urdu different from Hindi. Moreover, Urdu is not the mother tongue of any native ethnic group or region of Pakistan. It is the mother tongue of only the 7% who immigrated from India. Because of Urdu we are still considered culturally Indian. If we want to get rid of the Indian culture of Bollywood, we must revert to our historical language, Farsi, in which our forefathers were educated. They used to learn, speak and write in Farsi instead of Urdu or any other Indian language.

  6. I have district level mother tongue data from the 1998 Pakistan census if you are interested. There is no newer data yet, as a new census planned for March 2016 was postponed a few days ago. A new schedule for the census has not been announced yet, and it is really a bad thing for such a fast growing country to have gone 18 years without a census.

  7. Saraiki was invented in 1962; when you say you don’t know the ‘language’, trust me, you do. Nearly all classical Punjabi poets wrote in the Southern dialect (except Waris Shah and MM Baksh), which ‘they’ (Balochs and Syeds) call Saraiki now. This is why Saraiki nationalists claim Bulleh Shah and other poets as their own. Also, Hindko is again a Punjabi dialect, closely related to a dialect (Pothohari) spoken in Northern Punjab and Azad Kashmir. Azad Kashmir was part of Punjab till 1901, mind you. All the dominant tribes living in Punjab from Hazara to Rahimyar Khan are Jats, Rajputs and Arain, except in Dera Ghazi Khan, where Balochs are dominant, but they have abandoned their language and speak Punjabi in the Dervi or Derawali dialect. In Baluchistan the Jatki dialect of Punjabi, literally meaning the language of Jats, is spoken too, besides Brahvi.
    Languages in Sindh are complicated, northern part is diverse (Punjabi and Sindhi). Not to forget the Baloch tribes living in Sindh who have forgotten Baluchi language and speak Sindhi. In the South Rajasthani languages and finally Urdu in Karachi.

  8. The divide between the Indo-Aryan and Iranian languages runs pretty much along the Indus river. The river separates people of Indic culture from people of Iranian culture. The Indic touch their elders’ feet; Baloch and Pashtuns don’t. One more thing: Pakistan has literally stolen Urdu from India; it’s not their language at all. In India we should just club the two languages together with an official register. Instead of writing Hindi-Urdu, we should have one name for it, officially. I’m quite sure this would pi*s off the Pakistanis.

  9. You want to claim that nearly 80% of the Indo-Aryan people should just switch to the language of their distant neighbors. Why? To be distant from South Asia? That is very pathetic, to say the least. Even afghans don’t accept Persian/Dari and mainly use their native languages throughout the country and their tie to Persian is much much closer. Urdu was not imposed, Persian was. Virtually no one spoke Persian in South Asia outside the Islamic Scholar community and the royal courts of the Mughals who adopted Persian and made it the official language. Just like how the British imposed English but no one spoke it but Persian was a lot weaker than English. So no one is actually a Persian speaker in the subcontinent, it’s been declining in South Asia since the British and is virtually dead today, in India and Pakistan.
