Beginning HTML with CSS and XHTML

Modern Guide and Reference

by David Schultz and Craig Cook

With a foreword by Simon Collison

Technical review by Gez Lemon

Declaring Your Content’s Natural Language


The web is a globe-spanning, borderless nation that speaks many languages: all of them, in fact. A web page created by an American, hosted in America, and aimed at an American audience can just as easily be seen by people in Malaysia, Argentina, Finland, Turkey, and even Canada. Even if your particular slice of the web will mostly be seen by speakers of one language, you should still give some consideration to speakers of other languages by declaring the base language of your document.

Declaring the natural language of your content helps user agents parse and render it. Search engines can automatically filter their results based on language, returning a listing of pages written in the language specified by the searcher. Screen readers can alter their pronunciation so German sounds like German and Tagalog sounds like Tagalog (in theory, anyway).

You should declare the primary language of your entire document by including the lang and xml:lang attributes in the document’s root html element. You can then mark individual phrases or passages written in another language by adding the same attributes to the element that contains them; lang and xml:lang are valid on almost any element.
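A minimal sketch of how this might look in an XHTML 1.0 Strict page (the title, sentences, and quotation are invented for illustration): the root element sets English as the default, a span switches a single phrase to French, and a blockquote marks a whole passage as German. Elements inherit the language of their nearest ancestor that declares one, so the paragraph inside the blockquote is German without needing attributes of its own.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- English is the primary language of the whole document -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <title>A mostly English page</title>
</head>
<body>
  <!-- This paragraph inherits "en" from the root element -->
  <p>Before the meal, our host raised a glass and said
    <!-- A single phrase overridden to French -->
    <span xml:lang="fr" lang="fr">bon appétit</span>.</p>

  <!-- A whole passage in German; everything inside inherits "de" -->
  <blockquote xml:lang="de" lang="de">
    <p>Ich bin ein Berliner.</p>
  </blockquote>
</body>
</html>
```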

The lang attribute comes from HTML, while xml:lang is its XML equivalent, intended for XHTML documents. However, because you can’t reliably serve XHTML as XML (not all browsers correctly support the application/xhtml+xml MIME type), most XHTML documents are served with a text/html MIME type and treated as HTML. This means the xml:lang attribute alone won’t work in documents served as HTML. Even so, you should still strive for XML compliance in your XHTML markup: an XHTML document should be well-formed XML even if it isn’t being served as such. For full compatibility, include both the lang and xml:lang attributes, with identical values, like so:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

The two-letter language code “en” indicates that this document is written in English. To be even more specific, I’m writing in American English (as opposed to British, Canadian, or Australian English), and I can declare that specific dialect by extending the language code with a hyphenated regional subcode, as follows:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-us" lang="en-us">

Many common dialects of major languages have standardized subcodes, such as en-us for American English, en-gb for British English, en-ca for Canadian English, fr-ca for Canadian French, and fr-be for Belgian French, to name just a few. For such national dialects, the code typically takes the form language-country, where the country part is the ISO 3166 two-letter country code. (Note, however, that British English is en-gb, not en-uk: the United Kingdom’s country code is gb, so the latter subcode, though perfectly logical, is not correct.) Dialects from narrower regions can be declared as language-country-dialect. However, you should make such dialectal distinctions only when necessary; declaring the base language is usually sufficient.
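For instance, a bilingual notice on a hypothetical Canadian shop’s page might declare Canadian English at the root and flag the French paragraph with fr-ca. This is only a sketch: the shop, its wording, and the surrounding page are invented, and the fragment assumes the root html element already carries xml:lang="en-ca" lang="en-ca".

```html
<!-- Assumes the root element declares xml:lang="en-ca" lang="en-ca" -->
<div>
  <!-- Inherits Canadian English from the root element -->
  <p>Welcome! Our store hours are posted below.</p>
  <!-- Canadian French, using the language-country form -->
  <p xml:lang="fr-ca" lang="fr-ca">Bienvenue! Nos heures d'ouverture sont affichées ci-dessous.</p>
</div>
```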

Language Codes

These language codes are defined by an official standard, ISO 639. Like many standards, ISO 639 has changed and evolved over time, and continues to do so. The original version (639-1) included codes for 138 languages, covering most of the common languages in the world. That still doesn’t come close to encompassing the full breadth of humanity, whose languages number in the thousands. The standard was later expanded with ISO 639-2, which introduced three-letter codes to allow far more combinations.
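Where a language has no two-letter code, its three-letter code goes in the same attributes. Hawaiian, for example, has no ISO 639-1 code, so a Hawaiian phrase inside an English page would be marked with its ISO 639-2 code, haw. A brief sketch (the sentence is invented for illustration):

```html
<!-- English paragraph containing a Hawaiian greeting; Hawaiian has no
     two-letter code, so the three-letter code "haw" is used instead -->
<p>Visitors to the islands are often greeted with
  <span xml:lang="haw" lang="haw">E komo mai</span>, meaning “welcome.”</p>
```

When a two-letter code does exist, the convention for language tags is to use it rather than its three-letter equivalent, so English stays en rather than eng.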

Below are the 138 standardized, two-letter abbreviated language codes of ISO 639-1. The latest, exhaustive listing can be found online at the IANA Language Subtag Registry, in a practically unreadable format. Thankfully, W3C Internationalization Activity Lead Richard Ishida has cooked up a handy search utility.

Language Name | Code
A
Abkhazian | ab
Afan (Oromo) | om
Afar | aa
Afrikaans | af
Albanian | sq
Amharic | am
Arabic | ar
Armenian | hy
Assamese | as
Aymara | ay
Azerbaijani | az
B
Bashkir | ba
Basque | eu
Bengali/Bangla | bn
Bhutani | dz
Bihari | bh
Bislama | bi
Breton | br
Bulgarian | bg
Burmese | my
Byelorussian | be
C
Cambodian | km
Catalan | ca
Chinese | zh
Corsican | co
Croatian | hr
Czech | cs
D
Danish | da
Dutch | nl
E
English | en
Esperanto | eo
Estonian | et
F
Faroese | fo
Fiji | fj
Finnish | fi
French | fr
Frisian | fy
G
Galician | gl
Georgian | ka
German | de
Greek | el
Greenlandic | kl
Guarani | gn
Gujarati | gu
H
Hausa | ha
Hebrew | he
Hindi | hi
Hungarian | hu
I
Icelandic | is
Indonesian | id
Interlingua | ia
Interlingue | ie
Inuktitut | iu
Inupiak | ik
Irish | ga
Italian | it
J
Japanese | ja
Javanese | jv
K
Kannada | kn
Kashmiri | ks
Kazakh | kk
Kinyarwanda | rw
Kirghiz | ky
Korean | ko
Kurdish | ku
Kurundi | rn
L
Laothian | lo
Latin | la
Latvian/Lettish | lv
Lingala | ln
Lithuanian | lt
M
Macedonian | mk
Malagasy | mg
Malay | ms
Malayalam | ml
Maltese | mt
Maori | mi
Marathi | mr
Moldavian | mo
Mongolian | mn
N
Nauru | na
Nepali | ne
Norwegian | no
O
Occitan | oc
Oriya | or
P
Pashto/Pushto | ps
Persian (Farsi) | fa
Polish | pl
Portuguese | pt
Punjabi | pa
Q
Quechua | qu
R
Rhaeto-Romance | rm
Romanian | ro
Russian | ru
S
Samoan | sm
Sangho | sg
Sanskrit | sa
Scots Gaelic | gd
Serbian | sr
Serbo-Croatian | sh
Setswana | tn
Shona | sn
Sindhi | sd
Singhalese | si
Siswati | ss
Slovak | sk
Slovenian | sl
Somali | so
Spanish | es
Sundanese | su
Swahili | sw
Swedish | sv
T
Tagalog | tl
Tajik | tg
Tatar | tt
Telugu | te
Thai | th
Tibetan | bo
Tigrinya | ti
Tonga | to
Tsonga | ts
Turkish | tr
Turkmen | tk
Twi | tw
U
Uigur | ug
Ukrainian | uk
Urdu | ur
Uzbek | uz
V
Vietnamese | vi
Volapuk | vo
W
Welsh | cy
Wolof | wo
X
Xhosa | xh
Y
Yiddish | yi
Yoruba | yo
Z
Zhuang | za
Zulu | zu


Apress