The LANG Attribute / tag
- Posted by JC John Sese Cuneta (謝施洗) on 06.21.2009
In my previous post I talked about “Baybayin - the Forgotten Pre-Hispanic Writing of the Filipino”. It was added to the Unicode Standard in version 3.2, together with Buhid, Hanunoo, and Tagbanwa, under the “Philippine Scripts” group.
But how should we properly write/mark our content written in another language and/or script?
For this post, I will talk about how to correctly declare the language of your content. Doing so makes your pages friendlier to translation software, helper applications, and other technologies that rely on this often taken-for-granted HTML attribute (for example, I think it helps search engines index your site and deliver it to the appropriate audience).
As you can see in the image, everyone can see the writing script used, but in the digital world there are people who do not have the fonts you are using. And there are people who do not use the same browser that you and I use (it could be a text browser, a speech browser, or, as was mentioned, a braille browser).
Update (2009-07-20): Corrected the “phi” examples; Added information regarding the difference between the “extended_language” and “variant” subtag positions - section: ISO-639-3 Languages.
When creating websites, it is important to properly declare the language being used by the webpage (so if you do not have it yet, you can learn it here now).
For example, I use the following for all my sites: <html lang="en-PH">. You should add xml:lang if you are using XHTML.
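For XHTML, the two attributes sit side by side on the root element. Below is a minimal sketch, assuming an XHTML 1.0 Strict page served as text/html; the title and paragraph text are placeholders.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- lang is read by HTML parsers, xml:lang by XML parsers; keep both values identical -->
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-PH" xml:lang="en-PH">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <title>My Baybayin Website</title>
</head>
<body>
  <p>Mabuhay!</p>
</body>
</html>
```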
It is also important to declare the character set especially when you are going to use any characters beyond the scope of ASCII. I use this for all my sites: <meta charset="UTF-8" />. This is important but we won’t go into that for now.
Putting it all together:
Code:
<!DOCTYPE html>
<html lang="en-PH">
<head>
<meta charset="UTF-8" />
<meta name="description" content="My Website" />
<meta name="keywords" content="Philippines, Baybayin" />
<title>My Baybayin Website</title>
</head>
<body>
</body>
</html>
Now let’s dig in…
The lang attribute
The HTML lang attribute defines the language of the content enclosed within the element on which it is declared (e.g. <span lang="fil">My Content</span>). The codes are called subtags, and for my Filipino readers there are only three subtag types you should worry about: language, script, and region/country, written in the order language-Script-REGION. (The full format: language-extended_language-Script-REGION-variant-extension-privateuse.) See the table below:
| Code | Language | Tag Placement |
|---|---|---|
| en | (Generic) English | language code |
| en-PH | Philippine English | language+Region/Country code |
| fil-Tglg | Filipino in Baybayin | language+Script code |
| bik-cts-Tglg | Bikolano of the Pandan (Northern Catanduanes) dialect in Baybayin script | language+extended_language+Script code |
| phi-Tglg-tsg | Tausug Philippine language written in Baybayin script | language+Script+variant |
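To see how the first three kinds of tag in the table are used on a page, here is a minimal sketch. It assumes a page written mostly in Philippine English with two short passages in other languages; the sample sentences and the title are placeholders. The lang value set on the html element applies to everything inside it until a nested element declares its own.

```html
<!DOCTYPE html>
<html lang="en-PH">
<head>
  <meta charset="UTF-8" />
  <title>Language tags in practice</title>
</head>
<body>
  <!-- No lang here: this paragraph inherits en-PH from the html element -->
  <p>Good morning from Manila.</p>

  <!-- language only: Filipino, written in the usual Latin script -->
  <p lang="fil">Magandang umaga.</p>

  <!-- language + Script: Filipino written in Baybayin -->
  <p>A Father's Day greeting in Baybayin:
    <span lang="fil-Tglg">ᜋᜎᜒᜄᜌᜅ᜔ ᜀᜍᜏ᜔ ᜈᜅ᜔ ᜋᜅ ᜀᜋ</span>
  </p>
</body>
</html>
```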
If you want to find the subtags for a particular language, previously we had to check different websites and plenty of official code lists. A time-consuming task (although normally you only have to do this once), right? Well, the latest official subtags can now be found in the IANA Language Subtag Registry. It is now the one universal source for all valid subtags.
So, according to this latest list, the subtags that are related to the Philippines are the following (if I missed anything, please leave a comment below):
Languages
Code:
Type: language
Subtag: tl
Description: Tagalog
Added: 2005-10-16
Suppress-Script: Latn
Code:
Type: language
Subtag: bik
Description: Bikol
Added: 2005-10-16
Code:
Type: language
Subtag: ceb
Description: Cebuano
Added: 2005-10-16
Code:
Type: language
Subtag: fil
Description: Filipino
Description: Pilipino
Added: 2005-10-16
Code:
Type: language
Subtag: hil
Description: Hiligaynon
Added: 2005-10-16
Code:
Type: language
Subtag: ilo
Description: Iloko
Added: 2005-10-16
Code:
Type: language
Subtag: pag
Description: Pangasinan
Added: 2005-10-16
Code:
Type: language
Subtag: pam
Description: Pampanga
Description: Kapampangan
Added: 2005-10-16
Code:
Type: language
Subtag: phi
Description: Philippine languages
Added: 2005-10-16
Code:
Type: language
Subtag: war
Description: Waray
Added: 2005-10-16
Region/Country
Code:
Type: region
Subtag: PH
Description: Philippines
Added: 2005-10-16
Script
Code:
Type: script
Subtag: Buhd
Description: Buhid
Added: 2005-10-16
Code:
Type: script
Subtag: Hano
Description: Hanunoo (Hanunóo)
Added: 2005-10-16
Code:
Type: script
Subtag: Tagb
Description: Tagbanwa
Added: 2005-10-16
Code:
Type: script
Subtag: Tglg
Description: Tagalog
Description: Baybayin
Description: Alibata
Added: 2005-10-16
Now that we have the subtags that we need, we can start writing the correct lang value for any Philippine language and script. See the list below:
- lang="en-PH" - use this if you are a Filipino and/or you grew up and learned English in the Philippines. More likely than not, you will be using English words that are exclusive to the Philippines, and that’s just one of the reasons.
- lang="fil" - use this if you are writing in “Filipino” (not “Tagalog”)
- lang="bik" - if writing in “Bikol”
- lang="ceb" - if writing in “Cebuano”
- lang="tl" - use this if you are writing in “Tagalog” (not “Filipino”)
- lang="hil" - writing in “Hiligaynon”
- lang="ilo" - writing in “Iloko”
- lang="pag" - in “Pangasinan”
- lang="pam" - in “Kapampangan”
- lang="war" - “Waray”
- lang="phi" - use this if you are writing in another Philippine language that has no corresponding ISO-639-2 code

Then if you want to write something in Baybayin script, you have to mark it with the script subtag “Tglg”. Simply add it after the language subtag, like so:

- lang="fil-Tglg" - use this if you are writing Filipino in Baybayin
- lang="bik-Tglg" - use this if you are writing Bikol in Baybayin
- lang="ceb-Tglg" - use this if you are writing Cebuano in Baybayin
- lang="phi-Tglg" - use this if you are writing in Baybayin using another Philippine language that has no ISO-639-2 subtag
Why do we have no lang="tl-Tglg" in the list? Look again at the Suppress-Script: Latn line in the “tl” record above. It does not forbid other scripts. It records that Tagalog is overwhelmingly written in Latin script (as it has been since the 1900s), so the script subtag should be left out in that case: write lang="tl", not lang="tl-Latn". Tagalog written in Baybayin is not the default script, so lang="tl-Tglg" is a valid tag and can be used just like lang="fil-Tglg". So continuing…
ISO-639-3 Languages
There’s another subtag that you should learn, especially if you speak a language of the Bikol macrolanguage. If you want to indicate a specific Bikol dialect, simply add its ISO-639-3 code after the language subtag and before the script subtag. Here are examples:

- lang="bik-bcl" - if you are writing in Central Bikolano
- lang="bik-bhk" - if you are writing in Albay Bikolano / Buhi-Daraga
- lang="bik-bto-Tglg" - if you are writing in Iriga Bikolano using the Baybayin script
- lang="bik-cts-Tglg" - if you are writing in Pandan (Northern Catanduanes) using the Baybayin script

This is the extended-language subtag and, sadly, as of the time of this writing it is still not implemented. Check the IANA Language Subtag Registry regularly and search for your ISO-639-3 subtags (if it is a language without an ISO-639-2 code). A tag is only valid once the registry lists each of its subtags, in the position (language, extended language, or variant) that the registry assigns to it.
Next is the case where your language has an ISO-639-3 code and falls under the language code “phi” in ISO-639-2. The “phi” subtag is considered a collective language code. Examples are:

- lang="phi-krj" - for the Kinaray-a language
- lang="phi-mdh" - for the Maguindanao language
- lang="phi-Tglg-mrw" - for the Maranao language written in Baybayin script
- lang="phi-Tglg-tsg" - for the Tausug language written in Baybayin script
As you have probably noticed, the format I used was “language-Script-variant” and not “language-extended_language-Script”. My reasoning is simple: the “phi” language code is not really a language. It is accurately called a “collective” language entry in ISO-639-2, covering all other Philippine languages not found in this version of the ISO language standard.
By contrast, the “bik” language code is clearly marked as a “macrolanguage” in ISO-639-2 and ISO-639-3. And according to the W3C, dialects of macrolanguages should be written immediately after the language subtag.
In other words, if your ISO-639-2 code/subtag is considered a macrolanguage, then you should use the “extended_language” subtag position when you want to define a particular dialect, as in the case of lang="bik-cts-Tglg". And if it isn’t defined as a macrolanguage, then you should use the “variant” subtag position, as in the case of lang="phi-Tglg-tsg". Keep in mind that registered variant subtags are five to eight characters long (or four if they begin with a digit), so a three-letter code in the variant position is my own convention rather than a registered form.
Examples, examples, and examples…
Finally, more examples!
If your website is mainly about Iriga, then you should adjust your website’s header accordingly:
Code:
<!DOCTYPE html>
<html lang="bik-bto">
<head>
<meta charset="UTF-8" />
<meta name="description" content="Ang Website Ko Sa Iriga Bikolano" />
<meta name="keywords" content="Philippines, Baybayin, Iriga, Bikolano" />
<title>Ang Website Ko Sa Iriga Bikolano</title>
</head>
<body>
</body>
</html>
If you want to write “Happy Father’s Day” in Baybayin, simply do this:
<span lang="fil-Tglg">ᜋᜎᜒᜄᜌᜅ᜔ ᜀᜍᜏ᜔ ᜈᜅ᜔ ᜋᜅ ᜀᜋ</span>
This will give this result:
ᜋᜎᜒᜄᜌᜅ᜔ ᜀᜍᜏ᜔ ᜈᜅ᜔ ᜋᜅ ᜀᜋ
(if written letter-by-letter in Latin it says: Maligayang Araw nang manga Ama)
Simple? Coolness! Just remember that when writing language tags, keep them as simple and as short as possible. If you don’t need to be as specific as lang="bik-bcl", don’t be! Simply use lang="bik". This is especially true for blogs: if your blog is in the Filipino language (not Tagalog!), then just put this in your code: <html lang="fil">.
Only be specific when you need to, or when you know that your site caters mainly to that particular audience. Additionally, if you use other languages and scripts, say in one of your blog posts, simply enclose them in a span element just like my Baybayin example above.
You just have to remember: if you are going to write in Filipino, use “fil”; in Tagalog, use “tl”; in Albay Bikolano, use “bik-bhk”; or in Kinaray-a, use “phi-krj”. And if you want to write in Baybayin, simply add the script subtag “Tglg”; or in Tagbanwa, “Tagb”, and so on.
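Marking up the language also gives stylesheets something to hook into, which can help with the missing-font problem mentioned at the start. Here is a minimal sketch, assuming the reader has a Baybayin-capable font installed; “Noto Sans Tagalog” is named only as an example of such a font, and the title and sentence text are placeholders.

```html
<!DOCTYPE html>
<html lang="fil">
<head>
  <meta charset="UTF-8" />
  <title>Ang Website Ko</title>
  <style>
    /* :lang() matches every element whose language is fil-Tglg,
       whether declared on the element itself or inherited from a parent */
    :lang(fil-Tglg) {
      font-family: "Noto Sans Tagalog", sans-serif; /* assumed font, swap in your own */
      font-size: 1.25em;
    }
  </style>
</head>
<body>
  <!-- The paragraph inherits lang="fil"; only the span switches to Baybayin -->
  <p>Maligayang Araw ng mga Ama:
    <span lang="fil-Tglg">ᜋᜎᜒᜄᜌᜅ᜔ ᜀᜍᜏ᜔ ᜈᜅ᜔ ᜋᜅ ᜀᜋ</span>
  </p>
</body>
</html>
```

Because the rule is keyed to the language tag and not to a class name, any span you mark with lang="fil-Tglg" in later posts picks up the same font without extra markup.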
Easy? Yep it is. Go on, update your websites now and start practicing marking your content with the correct language and script.
Image source: Tsunami Warning Sign by Robert Sanzalone, licensed under CC By 2.0.