Internationalization and localization
The ECMAScript Internationalization API is a forthcoming standard that helps automagically localize the output of dates, numbers, and currencies in JavaScript. And, through the magic of collator objects, the Internationalization API can also help sort a list of strings (e.g., names in an address book) in locale-specific order, as well as search for strings in a given list in a way that matches a user's locale preferences.
What's this, then?
This short introduction gives you a quick overview of some of the awesome things you will be able to do once the Internationalization API becomes more widely supported in browsers and in other ECMAScript-based programming environments, such as Node.js.
Let's be clear before we start: the Internationalization API is not a full internationalization framework, so it won't help you localize your whole application to another language. What it can do is be really useful for common localization tasks involving dates, numbers, currencies, and sorting.
Following along
You can follow along with Chrome. Currently, to get the most functionality out of the Internationalization API, we use a custom monkey-patch in Chrome: a simple script that overrides some standard JavaScript behavior without breaking any existing browser functionality. The monkey-patch is already loaded as part of this document, so if you do "inspect element" and bring up Chrome's JavaScript debug console, you are ready to follow along!
Can I use?
Before we can actually use the Internationalization API, we need to check if the browser supports it (i.e., we need to do some feature detection).
The official Internationalization API specification defines the Intl object, which should be a property of the JavaScript global object; in a browser, the global object is the window object. To check if we can use the Internationalization API, we are just going to check if Intl is available on the window object:
if (window.Intl && typeof window.Intl === "object") {
  //Assume it's supported, let's localize!
  console.log("We are all good to go!");
}
And in Node.js, you would do something like the following:
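(function (global) {
  if ("Intl" in global) {
    console.log("Sweet! Let's localize!");
  }
}(typeof window !== "undefined" ? window : global));
//In a Node.js module, `this` is module.exports rather than the global
//object, so we pass the real global object in instead.
//PS: the above will also work fine in a browser!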
Once we know that the Internationalization API is supported, we are good to start localizing stuff! In case you are wondering, the Intl object contains a bunch of other useful objects that we will make use of later.
Note: Because this API is still not final, actually finding out if the browser supports the full API can be tricky. For example, to date, there is only one implementation of the API (Chrome)... and the Chrome team have put the Internationalization API behind a unique vendor prefix (v8, as in v8Intl). Not only that, only parts of the API are exposed to developers, which is why we use a custom monkey-patch to make the whole thing work as if it were actually natively implemented in the browser.
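Because support has been so patchy, checking for the Intl object alone doesn't tell you whether methods like toLocaleDateString() actually honor the locales argument. A commonly used trick, sketched below, relies on the spec requiring implementations that support the API to throw a RangeError for a structurally invalid language tag such as "i"; older implementations simply ignore the argument. Both helper names are hypothetical, and this check is separate from the monkey-patch used in this article:

```javascript
//Hypothetical helper: true if toLocaleDateString() honors the locales argument.
//A supporting implementation must reject the invalid tag "i" with a RangeError;
//an implementation without locale support just ignores the argument.
function dateLocalesSupported() {
  try {
    new Date(0).toLocaleDateString("i");
  } catch (e) {
    return e.name === "RangeError";
  }
  return false;
}

//Hypothetical helper: checks that the constructors used in this article exist.
function intlConstructorsAvailable() {
  return typeof Intl === "object" && Intl !== null &&
    typeof Intl.DateTimeFormat === "function" &&
    typeof Intl.NumberFormat === "function" &&
    typeof Intl.Collator === "function";
}

if (intlConstructorsAvailable() && dateLocalesSupported()) {
  console.log("Full support: let's localize!");
} else {
  console.log("Partial or no support: fall back to default formatting.");
}
```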
Localizing dates and times
You've probably done this hundreds of times:
var date = new Date();
And possibly displayed it in the browser's default locale by calling:
date.toLocaleDateString();
//returns e.g. "Friday, August 24, 2012"
The problem
The above might be all well and good if the user is sitting at home where her computer's locale settings (i.e., language and geographical region) are likely set correctly. But what happens if the user is on a PC whose locale settings are set to Japanese, but her preferred locale settings are Portuguese as used in Portugal?
Let's say our user is backpacking around the world and has logged into your web application from an Internet cafe in Japan. How can you make sure she sees numbers, dates, and currencies formatted the way she is most accustomed to?
The solution
First, we need to convert what we know about the user's locale into a language tag. A language tag is a simple string that represents the user's preferred language and, optionally, where they are (or wish they were, for the purposes of communication).
Language tags and locales
You have undoubtedly encountered language tags before. For example "en-US", which roughly translates to "English as used in the United States"; as opposed to, say, "en-AU", which would be "English as used in Australia". And as you can see, most language tags simply identify some language as used in some country or region.
"Language tags are used to help identify languages, whether spoken, written, signed, or otherwise signaled, for the purpose of communication." (BCP 47)
But a language tag, or more generally the concept of a locale, doesn't just relate to languages: it also indicates the conventions for how dates, times, and currencies are formatted and how lists are sorted, and it can even affect how text-to-speech software reads things aloud. For example, here is a sentence as read by the "Alex" voice on Mac OS X when the system language is set to English and the region to United States:
Neil Armstrong landed on the moon on 07/20/1969. But not on the 20/07/1969?
As the example shows, strings representing dates are treated differently based on the user's locale settings. In the United States the date is written in short form as 07/20/1969, while in other English-speaking countries, such as Australia and the UK, it is written as 20/07/1969 (i.e., the day and month are swapped).
Many users have been confused, and many databases have broken, thanks to naive handling of dates. The Internationalization API should help make that a thing of the past.
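To see the ambiguity from the API's side, here is a small sketch that formats the moon-landing date for "en-US" and "en-GB". It builds the date in local time because toLocaleDateString() also formats in local time, so the result doesn't depend on the machine's time zone:

```javascript
//The moon-landing date from the example above.
//Months are zero-based in the Date constructor, so 6 means July.
var landing = new Date(1969, 6, 20);

var us = landing.toLocaleDateString("en-US"); //month first
var gb = landing.toLocaleDateString("en-GB"); //day first

console.log(us); //"7/20/1969"
console.log(gb); //"20/07/1969"
```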
Getting the user's locale
Unfortunately, there is no 100% reliable way in the browser of getting the user's locale information - unless you ask the user explicitly.
There are some unreliable hacks, like querying navigator.language in Chrome and Firefox, or navigator.browserLanguage in IE, or looking at the HTTP Accept-Language header using XHR. But these techniques would not help our previously described backpacking user: because she is using a computer at an Internet cafe, she has no control over the language preferences of the machine.
So, just ask if you need to. Or provide a way for the user to select their locale preferences.
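Putting that advice into code, a minimal sketch might treat an explicit choice by the user as authoritative and fall back to the browser's settings only as a last resort. The function name getPreferredLocale, the "en" default, and the idea that the choice comes from a settings menu are all assumptions for illustration; navigator.languages is a later, ordered counterpart of navigator.language:

```javascript
//Hypothetical helper: prefers a locale the user picked explicitly
//(e.g., from a settings menu) over the browser's unreliable defaults.
function getPreferredLocale(userChoice) {
  if (userChoice) {
    return userChoice;
  }
  if (typeof navigator !== "undefined") {
    //Ordered list of preferences, where available.
    if (navigator.languages && navigator.languages.length) {
      return navigator.languages[0];
    }
    //Older single-value property.
    if (navigator.language) {
      return navigator.language;
    }
  }
  return "en"; //assumed default for this sketch
}

//Our backpacker picked Portuguese as used in Portugal:
var locale = getPreferredLocale("pt-PT");
console.log(locale); //"pt-PT"
```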
Starting the localization process
If we know the user's preferred language (in this case "pt-PT"), the Internationalization API allows us to provide that information as an argument to the .toLocaleDateString() method of a date:
var date = new Date();
date.toLocaleDateString("pt-PT");
//returns something like "24/8/2012"
OK, so the above output is not super useful - it's actually less pretty than what we originally had! But it's a start on our way to localizing the date more fully!
Note that what the above does give us is a date formatted in a way that is more or less guaranteed to be understood in the given locale, even though it is not as pretty as having the date in "long" form (i.e., as in "Friday, 24 of August").
Just so we can see the API do something more impressive, let's say our user is from China and uses the language tag "zh-Hans" (simplified Chinese):
date.toLocaleDateString("zh-Hans");
//returns something like "2012年8月28日"
So now we are getting somewhere! We've managed to convert our date object to simplified Chinese with hardly any work (which we like!).
Controlling the formatting
As we pointed out above, the date format "24/8/2012" will likely be comprehended, but it's not the most user-friendly way to display dates. In particular, it's nice to see what day of the week it is (Monday, Tuesday, etc.) and what month we are talking about.
To do this, the Internationalization API allows us to pass a set of options as an argument to .toLocaleDateString(). For example:
var options = {weekday: "long"};
date.toLocaleDateString("pt", options);
//returns "sexta-feira" (which is Friday in Portuguese)
date.toLocaleDateString("ja", options);
//"金曜日" which is Friday in Japanese
Where it gets tricky is knowing how other cultures like to have their "pretty dates" formatted. This is where the Internationalization API really starts to show its usefulness: it has this knowledge built in, and gives programmers control over how to format a date through a set of options. These options are then passed as an argument to .toLocaleDateString() or .toLocaleString().
There is something important to note from the above. In Portugal, for example, people only use "sexta-feira" for "Friday" in formal contexts; in informal contexts, they just say "sexta" and drop the "-feira".
The takeaway here is that you should not just assume that what the API returns is always correct for the locale or audience you are targeting. Always consult someone who lives in the locale you are targeting about date, time, currency, and collation conventions. Don't just blindly rely on what the browser returns.
Formatting example
For example, let's say we want to output "day of week, day month, year". We would set the following options:
var options = {
weekday: "long",
year: "numeric",
month: "long",
day: "numeric"
};
date.toLocaleDateString("pt", options);
//returns "sexta-feira, 24 de agosto de 2012"
date.toLocaleDateString("ja", options);
//"2012年8月24日(金曜日)"
//And even Arabic
date.toLocaleDateString("ar", options);
//"الجمعة، ٢٤ أغسطس، ٢٠١٢"
//And for our final trick: Thai Buddhist calendar and Thai digits
date.toLocaleDateString("th-u-ca-buddhist-nu-thai", options);
//returns "วันศุกร์ ๒๔ สิงหาคม ๒๕๕๕"
Pretty funky! The last example uses "Unicode extensions" to language tags, which we cover in the advanced language-tag construction section of this guide. But to give you a quick sense of how it works, this is how the "th-u-ca-buddhist-nu-thai" language tag breaks down:
- th: Thai
- u: Enable Unicode extensions
- ca: Calendar
- buddhist: Buddhist calendar
- nu: Numbering system
- thai: Thai digits
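If you want to see how a browser interpreted those extension keys, one way (sketched here, assuming a current implementation with Thai locale data) is to ask a formatter for its resolvedOptions(). The exact locale string reported may vary, but the calendar and numbering system should reflect the tag:

```javascript
//Create a formatter from the extended language tag and inspect what it resolved to.
var thaiFormatter = new Intl.DateTimeFormat("th-u-ca-buddhist-nu-thai");
var resolved = thaiFormatter.resolvedOptions();

console.log(resolved.locale);          //the locale actually used
console.log(resolved.calendar);        //"buddhist"
console.log(resolved.numberingSystem); //"thai"
```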
The above code examples are starting to show the usefulness of the Internationalization API, but there is a problem: how do we know which locales the browser understands? We will come back to that topic later in this article.
Formatting options
The options that can be passed to date.toLocaleString(), date.toLocaleDateString(), and date.toLocaleTimeString() as an object literal are given in the table below.
* Output may vary from one locale to another and from one browser to another! The output shown here is from Chrome 21. Output was constructed by calling (new Date("3/9/2015")).toLocaleString("en", options);
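As a small standalone sketch of a few of the time-related options (hour, minute, hour12, and timeZone), the snippet below formats a fixed moment. The date and the "UTC" time zone are choices made for this example so the output doesn't depend on where the code runs; formatToParts() is a later addition to Intl.DateTimeFormat that exposes each formatted field separately:

```javascript
//A few time-related options. timeZone is pinned to "UTC" so the output
//doesn't depend on the time zone of the machine running the code.
var timeOptions = {
  hour: "numeric",
  minute: "2-digit",
  hour12: false,
  timeZone: "UTC"
};

//Date.UTC(year, zeroBasedMonth, day, hours, minutes): 9 March 2015, 15:05 UTC
var when = new Date(Date.UTC(2015, 2, 9, 15, 5));

console.log(when.toLocaleTimeString("en", timeOptions));
console.log(when.toLocaleTimeString("ja", timeOptions));

//Inspect each field of the formatted time individually.
var parts = new Intl.DateTimeFormat("en", timeOptions).formatToParts(when);
parts.forEach(function (part) {
  console.log(part.type + ": " + part.value);
});
```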
Custom date-time formatters
Consider a scenario where you have a potentially large and dynamic set of date objects that you need to localize to the same format over and over again (e.g., a list of birth dates). Up to this point in the article, we've been doing date conversion in the following way:
var date = new Date(),
options = {
weekday: "long",
year: "numeric",
month: "long",
day: "numeric"
};
date.toLocaleDateString("en", options);
If we know that we want to format all dates the same way for a particular purpose, then the Internationalization API provides a constructor, Intl.DateTimeFormat, that we can use to create a reusable formatter object.
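//Declare lang and options first: with var hoisting, creating the
//formatter before they are assigned would pass undefined for both.
var lang = ["en"], //using an array because of quirk in Chrome
    options = {
      weekday: "long",
      year: "numeric",
      month: "long",
      day: "numeric"
    },
    formatter = new Intl.DateTimeFormat(lang, options),
    //new Date(year, zeroBasedMonth, day) avoids engine-specific string parsing
    dates = [new Date(2001, 0, 1),
             new Date(2002, 1, 2),
             new Date(2003, 2, 3)],
    date, result;
//loop through dates formatting each one
for (var i = 0; i < dates.length; i++) {
  date = dates[i];
  result = formatter.format(date);
  console.log(result);
}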
There is no huge advantage to using a custom formatter over just calling date.toLocaleString(), though the Internationalization spec does claim potential performance benefits (you would probably need quite a large list of dates to see those performance gains).
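To make the comparison concrete, here is a sketch that formats a hypothetical list of birth dates both ways. The results are the same; the difference is that, per the spec, each call to toLocaleDateString() with a locale creates a new formatter behind the scenes (engines may cache this), whereas the Intl.DateTimeFormat object is built once and reused:

```javascript
var options = {weekday: "long", year: "numeric", month: "long", day: "numeric"},
    //Hypothetical data: a list of birth dates (months are zero-based).
    birthDates = [new Date(1990, 0, 15), new Date(1985, 6, 4), new Date(2000, 11, 31)],
    //Build the formatter once...
    formatter = new Intl.DateTimeFormat("en", options);

//...and reuse it for every date in the list.
var viaFormatter = birthDates.map(function (d) {
  return formatter.format(d);
});

//Same output, but each call may set up a new formatter internally.
var viaMethod = birthDates.map(function (d) {
  return d.toLocaleDateString("en", options);
});

console.log(viaFormatter.join(" | "));
```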
Currencies
Formatting currencies is super easy with the Internationalization API. All you need to do is set two options: the style and the currency you want. For example:
{style: "currency", currency: "USD"}
The value of the currency option is a currency code from a standard called ISO 4217, which is published by the International Organization for Standardization (ISO). Thankfully, the full list of currency codes is freely available on Wikipedia. ISO 4217 is the authoritative set of currency codes used around the world to distinguish between currencies. Each currency code in ISO 4217 is three letters long (e.g., USD for United States Dollars, and GBP for Great British Pounds).
var bucks = 12,
props = {
style: "currency",
currency: "USD"
};
bucks.toLocaleString("en", props);
//returns "$12.00"
//Represented as Australian dollars
props.currency = "AUD";
bucks.toLocaleString("en", props);
//"AU$12.00"
//Represented as Great British Pounds
props.currency = "GBP";
bucks.toLocaleString("en", props);
//"£12.00"
Note that we are not doing currency conversion here, just using a standardized currency code to represent how many "bucks" we have (as represented by the currency sign). A real example of usage commonly seen in newspaper articles would require us to do some currency conversion, but imagine:
... Dr. Evil initially requested $1,000,000 (€794,122.00)...
More powerfully, you could adapt the above to a new language altogether based on your user's preferences. Here is the same sentence adapted to Arabic (language tag "ar"), by changing the first argument and setting props.currency to "EUR": (794122).toLocaleString("ar", props):
... Dr. Evil initially requested $1,000,000 (€ ٧٩٤٬١٢٢٫٠٠)...
Note the Euro currency sign remains, but the number formatting is localized.
More advanced options
Coming soon... need to cover: minimumIntegerDigits, minimumFractionDigits, maximumFractionDigits, minimumSignificantDigits, and maximumSignificantDigits.
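Until that section is written, here is a minimal sketch of those five options using Intl.NumberFormat; the same options can also be passed to Number.prototype.toLocaleString(). The sample numbers are arbitrary:

```javascript
//minimumIntegerDigits pads with leading zeros.
var padded = new Intl.NumberFormat("en", {minimumIntegerDigits: 3}).format(7);

//minimumFractionDigits/maximumFractionDigits control the decimal places.
var twoDecimals = new Intl.NumberFormat("en", {
  minimumFractionDigits: 2,
  maximumFractionDigits: 2
}).format(3.14159);

//Significant-digit options take precedence over the integer/fraction ones.
var threeSig = new Intl.NumberFormat("en", {maximumSignificantDigits: 3}).format(123456);
var minSig = new Intl.NumberFormat("en", {minimumSignificantDigits: 4}).format(5);

console.log(padded);      //"007"
console.log(twoDecimals); //"3.14"
console.log(threeSig);    //"123,000"
console.log(minSig);      //"5.000"
```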
Try it!
The code we use for the above currency formatter is super simple:
var form = document.querySelector("#currencyform");
//set up change listener (change events from the inputs bubble up to the form)
form.onchange = function () {
  //extract values from the form
  var value = Number(this.amount.value),
      currencyCode = this.currency.value,
      //"symbol" (the default), "code", or "name"
      display = "symbol",
      props = {
        style: "currency",
        currency: currencyCode,
        currencyDisplay: display
      };
  //display output
  this.out.value = value.toLocaleString("en", props);
};
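The currencyDisplay option used in the form accepts "symbol" (the default), "code", or "name". Here is a small sketch comparing the three for the same amount; the amount and the GBP currency are arbitrary choices, and the exact spacing and wording of the "code" and "name" forms may vary by browser:

```javascript
var amount = 12,
    results = {};

//Format the same amount with each currencyDisplay value.
["symbol", "code", "name"].forEach(function (display) {
  results[display] = amount.toLocaleString("en", {
    style: "currency",
    currency: "GBP",
    currencyDisplay: display
  });
});

console.log(results.symbol); //"£12.00"
console.log(results.code);   //uses the ISO 4217 code
console.log(results.name);   //spells out the currency name
```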
Localizing numbers
Coming soon... covers percentages, etc. Also covers Intl.NumberFormat().
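Until this section is filled in, here is a short sketch of Intl.NumberFormat covering percentages and plain decimal numbers. Note that the percent style multiplies the value by 100, so 0.25 becomes 25%. The sample numbers are arbitrary:

```javascript
//Percentages: the value 0.25 is displayed as 25%.
var percentText = new Intl.NumberFormat("en", {style: "percent"}).format(0.25);

//Plain numbers follow each locale's grouping and decimal separators.
var big = 1234567.891;
var enText = new Intl.NumberFormat("en").format(big);
var deText = new Intl.NumberFormat("de").format(big);

console.log(percentText); //"25%"
console.log(enText);      //"1,234,567.891"
console.log(deText);      //"1.234.567,891"
```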
Sorting (collation)
This section is under construction...
Imagine you are building a contacts manager application and you have the following list of names:
var friends = ["Mary", "Bob", "Tim" , "Adam", "Steve"];
Naturally, when you display the contacts to the user you want to make sure those names are shown in alphabetical order.
var names = friends.sort();
//gives us ["Adam", "Bob", "Mary", "Steve", "Tim"];
That's pretty straightforward, right? Well, now imagine your user has a bunch of Nordic friends:
names = ["Ølgård", "Åbjørn", "Oddbjørg", "Hellbjørg", "Aino"];
In order to sort them, you (or the computer) would need to know which letter comes before which in the given alphabet. The problem is, which alphabet? Are those names really Norwegian, or are they Danish? What happens if we have a Chinese friend (e.g., 强国) in the list as well?
This gets even crazier. In German, for instance, there is a difference between "phone book" ordering and "dictionary" ordering... Yeah. So you can imagine that this is quite a challenging problem.
It's in situations like these that the Internationalization API can come to the rescue.
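As a sketch of what that rescue looks like, an Intl.Collator's compare function can be passed directly to Array.prototype.sort(). This assumes an implementation with Danish, Swedish, and German collation data; the German names are made up for illustration, and "co-phonebk" is the Unicode extension that requests phone book ordering:

```javascript
var nordic = ["Ølgård", "Åbjørn", "Oddbjørg", "Hellbjørg", "Aino"];

//slice() copies the array so each sort starts from the same input.
//Danish alphabet ends ...Æ, Ø, Å; Swedish ends ...Å, Ä, Ö (Ø sorts like Ö).
var danish = nordic.slice().sort(new Intl.Collator("da").compare);
var swedish = nordic.slice().sort(new Intl.Collator("sv").compare);

console.log(danish);  //["Aino", "Hellbjørg", "Oddbjørg", "Ølgård", "Åbjørn"]
console.log(swedish); //["Aino", "Hellbjørg", "Oddbjørg", "Åbjørn", "Ølgård"]

//German: dictionary order treats ü as an accented u, so "Müller" sorts
//right after "Muller"; phone book order treats ü as "ue", so it sorts
//next to "Mueller" and "Muller" ends up last.
var germans = ["Müller", "Muller", "Mueller"];
var dictionary = germans.slice().sort(new Intl.Collator("de").compare);
var phonebook = germans.slice().sort(new Intl.Collator("de-u-co-phonebk").compare);

console.log(dictionary);
console.log(phonebook);
```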
Hey, Browser! You speak my language?
There are literally thousands of languages spoken throughout the world, and the conventions associated with them differ depending on where one is (e.g., US English conventions are different from, say, Australian English conventions; the same goes for Spanish in Spain compared to Spanish in Argentina). Some countries even have multiple official languages (e.g., French and English in Canada). And even though browsers know many of these different language/locale combinations (i.e., through interpreting language tags), they can't possibly know every combination.
Thankfully, the Internationalization API provides a way to check if the browser knows how to deal with given locale(s). This check is done using the supportedLocalesOf() method that each of the Intl constructors (such as Intl.DateTimeFormat) provides:
//Given German ("de"), and two languages that don't exist:
Intl.DateTimeFormat.supportedLocalesOf(["de", "oo", "xx"]);
//the browser returns ["de"]
Another cool feature of the API is that it will do its best to support a locale as closely as it can. So, given "En-GB-fff" (where fff is just something I made up), the browser will just strip away the fff and return "en-GB".
//fff is garbage below, but could be an actual language sub-tag
v8Intl.DateTimeFormat.supportedLocalesOf(["En-gb-fff"]);
//returns ["en-GB"]
Note that another helpful thing the browser does when we call supportedLocalesOf is normalize our language tags into "canonical form" (which includes capitalizing GB). Calling this method will also fix errors in language tags and remove redundant tags too:
var tags = ["pt-*", "En-", "en-*-us", "x-foo" ];
Intl.DateTimeFormat.supportedLocalesOf(tags);
//returns ["pt", "en", "und-x-foo"]
Note above that "En-" and "en-*-us" become "en", and that "x-foo" becomes "und-x-foo" because it is a private-use language tag (a tag that starts with "x-").
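Putting supportedLocalesOf() to work, a minimal sketch of picking the best available locale from a user's ordered list of preferences might look like this. The function name pickLocale and the "en" fallback are hypothetical. Also note that current implementations are stricter than the one described above: they throw a RangeError for malformed tags such as "pt-*", so this sketch sticks to well-formed tags:

```javascript
//Hypothetical helper: returns the first of the user's preferred locales
//that the implementation supports, or the fallback if none are supported.
function pickLocale(preferred, fallback) {
  var supported = Intl.DateTimeFormat.supportedLocalesOf(preferred);
  return supported.length ? supported[0] : fallback;
}

//"xx" is well-formed but not a real language, so it is skipped.
console.log(pickLocale(["xx", "pt-PT", "en"], "en")); //"pt-PT"
console.log(pickLocale(["xx"], "en"));                //"en"
```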
Fancy language-tags
To be written... section will cover Unicode extensions and other interesting features.
Acknowledgements
A huge thanks to Norbert Lindenberg for his guidance, patience with my dumb questions, and for being awesome. This article was inspired by his original article The ECMAScript Internationalization API. If you want to take your knowledge of the Internationalization API to the next level, check out his article. It goes into much more depth than this one.
Also a huge thanks to Jordan Gray for proposing that this guide be written up.
Using the monkey patch
If you want to play with the monkey-patch in your own code, just put the following script into your HTML. Remember that the code is experimental:
<script src="https://raw.github.com/marcoscaceres/jsi18n/master/jsi18n_patch.js">
</script>
You can then confirm if it worked by doing "inspect element" and typing into your JavaScript console:
Intl
//returns an Object
You can also get the code from GitHub, fork it, and use it in your own projects however you wish! If you find a bug, let us know!