JavaScript Internationalization API

A simple introduction.

Internationalization and localization

The ECMAScript Internationalization API is a forthcoming standard that helps automagically localise the output of dates, numbers, and currencies in JavaScript. And, through the magic of collator objects, the Internationalization API can also help sort a list of strings (e.g., names in an address book) in locale-specific order, as well as search for strings in a given list in a way that matches a user's locale preferences.

What's this, then?

This short introduction gives you a quick overview of some of the awesome things you will be able to do once the Internationalization API becomes more widely supported in browsers and in other ECMAScript-based programming environments - like Node.js.

Let's be clear before we start: the Internationalization API is not a full internationalization framework, so it won't help you localize your whole application to another language. What it can do is be really useful for common localization tasks involving dates, numbers, currencies, and sorting.

Following along

You can follow along with Chrome. Currently, to get the most functionality out of the Internationalization API we use a custom monkey-patch in Chrome - that is, we use a simple script that overrides some standard JavaScript behavior without changing any existing browser functionality. The monkey-patch is already loaded as part of this document, so if you do "inspect element" and bring up Chrome's JavaScript debug console, you are ready to follow along!

Can I use?

Before we can actually use the Internationalization API, we need to check if the browser supports it (i.e., we need to do some feature detection).

The official Internationalization API specification defines the Intl object that should be a property of the JavaScript global object - in a browser, the global object is the window object.

To check if we can use the Internationalization API, we are just going to check if Intl is available on the window object:

if (window.Intl && typeof window.Intl === "object"){
   //Assume it's supported, let's localize!
   console.log("We are all good to go!");
}

And in Node.js, you would do something like the following:

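(function (globalObject) {
  if ("Intl" in globalObject) {
    console.log("Sweet! Let's localize!");
  }
}(typeof globalThis !== "undefined" ? globalThis : this));
//globalThis is the global object in modern engines; the fallback
//assumes that this is the global scope
//PS: the above will also work fine in a browser!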

Once we know that the Internationalization API is supported, we are good to start localizing stuff! In case you are wondering, the Intl object contains a bunch of other useful objects that we will make use of later.
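If you want to be a little more defensive than the checks above, you can also confirm that the constructors used later in this guide are present. The following is only a sketch: hasIntl is a made-up helper name (not part of the API), and using globalThis assumes a reasonably modern engine.

```javascript
//hasIntl is a hypothetical helper name, not part of the API
function hasIntl(globalObject) {
    const intl = globalObject && globalObject.Intl;
    return typeof intl === "object" && intl !== null &&
        typeof intl.DateTimeFormat === "function" &&
        typeof intl.NumberFormat === "function" &&
        typeof intl.Collator === "function";
}

//globalThis is assumed to exist (modern browsers and Node.js)
console.log(hasIntl(globalThis));
```

Checking for the individual constructors matters mostly in environments that expose only part of the API.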

Note: Because this API is still not final, actually finding out if the browser supports the full API can be tricky. For example, to date, there is only one implementation of the API (Chrome)... and the Chrome team have put the Internationalization API behind a unique vendor prefix (v8). Not only that, only parts of the API are exposed to developers, which is why we use a custom monkey-patch to make the whole thing work as if it was actually natively implemented in the browser.

Localizing dates and times

You've probably done this hundreds of times:

var date = new Date();

And possibly displayed it in the browser's default locale by calling:

date.toLocaleDateString();
//returns e.g. "Friday, August 24, 2012" 

The problem

The above might be all well and good if the user is sitting at home where her computer's locale settings (i.e., language and geographical region) are likely set correctly. But what happens if the user is on a PC whose locale settings are set to Japanese, but her preferred locale settings are Portuguese as used in Portugal?

Let's say our user is backpacking around the world and has logged into your Web application from an Internet cafe in Japan. How can you make sure she sees numbers, dates, and currencies formatted in a way she is most accustomed to?

The solution

Firstly, what we need is to convert what we know about the user's locale to a language tag. A language tag is a simple string that represents the user's preferred language and, optionally, where they are (or wish they were for the purposes of communication).

Language Tags and locales

You have undoubtedly encountered language tags before. For example "en-US", which roughly translates to "English as used in the United States"; as opposed to, say, "en-AU", which would be "English as used in Australia". And as you can see, most language tags simply identify some language as used in some country or region.

Language tags are used to help identify languages, whether spoken, written, signed, or otherwise signaled, for the purpose of communication. - BCP47.

But a language tag, or more generally, the concept of a locale, doesn't just relate to languages: it also indicates the conventions for how dates, times, and currencies are formatted and how lists are sorted - and can even have an impact on how things are spoken by text-to-speech software. For example, here is how "Alex" on MacOS X reads dates when the system settings are set to English and region is United States:

Neil Armstrong landed on the moon on 07/20/1969. But not on the 20/07/1969?

As the audio example shows, strings representing dates are treated differently based on the user's locale settings. In the United States, the date is written in short form as 07/20/2012, while in other English-speaking countries, such as Australia and the UK, it is written as 20/07/2012 (i.e., days and months are swapped).

Many users have been confused, and many databases have broken, thanks to naive handling of dates. The Internationalization API should help make that a thing of the past.
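To see the difference in code, here is a small sketch that formats the same date for two English locales. The variable names are only for illustration, and the timeZone option is set to "UTC" purely so the result does not depend on the machine running the code.

```javascript
//20 July 1969, pinned to UTC so the output is the same everywhere
const moonLanding = new Date(Date.UTC(1969, 6, 20));
const utc = {timeZone: "UTC"};

const usDate = moonLanding.toLocaleDateString("en-US", utc);
const ukDate = moonLanding.toLocaleDateString("en-GB", utc);

console.log(usDate); //month first, e.g. "7/20/1969"
console.log(ukDate); //day first, e.g. "20/07/1969"
```

Same language, same date object, different output: the region part of the language tag is what decides the order.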

Getting the user's locale

Unfortunately, there is no 100% reliable way in the browser of getting the user's locale information - unless you ask the user explicitly.

There are some unreliable hacks, like querying navigator.language in Chrome and Firefox, or navigator.browserLanguage in IE, or looking at the HTTP Accept-Language header using XHR. But these techniques would not help our previously described backpacking-user: because she is using a computer at an internet cafe, she has no control over the language preferences of the machine.

So, just ask if you need to. Or provide a way for the user to select their locale preferences.
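One way to combine "just ask" with the unreliable hints is to let an explicit user choice win and only fall back to the browser when there is none. The sketch below assumes a made-up helper called pickLocale; its parameter names are hypothetical too, and navigator.languages is a newer property that is not discussed above.

```javascript
//pickLocale and its parameters are hypothetical names for this sketch
function pickLocale(userChoice, nav, fallback) {
    //1. an explicit choice by the user always wins
    if (userChoice) {
        return userChoice;
    }
    //2. otherwise fall back to the browser's (unreliable) hints
    if (nav && nav.languages && nav.languages.length) {
        return nav.languages[0];
    }
    if (nav && (nav.language || nav.browserLanguage)) {
        return nav.language || nav.browserLanguage;
    }
    //3. last resort
    return fallback || "en";
}

//our backpacker chose Portuguese, even though the cafe PC says Japanese
console.log(pickLocale("pt-PT", {language: "ja"})); //"pt-PT"
console.log(pickLocale(null, {language: "ja"}));    //"ja"
```

In a browser you would pass window.navigator as the second argument and store the user's choice somewhere it survives between visits.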

Starting localization process

If we know the user's preferred language (in this case "pt-PT"), the Internationalization API allows us to provide that information as an argument to the .toLocaleDateString() function of a date:

var date = new Date();
date.toLocaleDateString("pt-PT");
//returns something like "24/8/2012"

OK, so the above output is not super useful - it's actually less pretty than what we originally had! But it's a start on our way to localizing the date more fully!

Note that what the above does give us is a date formatted in a way that is more or less guaranteed to be understood in the given locale, even though it is not as pretty as having the date in "long" form (i.e., as in "Friday, 24 of August").

Just so we can see the API do something more impressive, let's say our user is from China (using the language tag "zh-Hans", which is Simplified Chinese):

date.toLocaleDateString("zh-Hans");
//returns something like "2012年8月24日"

So now we are getting somewhere! We've managed to convert our date object to Simplified Chinese with hardly any work (which we like!).

Controlling the formatting

As we pointed out above, the date format "24/8/2012" will likely be comprehended, but it's not the most user-friendly way to display dates. In particular, it's nice to see what day of the week it is (Monday, Tuesday, etc.) and what month we are talking about.

To do this the Internationalization API allows us to pass a set of options as an argument to .toLocaleDateString(). For example:

var options = {weekday: "long"};

date.toLocaleDateString("pt", options);
//returns "sexta-feira" (which is Friday in Portuguese)

date.toLocaleDateString("ja", options);
//"金曜日" which is Friday in Japanese

Where it gets tricky is knowing how other cultures like to have their "pretty dates" formatted. This is where the Internationalization API really starts to show its usefulness - it has this knowledge built in, and gives programmers control of how to format a date through a set of options. These options are then passed to the .toLocaleDateString() or to the .toLocaleString() as an argument.

There is something important to note from the above. In Portugal, for example, people only use "sexta-feira" for "Friday" in formal contexts; in informal contexts, they just say "sexta" and drop the "-feira".

The takeaway here is that you should not just assume that what the API returns is always correct for the locale or audience you are targeting. Be sure to always consult someone who lives in the locale you are targeting about date, time, currency, and collation conventions. Don't just blindly rely on what the browser returns.

Formatting example

For example, let's say we want to output "day of week, day Month, Year". We would set the following options:

var options = {
    weekday: "long",
    year: "numeric",
    month: "long",
    day: "numeric"
};
date.toLocaleDateString("pt", options);
//returns "sexta-feira, 24 de agosto de 2012"

date.toLocaleDateString("ja", options);
//"2012年8月24日(金曜日)"

//And even Arabic
date.toLocaleDateString("ar", options);
//"الجمعة، ٢٤ أغسطس، ٢٠١٢"

//And for our final trick: Thai Buddhist calendar and Thai digits
date.toLocaleDateString("th-u-ca-buddhist-nu-thai", options);
//returns "วันศุกร์ ๒๔ สิงหาคม ๒๕๕๕"

Pretty funky! The last example uses "Unicode extensions" to language tags, which we cover in the advanced language-tag construction section of this guide. But to give you a quick sense of how it works, this is how the "th-u-ca-buddhist-nu-thai" language tag breaks down:

  • th: Thai (the language)
  • u: Enable Unicode extensions
  • ca: Calendar
  • buddhist: Buddhist calendar
  • nu: Numeric format
  • thai: Thai digits
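If you are curious whether the browser actually picked up those extension keys, one option is to ask a formatter what it resolved. This is just a sketch (the variable names are made up), and the values shown assume an engine that ships Thai locale data.

```javascript
const thaiFormatter = new Intl.DateTimeFormat("th-u-ca-buddhist-nu-thai");
const resolved = thaiFormatter.resolvedOptions();

console.log(resolved.calendar);        //"buddhist", if supported
console.log(resolved.numberingSystem); //"thai", if supported
```

If the engine does not understand one of the keys, resolvedOptions() reports whatever it fell back to instead.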

The above code examples are starting to show the usefulness of the Internationalization API, but there is a problem: how do we know what locales the browser understands? That is a topic we will come back to later in this article.

Formatting options

The options that can be passed to date.toLocaleString(), date.toLocaleDateString(), and date.toLocaleTimeString() as an object literal are given in the table below.

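Date formatter
Option: values (and sample output*)
  • weekday: "narrow" (M), "short" (Mon), "long" (Monday)
  • era: "narrow" (A), "short" (AD), "long" (Anno Domini)
  • year: "2-digit" (15), "numeric" (2015)
  • month: "2-digit" (03), "numeric" (3), "narrow" (M), "short" (Mar), "long" (March)
  • day: "2-digit" (09), "numeric" (9)
  • hour: "2-digit" (12 AM), "numeric" (12 AM)
  • minute: "2-digit" (0), "numeric" (0)
  • second: "2-digit" (0), "numeric" (0)
  • timeZoneName: "short" (3/9/2015 GMT+00:00), "long" (3/9/2015 GMT+00:00)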

* Output may vary from one locale to another and from one browser to another! The output shown here is from Chrome 21. Output was constructed by calling (new Date("3/9/2015")).toLocaleString("en", options);
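To get a feel for how the options combine, here is a small sketch that mixes a few of them for the same date used in the table. The variable names are only for illustration, and timeZone: "UTC" is added just to keep the result independent of the machine's time zone.

```javascript
//9 March 2015, pinned to UTC
const sample = new Date(Date.UTC(2015, 2, 9));
const mixed = {
    weekday: "short",
    year: "numeric",
    month: "short",
    day: "numeric",
    timeZone: "UTC"
};

const mixedOutput = sample.toLocaleString("en", mixed);
console.log(mixedOutput); //something like "Mon, Mar 9, 2015"
```

Notice that you only say which fields you want and how long they should be; the order and punctuation come from the locale.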

Custom date-time formatters

Consider a scenario where you have a potentially large and dynamic set of date objects that you need to localize to the same format over and over again (e.g., a list of birth dates). Up to this point in the article, we've been doing date conversion in the following way:

var date = new Date(),
    options = {
        weekday: "long",
        year: "numeric",
        month: "long",
        day: "numeric"
    };
date.toLocaleDateString("en", options);

If we know that we want to format all dates the same way for a particular purpose, then the Internationalization API provides a special kind of object, Intl.DateTimeFormat, that you can create to help you out.

var lang = ["en"], //using an array because of quirk in Chrome
    dates = [new Date("1/1/1"),
             new Date("2/2/2"),
             new Date("3/3/3")],
    options = {
        weekday: "long",
        year: "numeric",
        month: "long",
        day: "numeric"
    },
    //create the formatter only after lang and options are defined
    formatter = new Intl.DateTimeFormat(lang, options),
    date, result;

//loop through dates formatting each one
for (var i = 0; i < dates.length; i++) {
    date = dates[i];
    result = formatter.format(date);
    console.log(result);
}

There is no huge advantage to using a custom formatter over just calling date.toLocaleString(), though the Internationalization spec does claim potential performance benefits (you would probably need quite a large list of dates to see those performance gains).
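Where a reusable formatter does read nicely is when you want to turn a whole list into strings in one go. The sketch below assumes some made-up birth dates and variable names, and pins the time zone to UTC so the output is predictable.

```javascript
//create the formatter once...
const birthdayFormatter = new Intl.DateTimeFormat("en", {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC"
});

//hypothetical birth dates
const birthdays = [
    new Date(Date.UTC(1980, 0, 15)),
    new Date(Date.UTC(1992, 6, 4)),
    new Date(Date.UTC(2001, 11, 31))
];

//...and reuse it for every date in the list
const labels = birthdays.map(function (birthday) {
    return birthdayFormatter.format(birthday);
});

console.log(labels); //e.g. ["January 15, 1980", ...]
```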

Currencies

Formatting currencies is super easy with the Internationalization API. All you need to do is set two options - the style and the currency you want. For example:

{style: "currency", currency: "USD"}

The value of the currency option is a currency code from a spec called ISO 4217, which is published by the International Organization for Standardization (ISO). Thankfully, the full list of currency codes is freely available on Wikipedia. ISO 4217 is the authoritative set of currency codes used around the world to distinguish between currencies. Each currency code defined by ISO 4217 is three characters long (e.g., USD for United States dollars, and GBP for British pounds).

var bucks = 12,
    props = {
        style: "currency",
        currency: "USD"
    };

bucks.toLocaleString("en", props);
//returns "$12.00"

//Represented as Australian dollars
props.currency = "AUD";
bucks.toLocaleString("en", props);
//"AU$12.00"

//Represented as British pounds
props.currency = "GBP";
bucks.toLocaleString("en", props);
//"£12.00"

Note that we are not doing currency conversion here, just using a standardised currency code to represent how many "bucks" we have (as represented by the currency sign). A real example of usage commonly seen in newspaper articles would require us to do some currency conversion, but imagine:

... Dr. Evil initially requested $1,000,000 (€794,122.00)...

More powerfully, you could adapt the above to a new language altogether based on your user's preferences. Here is the same example adapted to Arabic (language tag "ar") simply by changing the first argument, as in (794122).toLocaleString("ar", props):

... Dr. Evil initially requested $1,000,000 (€ ٧٩٤٬١٢٢٫٠٠)...

Note the Euro currency sign remains, but the number formatting is localized.
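The same idea can be sketched with Intl.NumberFormat, the number counterpart of Intl.DateTimeFormat. The variable names below are made up, and the exact spacing in the German output may differ between engines.

```javascript
const euros = {style: "currency", currency: "EUR"};
const amount = 794122;

//same amount, same currency, two different locales
const inEnglish = new Intl.NumberFormat("en", euros).format(amount);
const inGerman = new Intl.NumberFormat("de", euros).format(amount);

console.log(inEnglish); //"€794,122.00"
console.log(inGerman);  //e.g. "794.122,00 €"
```

The currency stays the same; only the separators and the position of the sign follow the locale.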

More advanced options

Coming soon... need to cover: minimumIntegerDigits, minimumFractionDigits, maximumFractionDigits, minimumSignificantDigits, and maximumSignificantDigits.
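As a rough preview of what those options do, here is a small sketch using the "en" locale. The variable names are only for illustration.

```javascript
//pad the integer part with leading zeros
const padded = (5).toLocaleString("en", {minimumIntegerDigits: 3});

//always show at least two decimals
const atLeastTwo = (1234.5).toLocaleString("en", {minimumFractionDigits: 2});

//never show more than two decimals
const atMostTwo = (0.123456).toLocaleString("en", {maximumFractionDigits: 2});

//significant digits count from the first non-zero digit
const minSignificant = (1.5).toLocaleString("en", {minimumSignificantDigits: 3});
const maxSignificant = (123456).toLocaleString("en", {maximumSignificantDigits: 2});

console.log(padded);         //"005"
console.log(atLeastTwo);     //"1,234.50"
console.log(atMostTwo);      //"0.12"
console.log(minSignificant); //"1.50"
console.log(maxSignificant); //"120,000"
```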

Try it!

Currency formatter
...

The code we use for the above currency formatter is super simple:

var form = document.querySelector("#currencyform");

//set up change listener
form.onchange = function () {
    //extract values from the form
    var value = Number(this.amount.value),
        currencyCode = this.currency.value,
        //assumes the form also has a control named "display"
        display = this.display.value,
        props = {
            style: "currency",
            currency: currencyCode,
            currencyDisplay: display
        };
    //display output
    this.out.value = value.toLocaleString("en", props);
};
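The currencyDisplay option used above controls how the currency itself is shown. Here is a quick sketch of its three classic values ("symbol", "code", and "name") outside of any form; the helper name showAs is made up for this example.

```javascript
//showAs is a hypothetical helper name
function showAs(display) {
    return (12).toLocaleString("en", {
        style: "currency",
        currency: "USD",
        currencyDisplay: display
    });
}

const asSymbol = showAs("symbol");
const asCode = showAs("code");
const asName = showAs("name");

console.log(asSymbol); //"$12.00"
console.log(asCode);   //something like "USD 12.00"
console.log(asName);   //something like "12.00 US dollars"
```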

Localizing numbers

Coming soon... covers percentages, etc. Also covers Intl.NumberFormat().
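As a small taste, here is a sketch of Intl.NumberFormat formatting a percentage and a plain number. The variable names are made up for this example.

```javascript
//percent style multiplies by 100 and adds the percent sign
const percent = new Intl.NumberFormat("en", {style: "percent"}).format(0.25);

//plain numbers pick up the locale's grouping and decimal separators
const bigNumber = 1234567.891;
const english = new Intl.NumberFormat("en").format(bigNumber);
const german = new Intl.NumberFormat("de").format(bigNumber);

console.log(percent); //"25%"
console.log(english); //"1,234,567.891"
console.log(german);  //"1.234.567,891"
```

Note how English and German swap the roles of the comma and the full stop.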

Sorting (collation)

This section is under construction...

Imagine you are building a contacts manager application and you have the following list of names:

var names = ["Mary", "Bob", "Tim", "Adam", "Steve"];

Naturally, when you display the contacts to the user you want to make sure those names are shown in alphabetical order.

names.sort();
//gives us ["Adam", "Bob", "Mary", "Steve", "Tim"]

That's pretty straightforward, right? Well, now imagine your user has a bunch of Nordic friends.

names = ["Ølgård", "Åbjørn", "Oddbjørg", "Hellbjørg", "Aino"];

In order to sort them, you (or the computer) would need to know which letter comes before which letter in the given alphabet. The problem is, which alphabet? Are those names really Norwegian? Or are they Danish? What happens if we have a Chinese friend (e.g., 强国) who is also in the list?

This gets even more crazy. In German, for instance, there is a difference between "phone book" ordering and "dictionary ordering"... Yeah. So you can imagine that this is quite a challenging problem.

It's in situations like these that the Internationalization API can come to the rescue.
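Since this section is still being written, here is only a minimal sketch of the idea, using Intl.Collator. It assumes we treat the names as Norwegian ("nb") and compares that with English ("en") ordering; the variable names are made up.

```javascript
const nordicNames = ["Ølgård", "Åbjørn", "Oddbjørg", "Hellbjørg", "Aino"];

//a collator knows the sorting rules of a locale
const norwegian = new Intl.Collator("nb");
const english = new Intl.Collator("en");

//copy the list before sorting so the original order is kept
const sortedNorwegian = nordicNames.slice().sort(norwegian.compare);
const sortedEnglish = nordicNames.slice().sort(english.compare);

//in Norwegian, Ø and Å are separate letters that come after Z
console.log(sortedNorwegian);
//in English, Å is treated as a kind of A, so "Åbjørn" moves to the front
console.log(sortedEnglish);
```

The same list ends up in a different order depending on whose alphabet you use, which is exactly why the locale has to be stated.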

Hey, Browser! You speak my language?

There are literally thousands of languages spoken throughout the world, and those languages (and their associated conventions) differ depending on where one is (e.g., US English conventions are different from, say, Australian English conventions; the same goes for Spanish in Spain compared to Spanish in Argentina). Some countries even have multiple official languages (e.g., French and English in Canada). And even though browsers know many of these different language/locale combinations (i.e., through interpreting language tags), they can't possibly know every combination.

Thankfully, the Internationalization API provides a way to check if the browser knows how to deal with given locale(s). This check is done by using the supportedLocalesOf methods:

//Given German ("de"), and two languages that don't exist:
Intl.DateTimeFormat.supportedLocalesOf(["de", "oo", "xx"]);
//the browser returns ["de"]

Another cool feature of the API is that it will do its best to support a locale as closely as it can. So, given, "En-GB-fff" (where fff is just something I made up), the browser will just strip away the fff and return "en-GB".

//fff is garbage below, but could be an actual language sub-tag
Intl.DateTimeFormat.supportedLocalesOf(["En-gb-fff"]);
//returns ["en-GB"]

Note that another helpful thing that the browser does when we call supportedLocalesOf is that it normalizes our language tags into "canonical form" (which includes making the GB capitalised). Calling this method will also fix errors in language tags and remove redundant tags too:

var tags = ["pt-*", "En-", "en-*-us", "x-foo" ];
Intl.DateTimeFormat.supportedLocalesOf(tags);
//returns ["pt", "en", "und-x-foo"]

Note above that "En-" and "en-*-us" become "en"… and note the "und-x-foo" for a private-use language (language tags that start with an "x-").
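One caveat, which is an observation about current engines rather than something shown in the examples above: implementations have become stricter, and at least some of them throw a RangeError for malformed tags instead of repairing them. If your tags come from user input, a defensive wrapper along these lines may help. The helper name supportedTags is made up.

```javascript
//supportedTags is a hypothetical helper name
function supportedTags(tags) {
    const result = [];
    tags.forEach(function (tag) {
        try {
            //check one tag at a time so a bad tag can't spoil the rest
            Intl.DateTimeFormat.supportedLocalesOf([tag])
                .forEach(function (canonical) {
                    if (result.indexOf(canonical) === -1) {
                        result.push(canonical);
                    }
                });
        } catch (err) {
            //strict engines throw a RangeError for malformed tags;
            //we simply skip those
        }
    });
    return result;
}

const usable = supportedTags(["de", "En-gb", "pt-*"]);
console.log(usable); //includes "de" and "en-GB"
```

Checking the tags one by one costs a few extra calls, but it means one malformed tag does not discard the whole list.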

Fancy language-tags

To be written... section will cover Unicode extensions and other interesting features.

Acknowledgements

A huge thanks to Norbert Lindenberg for his guidance, patience with my dumb questions, and for being awesome. This article was inspired by his original article The ECMAScript Internationalization API. If you want to take your knowledge of the Internationalization API to the next level, check out his article. It goes into much more depth than this one.

Also a huge thanks to Jordan Gray for proposing that this guide be written up.

Using the monkey patch

If you want to play with the monkey-patch in your own code, just put the following script into your HTML. Remember that the code is experimental:

<script src="https://raw.github.com/marcoscaceres/jsi18n/master/jsi18n_patch.js">
</script>

You can then confirm if it worked by doing "inspect element" and typing into your JavaScript console:

Intl
//returns an Object

You can also get the code from GitHub, fork it, and use it in your own projects however you wish! If you find a bug, let us know!