Setting the Locale
An internationalized program can display information differently throughout the world. For example, the program will display different messages in Paris, Tokyo, and New York. If the localization process has been fine-tuned, the program will display different messages in New York and London to account for the differences between American and British English. How does an internationalized program identify the appropriate language and region of its end users? Easy. It references a Locale object.
A Locale object is an identifier for a particular combination of language and region. If a class varies its behavior according to Locale, it is said to be locale-sensitive. For example, the NumberFormat class is locale-sensitive; the format of the number it returns depends on the Locale. Thus NumberFormat may return a number as 902 300 (France), or 902.300 (Germany), or 902,300 (United States). Locale objects are only identifiers. The real work, such as formatting and detecting word boundaries, is performed by the methods of the locale-sensitive classes.
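As a minimal sketch of this behavior, the following program formats the same number with three NumberFormat instances. The class name LocaleSensitiveDemo and its format helper are illustrative only, and the exact grouping separator printed for France depends on the locale data shipped with your JDK.

```java
import java.text.NumberFormat;
import java.util.Locale;

class LocaleSensitiveDemo {
    // Formats a number using the conventions of the given locale.
    static String format(long value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        long value = 902300L;
        // The Locale only identifies the conventions; NumberFormat does the work.
        for (Locale locale : new Locale[] {Locale.FRANCE, Locale.GERMANY, Locale.US}) {
            System.out.println(locale.toLanguageTag() + ": " + format(value, locale));
        }
    }
}
```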
Creating a Locale
There are several ways to create a Locale object. Regardless of the technique used, creation can be as simple as specifying the language code. However, you can further distinguish the locale by setting the region (also referred to as "country") and variant codes. If you are using the JDK 7 release or later, you can also specify the script code and Unicode locale extensions.
Version Note: The Locale.Builder class and the forLanguageTag method were added in the Java SE 7 release.
Locale.Builder Class
The Locale.Builder utility class can be used to construct a Locale object that conforms to the IETF BCP 47 syntax. For example, to specify the French language and the country of Canada, you could invoke the Locale.Builder constructor and then chain the setter methods as follows:
Locale aLocale = new Locale.Builder().setLanguage("fr").setRegion("CA").build();
The next example creates Locale objects for the English language in the United States and Great Britain:
Locale bLocale = new Locale.Builder().setLanguage("en").setRegion("US").build();
Locale cLocale = new Locale.Builder().setLanguage("en").setRegion("GB").build();
The final example creates a Locale object for the Russian language written in the Cyrillic script:
Locale dLocale = new Locale.Builder().setLanguage("ru").setScript("Cyrl").build();
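Because the builder enforces BCP 47 syntax, its setters reject ill-formed values by throwing IllformedLocaleException. The sketch below illustrates this; the class BuilderValidationDemo and the helper tagOrNull are hypothetical names used only for this example.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

class BuilderValidationDemo {
    // Returns the BCP 47 tag of the built locale, or null if the builder rejects the input.
    static String tagOrNull(String language, String region) {
        try {
            return new Locale.Builder()
                    .setLanguage(language)
                    .setRegion(region)
                    .build()
                    .toLanguageTag();
        } catch (IllformedLocaleException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(tagOrNull("fr", "CA"));     // fr-CA
        System.out.println(tagOrNull("fr", "Canada")); // null: a region must be 2 letters or 3 digits
    }
}
```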
Locale Constructors
There are three constructors available in the Locale class for creating a Locale object:
- Locale(String language)
- Locale(String language, String country)
- Locale(String language, String country, String variant)
The following examples create Locale objects for the French language in Canada, the English language in the U.S. and Great Britain, and the Russian language.
aLocale = new Locale("fr", "CA");
bLocale = new Locale("en", "US");
cLocale = new Locale("en", "GB");
dLocale = new Locale("ru");
It is not possible to set a script code on a Locale object in a release earlier than JDK 7.
forLanguageTag Factory Method
If you have a language tag string that conforms to the IETF BCP 47 standard, you can use the forLanguageTag(String) factory method, which was introduced in the Java SE 7 release. For example:
Locale aLocale = Locale.forLanguageTag("en-US");
Locale bLocale = Locale.forLanguageTag("ja-JP-u-ca-japanese");
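To see what forLanguageTag extracts from a tag, you can query the resulting Locale for its parts. The following sketch is illustrative; the class LanguageTagPartsDemo and the describe helper are made-up names, and the "ca" key is the Unicode locale extension key for the calendar type.

```java
import java.util.Locale;

class LanguageTagPartsDemo {
    // Summarizes a tag as "language|region|calendar".
    // The calendar part is the string "null" when the tag has no "ca" extension.
    static String describe(String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        return locale.getLanguage() + "|" + locale.getCountry() + "|"
                + locale.getUnicodeLocaleType("ca");
    }

    public static void main(String[] args) {
        System.out.println(describe("en-US"));               // en|US|null
        System.out.println(describe("ja-JP-u-ca-japanese")); // ja|JP|japanese
        // toLanguageTag converts a Locale back to its BCP 47 form.
        System.out.println(Locale.forLanguageTag("ja-JP-u-ca-japanese").toLanguageTag());
    }
}
```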
Locale Constants
For your convenience the Locale class provides constants for some languages and countries. For example:
cLocale = Locale.JAPAN;
dLocale = Locale.CANADA_FRENCH;
When you specify a language constant, the region portion of the Locale is undefined. The next three statements create equivalent Locale objects:
j1Locale = Locale.JAPANESE;
j2Locale = new Locale.Builder().setLanguage("ja").build();
j3Locale = new Locale("ja");
The Locale objects created by the following three statements are also equivalent:
j4Locale = Locale.JAPAN;
j5Locale = new Locale.Builder().setLanguage("ja").setRegion("JP").build();
j6Locale = new Locale("ja", "JP");
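The equivalence described above can be checked with equals. This is a small sketch with hypothetical names (LocaleConstantsDemo, sameLocale); it uses forLanguageTag rather than the constructors, but the comparison works the same way for any of the creation techniques.

```java
import java.util.Locale;

class LocaleConstantsDemo {
    // Compares a Locale constant with the locale created from a BCP 47 tag.
    static boolean sameLocale(Locale constant, String languageTag) {
        return constant.equals(Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        System.out.println(sameLocale(Locale.JAPAN, "ja-JP"));        // true
        System.out.println(sameLocale(Locale.JAPANESE, "ja"));        // true
        System.out.println(sameLocale(Locale.JAPANESE, "ja-JP"));     // false: language-only constant
        System.out.println(Locale.JAPANESE.getCountry().isEmpty());   // true: no region is set
        System.out.println(Locale.CANADA_FRENCH.toLanguageTag());     // fr-CA
    }
}
```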
Codes
The following sections discuss the language code and the optional script, region, and variant codes.
Language Code
The language code is either two or three lowercase letters that conform to the ISO 639 standard. You can find a full list of the ISO 639 codes at http://www.loc.gov/standards/iso639-2/php/code_list.php.
The following table lists a few of the language codes.
Language Code | Description |
---|---|
de | German |
en | English |
fr | French |
ru | Russian |
ja | Japanese |
jv | Javanese |
ko | Korean |
zh | Chinese |
Script Code
The script code begins with an uppercase letter followed by three lowercase letters and conforms to the ISO 15924 standard. You can find a full list of the ISO 15924 codes at http://unicode.org/iso15924/iso15924-codes.html.
The following table lists a few of the script codes.
Script Code | Description |
---|---|
Arab | Arabic |
Cyrl | Cyrillic |
Kana | Katakana |
Latn | Latin |
There are three methods for retrieving the script information for a Locale:
- getScript() – returns the 4-letter script code for a Locale object. If no script is defined for the locale, an empty string is returned.
- getDisplayScript() – returns a name for the locale's script that is appropriate for display to the user. If possible, the name will be localized for the default locale. So, for example, if the script code is "Latn," the display script name returned would be the string "Latin" for an English language locale.
- getDisplayScript(Locale) – returns the display name of this locale's script, localized, if possible, for the specified Locale.
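The script methods listed above can be exercised as follows. This is an illustrative sketch (the class name ScriptInfoDemo is made up); it passes an explicit display locale so that the result does not depend on the default locale of the machine running it.

```java
import java.util.Locale;

class ScriptInfoDemo {
    public static void main(String[] args) {
        // Serbian written in the Latin script
        Locale serbianLatin = Locale.forLanguageTag("sr-Latn");

        System.out.println(serbianLatin.getScript());                      // Latn
        System.out.println(serbianLatin.getDisplayScript(Locale.ENGLISH)); // Latin
        // Localized for the default locale, so the output varies by machine.
        System.out.println(serbianLatin.getDisplayScript());

        // A locale without a script code returns an empty string.
        System.out.println(Locale.FRANCE.getScript().isEmpty());           // true
    }
}
```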
Region Code
The region (country) code consists of either two or three uppercase letters that conform to the ISO 3166 standard, or three numbers that conform to the UN M.49 standard. A copy of the codes can be found at http://www.chemie.fu-berlin.de/diverse/doc/ISO_3166.html.
The following table contains several sample country and region codes.
A-2 Code | A-3 Code | Numeric Code | Description |
---|---|---|---|
AU | AUS | 036 | Australia |
BR | BRA | 076 | Brazil |
CA | CAN | 124 | Canada |
CN | CHN | 156 | China |
DE | DEU | 276 | Germany |
FR | FRA | 250 | France |
IN | IND | 356 | India |
RU | RUS | 643 | Russian Federation |
US | USA | 840 | United States |
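The relationship between the two-letter and three-letter codes in the table can be observed through the Locale API. The following sketch is illustrative only (RegionCodeDemo is a made-up name); it also builds a locale with the UN M.49 area code 419, which denotes Latin America and the Caribbean.

```java
import java.util.Locale;

class RegionCodeDemo {
    public static void main(String[] args) {
        // A locale that specifies only a region
        Locale canada = new Locale.Builder().setRegion("CA").build();
        System.out.println(canada.getCountry());     // CA  (two-letter code)
        System.out.println(canada.getISO3Country()); // CAN (three-letter code)

        // A three-digit UN M.49 area code used as the region
        Locale latinAmericanSpanish =
                new Locale.Builder().setLanguage("es").setRegion("419").build();
        System.out.println(latinAmericanSpanish.getCountry());    // 419
        System.out.println(latinAmericanSpanish.toLanguageTag()); // es-419
    }
}
```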
Variant Code
The optional variant code can be used to further distinguish your Locale. For example, the variant code can be used to indicate dialectal differences that are not covered by the region code.
Version Note: Prior to the Java SE 7 release, the variant code was sometimes used to identify differences that were not specific to the language or region. For example, it might have been used to identify differences between computing platforms, such as Windows or UNIX. Under the IETF BCP 47 standard, this use is discouraged.
To define non-language-specific variations relevant to your environment, use the extensions mechanism, as explained in BCP 47 Extensions.
As of the Java SE 7 release, which conforms to the IETF BCP 47 standard, the variant code is used specifically to indicate additional variations that define a language or its dialects. The IETF BCP 47 standard imposes syntactic restrictions on the variant subtag. You can see a list of variant codes (search for variant) at http://www.iana.org/assignments/language-subtag-registry.
For example, Java SE uses the variant code to support the Thai language. By convention, a NumberFormat object for the th and th_TH locales will use common Arabic digit shapes, or Arabic numerals, to format Thai numbers. However, a NumberFormat for the th_TH_TH locale uses Thai digit shapes. The excerpt from ThaiDigits.java demonstrates this:
String outputString = new String();
Locale[] thaiLocale = {
    new Locale("th"),
    new Locale("th", "TH"),
    new Locale("th", "TH", "TH")
};
for (Locale locale : thaiLocale) {
    NumberFormat nf = NumberFormat.getNumberInstance(locale);
    outputString = outputString + locale.toString() + ": ";
    outputString = outputString + nf.format(573.34) + "\n";
}
Identifying Available Locales
You can create a Locale with any combination of valid language and country codes, but that doesn't mean that you can use it. Remember, a Locale object is only an identifier. You pass the Locale object to other objects, which then do the real work. These other objects, which we call locale-sensitive, do not know how to deal with all possible Locale definitions.
To find out which types of Locale definitions a locale-sensitive class recognizes, you invoke the getAvailableLocales method. For example, to find out which Locale definitions are supported by the DateFormat class, you could write a routine such as the following:
import java.util.*;
import java.text.*;
public class Available {
    static public void main(String[] args) {
        Locale list[] = DateFormat.getAvailableLocales();
        for (Locale aLocale : list) {
            System.out.println(aLocale.toString());
        }
    }
}
Note that the String returned by toString contains the language and country codes, separated by an underscore:
ar_EG
be_BY
bg_BG
ca_ES
cs_CZ
da_DK
de_DE
...
If you want to display a list of Locale names to end users, you should show them something easier to understand than the language and country codes returned by toString. Instead you can invoke the Locale.getDisplayName method, which retrieves a localized String of a Locale object. For example, when toString is replaced by getDisplayName in the preceding code, the program prints the following lines:
Arabic (Egypt)
Belarussian (Belarus)
Bulgarian (Bulgaria)
Catalan (Spain)
Czech (Czech Republic)
Danish (Denmark)
German (Germany)
...
You may see different locale lists depending on the Java Platform implementations.
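Building on getAvailableLocales, a program can check whether a particular Locale is supported before relying on it. The sketch below assumes a helper named isSupportedByDateFormat (a hypothetical name, not part of the JDK) and passes an explicit display locale to getDisplayName so that the printed name does not depend on the default locale.

```java
import java.text.DateFormat;
import java.util.Arrays;
import java.util.Locale;

class SupportCheckDemo {
    // Returns true if DateFormat reports localized support for the given locale.
    static boolean isSupportedByDateFormat(Locale locale) {
        return Arrays.asList(DateFormat.getAvailableLocales()).contains(locale);
    }

    public static void main(String[] args) {
        Locale egypt = Locale.forLanguageTag("ar-EG");
        // getDisplayName(Locale) localizes the name for the given locale.
        System.out.println(egypt + " = " + egypt.getDisplayName(Locale.ENGLISH)); // ar_EG = Arabic (Egypt)
        System.out.println("Supported by DateFormat: " + isSupportedByDateFormat(egypt));
    }
}
```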
Language Tag Filtering and Lookup
The Java programming language contains internationalization support for language tags, language tag filtering, and language tag lookup. These features are specified by IETF BCP 47, which contains RFC 5646 "Tags for Identifying Languages" and RFC 4647 "Matching of Language Tags." This lesson describes how this support is provided in the JDK.
What are language tags?
Language tags are specially formatted strings that provide information about a particular language. A language tag might be something simple (such as "en" for English), something complex (such as "zh-cmn-Hans-CN" for Chinese, Mandarin, Simplified script, as used in China), or something in between (such as "sr-Latn", for Serbian written using Latin script). Language tags consist of "subtags" separated by hyphens; this terminology is used throughout the API documentation.
The java.util.Locale class provides support for language tags. A Locale contains several different fields: language (such as "en" for English, or "ja" for Japanese), script (such as "Latn" for Latin or "Cyrl" for Cyrillic), country (such as "US" for United States or "FR" for France), variant (which indicates some variant of a locale), and extensions (which provides a map of single character keys to String values, indicating extensions apart from language identification). To create a Locale object from a language tag String, invoke Locale.forLanguageTag(String), passing in the language tag as its only argument. Doing so creates and returns a new Locale object for use in your application.
Example 1:
package languagetagdemo;
import java.util.Locale;
public class LanguageTagDemo {
    public static void main(String[] args) {
        Locale l = Locale.forLanguageTag("en-US");
    }
}
Note that the Locale API only requires that your language tag be syntactically well-formed. It does not perform any extra validation (such as checking to see if the tag is registered in the IANA Language Subtag Registry).
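The difference between "well-formed" and "registered" can be seen in a short experiment. In this sketch the tag "qq-ZZ" is a made-up tag that is assumed not to be registered, and the helper isWellFormed is a hypothetical name. The sketch relies on two documented behaviors: forLanguageTag ignores the first ill-formed subtag and everything after it, while Locale.Builder.setLanguageTag throws IllformedLocaleException for the same input.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

class WellFormedDemo {
    // Uses the stricter Locale.Builder to test whether a tag is syntactically well-formed.
    static boolean isWellFormed(String languageTag) {
        try {
            new Locale.Builder().setLanguageTag(languageTag);
            return true;
        } catch (IllformedLocaleException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Well-formed but made up: accepted without any registry check.
        System.out.println(Locale.forLanguageTag("qq-ZZ").toLanguageTag());               // qq-ZZ
        // The last subtag is too long to be valid, so it is dropped.
        System.out.println(Locale.forLanguageTag("en-US-toolongsubtag").toLanguageTag()); // en-US
        System.out.println(isWellFormed("en-US-toolongsubtag"));                          // false
    }
}
```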
What Are Language Ranges?
Language ranges (represented by class java.util.Locale.LanguageRange) identify sets of language tags that share specific attributes. Language ranges are classified as either basic or extended, and are similar to language tags in that they consist of subtags separated by hyphens. Examples of basic language ranges include "en" (English), "ja-JP" (Japanese, Japan), and "*" (a special language range which matches any language tag). Examples of extended language ranges include "*-CH" (any language, Switzerland), "es-*" (Spanish, any regions), and "zh-Hant-*" (Traditional Chinese, any region).
Furthermore, language ranges may be stored in Language Priority Lists, which enable users to prioritize their language preferences in a weighted list. Language Priority Lists are expressed by placing LanguageRange objects into a java.util.List, which can then be passed to the Locale methods that accept a List of LanguageRange objects.
Creating a Language Range
The Locale.LanguageRange class provides two different constructors for creating language ranges:
- public Locale.LanguageRange(String range)
- public Locale.LanguageRange(String range, double weight)
The only difference between them is that the second version allows a weight to be specified; this weight will be considered if the range is placed into a Language Priority List.
Locale.LanguageRange also specifies some constants to be used with these constructors:
- public static final double MAX_WEIGHT
- public static final double MIN_WEIGHT
The MAX_WEIGHT constant holds a value of 1.0, which indicates that the range is a good fit for the user. The MIN_WEIGHT constant holds a value of 0.0, indicating that it is not.
Example 2:
package languagetagdemo;
import java.util.Locale;
public class LanguageTagDemo {
    public static void main(String[] args) {
        // Create Locale
        Locale l = Locale.forLanguageTag("en-US");
        // Define some LanguageRange objects
        Locale.LanguageRange range1 = new Locale.LanguageRange("en-US", Locale.LanguageRange.MAX_WEIGHT);
        Locale.LanguageRange range2 = new Locale.LanguageRange("en-GB", 0.5);
        Locale.LanguageRange range3 = new Locale.LanguageRange("fr-FR", Locale.LanguageRange.MIN_WEIGHT);
    }
}
Example 2 creates three language ranges: English (United States), English (Great Britain), and French (France). These ranges are weighted to express the user's preferences, in order from most preferred to least preferred.
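A LanguageRange can be inspected with getRange and getWeight, and the constructor rejects weights outside the MIN_WEIGHT to MAX_WEIGHT interval with an IllegalArgumentException. The following sketch illustrates both points; RangeWeightDemo and isAcceptedWeight are made-up names.

```java
import java.util.Locale;

class RangeWeightDemo {
    // Returns true if the LanguageRange constructor accepts the given weight.
    static boolean isAcceptedWeight(double weight) {
        try {
            new Locale.LanguageRange("en-GB", weight);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The single-argument constructor uses MAX_WEIGHT.
        Locale.LanguageRange preferred = new Locale.LanguageRange("en-US");
        System.out.println(preferred.getRange() + " has weight " + preferred.getWeight());

        System.out.println(isAcceptedWeight(0.5)); // true
        System.out.println(isAcceptedWeight(1.5)); // false: greater than MAX_WEIGHT
    }
}
```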
Creating a Language Priority List
You can create a Language Priority List from a list of language ranges by using the LanguageRange.parse(String) method. This method accepts a list of comma-separated language ranges, performs a syntactic check for each language range in the given ranges, and then returns the newly created Language Priority List.
For detailed information about the required format of the "ranges" parameter, see the API specification for this method.
Example 3:
package languagetagdemo;
import java.util.Locale;
import java.util.List;
public class LanguageTagDemo {
    public static void main(String[] args) {
        // Create Locale
        Locale l = Locale.forLanguageTag("en-US");
        // Create a Language Priority List
        String ranges = "en-US;q=1.0,en-GB;q=0.5,fr-FR;q=0.0";
        List<Locale.LanguageRange> languageRanges = Locale.LanguageRange.parse(ranges);
    }
}
Example 3 creates the same three language ranges as Example 2, but stores them in a String object, which is passed to the parse(String) method. The returned List of LanguageRange objects is the Language Priority List.
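The list returned by parse is ordered by descending weight, regardless of the order of the ranges in the input string. The sketch below, with the made-up class name PriorityListDemo, prints each parsed range with its weight to make this visible.

```java
import java.util.List;
import java.util.Locale;

class PriorityListDemo {
    // Parses a comma-separated list of weighted ranges into a Language Priority List.
    static List<Locale.LanguageRange> parse(String ranges) {
        return Locale.LanguageRange.parse(ranges);
    }

    public static void main(String[] args) {
        // French is listed first but has the lower weight.
        for (Locale.LanguageRange range : parse("fr;q=0.3,en;q=0.9")) {
            System.out.println(range.getRange() + " (weight " + range.getWeight() + ")");
        }
    }
}
```

In this run the "en" range is printed before "fr" because its weight is higher.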
Filtering Language Tags
Language tag filtering is the process of matching a set of language tags against a user's Language Priority List. The result of filtering will be a complete list of all matching results. The Locale class defines two filter methods that return a list of Locale objects. Their signatures are as follows:
- public static List<Locale> filter(List<Locale.LanguageRange> priorityList, Collection<Locale> locales)
- public static List<Locale> filter(List<Locale.LanguageRange> priorityList, Collection<Locale> locales, Locale.FilteringMode mode)
In both methods, the first argument specifies the user's Language Priority List as described in the previous section.
The second argument specifies a Collection of Locale objects to match against. The match itself will take place according to the rules specified by RFC 4647.
The third argument (if provided) specifies the "filtering mode" to use. The Locale.FilteringMode enum provides a number of different values to choose from, such as AUTOSELECT_FILTERING (which selects basic or extended filtering based on the ranges in the Language Priority List) or EXTENDED_FILTERING (for extended language range filtering).
Example 4 provides a demonstration of language tag filtering.
Example 4:
package languagetagdemo;
import java.util.Locale;
import java.util.Collection;
import java.util.List;
import java.util.ArrayList;
public class LanguageTagDemo {
    public static void main(String[] args) {
        // Create a collection of Locale objects to filter
        Collection<Locale> locales = new ArrayList<>();
        locales.add(Locale.forLanguageTag("en-GB"));
        locales.add(Locale.forLanguageTag("ja"));
        locales.add(Locale.forLanguageTag("zh-cmn-Hans-CN"));
        locales.add(Locale.forLanguageTag("en-US"));
        // Express the user's preferences with a Language Priority List
        String ranges = "en-US;q=1.0,en-GB;q=0.5,fr-FR;q=0.0";
        List<Locale.LanguageRange> languageRanges = Locale.LanguageRange.parse(ranges);
        // Now filter the Locale objects, returning any matches
        List<Locale> results = Locale.filter(languageRanges, locales);
        // Print out the matches
        for (Locale l : results) {
            System.out.println(l.toString());
        }
    }
}
The output of this program is:
en_US
en_GB
This returned list is ordered according to the weights specified in the user's Language Priority List.
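The three-argument filter method accepts an explicit filtering mode. As a hedged sketch of extended filtering, the following program uses the extended range "*-CH" (any language, Switzerland) to select the Swiss locales from a collection. The names ExtendedFilterDemo and swissLocales are illustrative only.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class ExtendedFilterDemo {
    // Keeps only the candidates whose region is Switzerland, using an extended language range.
    static List<Locale> swissLocales(Collection<Locale> candidates) {
        List<Locale.LanguageRange> priorityList =
                Collections.singletonList(new Locale.LanguageRange("*-CH"));
        return Locale.filter(priorityList, candidates, Locale.FilteringMode.EXTENDED_FILTERING);
    }

    public static void main(String[] args) {
        Collection<Locale> candidates = Arrays.asList(
                Locale.forLanguageTag("de-CH"),
                Locale.forLanguageTag("de-DE"),
                Locale.forLanguageTag("fr-CH"),
                Locale.forLanguageTag("it"));
        // Prints the locales for de-CH and fr-CH; de-DE and it do not match.
        System.out.println(swissLocales(candidates));
    }
}
```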
The Locale class also defines filterTags methods for filtering language tags as String objects.
The method signatures are as follows:
- public static List<String> filterTags(List<Locale.LanguageRange> priorityList, Collection<String> tags)
- public static List<String> filterTags(List<Locale.LanguageRange> priorityList, Collection<String> tags, Locale.FilteringMode mode)
Example 5 provides the same search as Example 4, but uses String objects instead of Locale objects.
Example 5:
package languagetagdemo;
import java.util.Locale;
import java.util.Collection;
import java.util.List;
import java.util.ArrayList;
public class LanguageTagDemo {
    public static void main(String[] args) {
        // Create a collection of String objects to match against
        Collection<String> tags = new ArrayList<>();
        tags.add("en-GB");
        tags.add("ja");
        tags.add("zh-cmn-Hans-CN");
        tags.add("en-US");
        // Express user's preferences with a Language Priority List
        String ranges = "en-US;q=1.0,en-GB;q=0.5,fr-FR;q=0.0";
        List<Locale.LanguageRange> languageRanges = Locale.LanguageRange.parse(ranges);
        // Now filter the tags, returning all matches
        List<String> results = Locale.filterTags(languageRanges, tags);
        // Print out the matches
        for (String s : results) {
            System.out.println(s);
        }
    }
}
As before, the search will match and return "en-US" and "en-GB" (in that order).
Performing Language Tag Lookup
In contrast to language tag filtering, language tag lookup is the process of matching language ranges to sets of language tags and returning the one language tag that best matches the range. RFC 4647 states that: "Lookup produces the single result that best matches the user's preferences from the list of available tags, so it is useful in cases in which a single item is required (and for which only a single item can be returned). For example, if a process were to insert a human-readable error message into a protocol header, it might select the text based on the user's language priority list. Since the process can return only one item, it is forced to choose a single item and it has to return some item, even if none of the content's language tags match the language priority list supplied by the user."
Example 6:
package languagetagdemo;
import java.util.Locale;
import java.util.Collection;
import java.util.List;
import java.util.ArrayList;
public class LanguageTagDemo {
    public static void main(String[] args) {
        // Create a collection of Locale objects to search
        Collection<Locale> locales = new ArrayList<>();
        locales.add(Locale.forLanguageTag("en-GB"));
        locales.add(Locale.forLanguageTag("ja"));
        locales.add(Locale.forLanguageTag("zh-cmn-Hans-CN"));
        locales.add(Locale.forLanguageTag("en-US"));
        // Express the user's preferences with a Language Priority List
        String ranges = "en-US;q=1.0,en-GB;q=0.5,fr-FR;q=0.0";
        List<Locale.LanguageRange> languageRanges = Locale.LanguageRange.parse(ranges);
        // Find the BEST match, and return just one result
        Locale result = Locale.lookup(languageRanges, locales);
        System.out.println(result.toString());
    }
}
In contrast to the filtering examples, the lookup demo in Example 6 returns the one object that is the best match (en-US in this case). For completeness, Example 7 shows how to perform the same lookup using String objects.
Example 7:
package languagetagdemo;
import java.util.Locale;
import java.util.Collection;
import java.util.List;
import java.util.ArrayList;
public class LanguageTagDemo {
    public static void main(String[] args) {
        // Create a collection of String objects to match against
        Collection<String> tags = new ArrayList<>();
        tags.add("en-GB");
        tags.add("ja");
        tags.add("zh-cmn-Hans-CN");
        tags.add("en-US");
        // Express user's preferences with a Language Priority List
        String ranges = "en-US;q=1.0,en-GB;q=0.5,fr-FR;q=0.0";
        List<Locale.LanguageRange> languageRanges = Locale.LanguageRange.parse(ranges);
        // Find the BEST match, and return just one result
        String result = Locale.lookupTag(languageRanges, tags);
        System.out.println(result);
    }
}
This example returns the single object that best matches the user's Language Priority List.
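Two details of lookup are worth illustrating. First, lookup progressively truncates a range until it finds a match, so a request for "fr-CA" can be satisfied by a plain "fr" tag. Second, the Locale.lookup and Locale.lookupTag methods return null when nothing matches, so the caller is responsible for supplying the default item that RFC 4647 calls for. The following sketch shows one way to do that; LookupFallbackDemo, bestTag, and the defaultTag parameter are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

class LookupFallbackDemo {
    // Returns the best matching tag, or defaultTag when lookup finds no match.
    static String bestTag(String ranges, Collection<String> tags, String defaultTag) {
        List<Locale.LanguageRange> priorityList = Locale.LanguageRange.parse(ranges);
        String match = Locale.lookupTag(priorityList, tags);
        return match != null ? match : defaultTag;
    }

    public static void main(String[] args) {
        Collection<String> tags = Arrays.asList("en", "fr");
        System.out.println(bestTag("fr-CA", tags, "en")); // fr: "fr-CA" is truncated to "fr"
        System.out.println(bestTag("ja", tags, "en"));    // en: no match, so the default is used
    }
}
```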
The Scope of a Locale
The Java platform does not require you to use the same Locale throughout your program. If you wish, you can assign a different Locale to every locale-sensitive object in your program. This flexibility allows you to develop multilingual applications, which can display information in multiple languages.
However, most applications are not multilingual and their locale-sensitive objects rely on the default Locale. Set by the Java Virtual Machine when it starts up, the default Locale corresponds to the locale of the host platform. To determine the default Locale of your Java Virtual Machine, invoke the Locale.getDefault method.
Note: It is possible to independently set the default locale for two types of uses: the format setting is used for formatting resources, and the display setting is used in menus and dialogs. Introduced in the Java SE 7 release, the Locale.getDefault(Locale.Category) method takes a Locale.Category parameter. Passing the FORMAT enum constant to the getDefault(Locale.Category) method returns the default locale for formatting resources. Similarly, passing the DISPLAY enum constant returns the default locale used by the UI. The corresponding setDefault(Locale.Category, Locale) method allows setting the locale for the desired category. The no-argument getDefault method returns the DISPLAY default value.
On the Windows platform, these default values are initialized according to the "Standards and Formats" and "Display Language" settings in the Windows control panel.
You should not set the default Locale programmatically because it is shared by all locale-sensitive classes.
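Rather than changing the shared default, you can pass an explicit Locale to the locale-sensitive object that needs it. The sketch below first prints the defaults reported by the JVM (which vary by machine) and then formats a number with explicitly chosen locales. DefaultLocaleDemo and formatFor are illustrative names.

```java
import java.text.NumberFormat;
import java.util.Locale;

class DefaultLocaleDemo {
    // Formats with an explicitly supplied locale instead of relying on the default.
    static String formatFor(Locale locale, double value) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    public static void main(String[] args) {
        // These three values depend on the host platform settings.
        System.out.println("Default: " + Locale.getDefault());
        System.out.println("Format:  " + Locale.getDefault(Locale.Category.FORMAT));
        System.out.println("Display: " + Locale.getDefault(Locale.Category.DISPLAY));

        // These results do not depend on the default locale.
        System.out.println(formatFor(Locale.GERMANY, 1234.5)); // 1.234,5
        System.out.println(formatFor(Locale.US, 1234.5));      // 1,234.5
    }
}
```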
Distributed computing raises some interesting issues. For example, suppose you are designing an application server that will receive requests from clients in various countries. If the Locale for each client is different, what should be the Locale of the server? Perhaps the server is multithreaded, with each thread set to the Locale of the client it services. Or perhaps all data passed between the server and the clients should be locale-independent.
Which design approach should you take? If possible, the data passed between the server and the clients should be locale-independent. This simplifies the design of the server by making the clients responsible for displaying the data in a locale-sensitive manner. However, this approach won't work if the server must store the data in a locale-specific form. For example, the server might store Spanish, English, and French versions of the same data in different database columns. In this case, the server might want to query the client for its Locale, since the Locale may have changed since the last request.
Locale-Sensitive Services SPI
This feature enables the plug-in of locale-dependent data and services. In this way, third parties are able to provide implementations of most locale-sensitive classes in the java.text and java.util packages.
The implementation of SPIs (Service Provider Interface) is based on abstract classes and Java interfaces that are implemented by the service provider. At runtime the Java class loading mechanism is used to dynamically locate and load classes that implement the SPI.
You can use the locale-sensitive services SPI to provide the following locale-sensitive implementations:
- BreakIterator objects
- Collator objects
- Language code, Country code, and Variant name for the Locale class
- Time Zone names
- Currency symbols
- DateFormat objects
- DateFormatSymbols objects
- NumberFormat objects
- DecimalFormatSymbols objects
The corresponding SPIs are contained in both the java.text.spi and java.util.spi packages:
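java.util.spi | java.text.spi |
---|---|
CurrencyNameProvider, LocaleNameProvider, LocaleServiceProvider, TimeZoneNameProvider | BreakIteratorProvider, CollatorProvider, DateFormatProvider, DateFormatSymbolsProvider, DecimalFormatSymbolsProvider, NumberFormatProvider |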
For example, if you would like to provide a NumberFormat object for a new locale, you extend the abstract java.text.spi.NumberFormatProvider class and implement its methods:
- getCurrencyInstance(Locale locale)
- getIntegerInstance(Locale locale)
- getNumberInstance(Locale locale)
- getPercentInstance(Locale locale)
Application code does not call the provider directly. It continues to use the NumberFormat factory methods:
Locale loc = new Locale("da", "DK");
NumberFormat nf = NumberFormat.getNumberInstance(loc);
These factory methods first check whether the Java runtime environment supports the requested locale; if so, they use that support. Otherwise, the methods call the getAvailableLocales() methods of installed providers for the appropriate interface to find a provider that supports the requested locale.
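To make the shape of such a provider concrete, here is a minimal sketch. The class name SketchNumberFormatProvider, the made-up locale "xx-YY", and the chosen patterns and apostrophe grouping separator are all assumptions for illustration. The main method calls the provider directly only to demonstrate its output. A deployable provider would need to be a public class with a public no-argument constructor, registered in a META-INF/services/java.text.spi.NumberFormatProvider file, and, depending on the JDK version, enabled through the java.locale.providers system property.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.NumberFormat;
import java.text.spi.NumberFormatProvider;
import java.util.Locale;

class SketchNumberFormatProvider extends NumberFormatProvider {
    // Hypothetical locale served by this provider.
    static final Locale CUSTOM = new Locale.Builder().setLanguage("xx").setRegion("YY").build();

    @Override
    public Locale[] getAvailableLocales() {
        return new Locale[] {CUSTOM};
    }

    // Builds a format with an apostrophe as the grouping separator (an arbitrary choice).
    private static DecimalFormat create(Locale locale, String pattern) {
        if (!CUSTOM.equals(locale)) {
            throw new IllegalArgumentException("Unsupported locale: " + locale);
        }
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setGroupingSeparator('\'');
        return new DecimalFormat(pattern, symbols);
    }

    @Override
    public NumberFormat getNumberInstance(Locale locale) {
        return create(locale, "#,##0.###");
    }

    @Override
    public NumberFormat getIntegerInstance(Locale locale) {
        DecimalFormat format = create(locale, "#,##0");
        format.setParseIntegerOnly(true);
        return format;
    }

    @Override
    public NumberFormat getCurrencyInstance(Locale locale) {
        return create(locale, "\u00A4#,##0.00");
    }

    @Override
    public NumberFormat getPercentInstance(Locale locale) {
        return create(locale, "#,##0%");
    }

    public static void main(String[] args) {
        // Direct call for demonstration only; applications use NumberFormat.getNumberInstance.
        NumberFormat nf = new SketchNumberFormatProvider().getNumberInstance(CUSTOM);
        System.out.println(nf.format(902300)); // 902'300
    }
}
```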