THE STRATEGY IN A NUTSHELL
1. We are supposed to indicate the quality (0 to 1) of each language we offer.
2. The browser is supposed to tell us the user's language preferences, each with a strength (0 to 1), and the browser can use "*" to specify a strength for all other languages. Any language not covered by a preference (or by "*") gets a strength of 0.
3. For each of our offered languages, we search the visitor's preferences for the right strength to multiply by our language quality. We assign to our offered language the strength of their longest matching preference.
4. The offering with the highest composite score (our quality times the assigned strength) wins, and we show it to the visitor.
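Step 2 starts with reading the browser's Accept-Language header. This post writes preferences in a shorthand like en-gb;1.0, but on the wire each strength is sent as a q parameter (for example en-gb,es;q=0.5,*;q=0.1), and a preference with no q counts as 1. Below is a minimal Python sketch of turning that header into a preference-to-strength mapping. The function name parse_accept_language is hypothetical, and treating a malformed q value as 0 is this sketch's own assumption.

```python
from __future__ import annotations


def parse_accept_language(header: str) -> dict[str, float]:
    """Turn an Accept-Language value into {preference: strength}.

    Hypothetical helper. Preferences are lowercased because language tags
    are case-insensitive; a preference without q= gets strength 1.0.
    """
    prefs: dict[str, float] = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        lang_range = parts[0].lower()
        if not lang_range:
            continue  # skip empty items, e.g. from a trailing comma
        strength = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    strength = float(value)
                except ValueError:
                    strength = 0.0  # assumption: a malformed q means "not wanted"
        prefs[lang_range] = strength
    return prefs


print(parse_accept_language("en-GB,es;q=0.5,fr;q=0.3,*;q=0.5"))
# {'en-gb': 1.0, 'es': 0.5, 'fr': 0.3, '*': 0.5}
```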
HOW WE FIND THE MATCH FOR EACH OF OUR OFFERINGS
Each of the languages we offer has an ISO name like en, en-gb, es-ar, etc. For each of our languages, we consider the ISO name and its two-letter base portion (en, en, es, etc.). Then, for each of their language preferences, we determine whether it matches either our ISO name or our base, and if it does, we record it. The strength of the longest matching preference is assigned to our offering, as in the following examples:
Our offering: en-us;1.0
Their preferences: en-gb;1.0,es;0.5,fr;0.3
Strength assigned: 0. None of their preferences matches our offering or its base, so we must assign 0.
Note that this seems wrong. But the HTTP spec (see Note 1 below) says it's the browser's responsibility to set or suggest the base language (en) when a specific language variant (en-gb) is selected by the user.
Our offering: en-us;1.0
Their preferences: en-gb;1.0,en;0.9,en-us;0.8,es;0.5,fr;0.3
Strength assigned: 0.8. Their en and en-us preferences both match our offering, and their en-us is the longest match.
Our offering: en;1.0
Their preferences: en-gb;1.0,es;0.5,fr;0.3
Strength assigned: 0. None of their preferences matches our offering, so we must assign 0. Their en-gb does not count: a preference must equal our offering or our offering's base, and en-gb is neither, even though our en is the base of their en-gb. As in the first example, the HTTP spec (see Note 1 below) leaves it to the browser to suggest adding en.
Our offering: en-us;1.0
Their preferences: en;1.0,es;0.5,fr;0.3
Strength assigned: 1.0. Their en preference matches our base, so our en-us gets a strength of 1.0.
Our offering: en-us;1.0
Their preferences: en-gb;1.0,es;0.5,fr;0.3,*;0.5
Strength assigned: 0.5. None of their other preferences matches our en-us, so their * preference applies.
Our offering: fr;0.8
Their preferences: en-gb;1.0,es;0.5,fr;0.3,*;0.5
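Strength assigned: 0.3. Their fr preference matches our fr. Their * does not apply, because * covers only languages that no other preference matches, so fr;0.3 is used even though *;0.5 is stronger.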
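The matching rule in these examples can be sketched in Python as below. It is a minimal sketch, not the post's own code: assigned_strength is a hypothetical name, and preferences are assumed to arrive as an already-lowercased dict of preference to strength. A preference counts as matching when it equals our offering or a leading part of it that ends just before a "-" (for a two-part offering like en-us, that means en-us itself or its base en), and "*" is used only when nothing else matches. The loop at the end replays the six examples above.

```python
from __future__ import annotations


def assigned_strength(offering: str, prefs: dict[str, float]) -> float:
    """Strength of the visitor's longest preference matching our offering."""
    tag = offering.lower()
    # A preference matches if it equals our offering or a '-'-delimited
    # prefix of it (for en-us: "en-us" or "en"). '*' is handled separately.
    matching = [r for r in prefs
                if r != "*" and (tag == r or tag.startswith(r + "-"))]
    if matching:
        return prefs[max(matching, key=len)]  # longest match wins
    # '*' covers only offerings that no other preference matched.
    return prefs.get("*", 0.0)


examples = [
    ("en-us", {"en-gb": 1.0, "es": 0.5, "fr": 0.3}),
    ("en-us", {"en-gb": 1.0, "en": 0.9, "en-us": 0.8, "es": 0.5, "fr": 0.3}),
    ("en", {"en-gb": 1.0, "es": 0.5, "fr": 0.3}),
    ("en-us", {"en": 1.0, "es": 0.5, "fr": 0.3}),
    ("en-us", {"en-gb": 1.0, "es": 0.5, "fr": 0.3, "*": 0.5}),
    ("fr", {"en-gb": 1.0, "es": 0.5, "fr": 0.3, "*": 0.5}),
]
for offering, prefs in examples:
    print(offering, assigned_strength(offering, prefs))
# prints, one per line: en-us 0.0, en-us 0.8, en 0.0, en-us 1.0, en-us 0.5, fr 0.3
```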
We then multiply each offering's assigned strength by our quality for that offering. Whichever offering has the highest composite quality wins, and that is the language we serve.
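Putting the multiplication and the "highest wins" rule together, a self-contained Python sketch might look like this. The function names, the example offerings and their qualities, and the default parameter (what to serve when every composite score is 0, which the post does not address) are all hypothetical. The matcher repeats the longest-match rule in compact form so the block runs on its own. Ties go to whichever offering is listed first.

```python
from __future__ import annotations


def strength_for(tag: str, prefs: dict[str, float]) -> float:
    """Longest matching preference wins; '*' covers anything unmatched."""
    tag = tag.lower()
    matches = [r for r in prefs
               if r != "*" and (tag == r or tag.startswith(r + "-"))]
    if matches:
        return prefs[max(matches, key=len)]
    return prefs.get("*", 0.0)


def choose_language(offerings: dict[str, float], prefs: dict[str, float],
                    default: str) -> str:
    """Return the offering with the highest quality * assigned strength.

    `default` is a hypothetical fallback for when every score is 0.
    """
    best_tag, best_score = default, 0.0
    for tag, quality in offerings.items():
        score = quality * strength_for(tag, prefs)
        if score > best_score:  # strict '>' keeps the earlier offering on ties
            best_tag, best_score = tag, score
    return best_tag


offerings = {"en-us": 1.0, "fr": 0.8, "es-ar": 0.9}  # hypothetical qualities
prefs = {"en-gb": 1.0, "es": 0.5, "fr": 0.3, "*": 0.5}
print(choose_language(offerings, prefs, default="en-us"))  # en-us
```

Here en-us scores 1.0 × 0.5 (via *), es-ar scores 0.9 × 0.5 (their es matches our base), and fr scores 0.8 × 0.3, so en-us is served.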
SECOND-GUESSING THE SPEC
If we absolutely refuse to trust that browsers will help users give us the preferences they really want, we can, for example, let the base of their preference match the base of our offering, perhaps with a strength reduction in recognition of the fact that we are second-guessing the spec.
TERMS USED IN THE SPEC
The following table shows how the terminology in this post compares with the HTTP spec terminology:
THIS POST | HTTP SPEC | EXAMPLE | MEANING |
(language) preference | range | en-us, es | An ISO language abbreviation (en-us, es, etc.) for a visitor's language preference. |
(language) offering | tag | en, es-ar | An ISO language abbreviation (en, es-ar, etc.) for a language we offer. |
base | prefix | en, es | The part of a language abbreviation that comes before the "-". In ISO practice, usually the first two letters of a language abbreviation. |
match | match | no (their en-us vs. our en), yes (their es vs. our es-ar, whose base is es) | A preference matches an offering if it (in its entirety) equals the offering or the base of the offering. |
strength | quality | 0.5, 1.0 (en-us 0.5 and es 1.0) | An indication of the strength of a visitor's language preference. |
quality | quality | 1.0, 0.8 (en 100% and es-ar 80%) | An indication of the relative quality of the offered language at our web site. |
assigned | assigned | 0 (their en-us is not supposed to match our en; see Note 1 below), 1.0 (their en matches our en-us) | The strength assigned to each of our offerings. |
More match examples:
THEIR PREFERENCE | OUR OFFERING | MATCH? | RESULT |
en | en | Yes | Assign their strength. |
es | es-ar | Yes | Assign their strength. |
Note 1 (quoted from the HTTP/1.1 spec, RFC 2616, section 14.4):
Note: When making the choice of linguistic preference available to the user, we remind implementors of the fact that users are not familiar with the details of language matching as described above, and should provide appropriate guidance. As an example, users might assume that on selecting "en-gb", they will be served any kind of English document if British English is not available. A user agent might suggest in such a case to add "en" to get the best matching behavior.
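If we do decide to second-guess the spec as described under SECOND-GUESSING THE SPEC, a lenient matcher might look like the Python sketch below. Everything here is an assumption for illustration: the name lenient_strength, the 0.9 penalty, taking the strongest same-base preference when several exist, and trying the base-to-base match before "*". The first step is the spec's longest-match rule described earlier, so a visitor who follows Note 1 and adds en still gets the spec's behavior.

```python
from __future__ import annotations

BASE_MATCH_PENALTY = 0.9  # hypothetical strength reduction for base-only matches


def base_of(tag: str) -> str:
    """Part before the first '-', e.g. 'en' for 'en-gb'."""
    return tag.lower().split("-", 1)[0]


def lenient_strength(offering: str, prefs: dict[str, float],
                     penalty: float = BASE_MATCH_PENALTY) -> float:
    tag = offering.lower()
    # 1. Spec behavior: the longest preference equal to our offering or a
    #    '-'-delimited prefix of it.
    matching = [r for r in prefs
                if r != "*" and (tag == r or tag.startswith(r + "-"))]
    if matching:
        return prefs[max(matching, key=len)]
    # 2. Second-guessing: a preference whose base equals our base, reduced.
    same_base = [s for r, s in prefs.items()
                 if r != "*" and base_of(r) == base_of(tag)]
    if same_base:
        return max(same_base) * penalty
    # 3. Otherwise '*' if they sent one, else 0.
    return prefs.get("*", 0.0)


print(lenient_strength("en-us", {"en-gb": 1.0, "es": 0.5, "fr": 0.3}))  # 0.9
```

With the visitor from the first example (en-gb;1.0, es;0.5, fr;0.3), our en-us now gets 0.9 instead of 0, while a visitor who explicitly lists en-us still gets exactly the strength they asked for.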