🍪 Mood: An expensed ginger molasses cookie on a crisp fall day.

🎵 Soundtrack: My Neighbor Totoro Soundtrack

🥖 💂‍♀️Language Matching

I spent this afternoon continuing to look into this x/text/language issue. The issue is basically:

  • The website supports [English, French]
  • When I say I speak [British English, French with a German local, Canadian French] in preference order, then it returns “English” as the best language.
  • But when I add another low preference language to the end of the list [British English, French with a German local, Canadian French, English with a German local] then it returns “French” as the best language.
  • Why does adding another language to the end of my preferences change which language is returned.

🐛 Like the issue author, I absolutely thought this was a bug when I first read it. But after digging into the code for two days I have determined that this is actually working as intended.

Here are the two assumptions about language matching that I learned:

1️⃣ Most people do not speak multiple languages equally.

The algorithm usually puts a heavy emphasis on the first exact language match and then “pins” that language. Docs about assumption 1.

m := language.NewMatcher([]language.Tag{language.English, language.French})

desired, _, _ := language.ParseAcceptLanguage("en-GB, fr")
tag, i, conf := m.Match(desired...)
fmt.Println("case A", tag, i, conf) // en-u-rg-gbzzzz
// Returns English because it is the first exact language match, even though
// "fr" is an exact region and language match.

2️⃣ If a person lists the dialects they know and the languages are not continuous, then it is assumed they speak all of those languages equally well and it will not give extra preference to the first exact language match it finds.

For example, if your list of desired language tags are: en-GB, fr-DE, fr-CA, en-DE then I will assume that you speak English and French very well. In this case, the algorithm will not “pin” English even though it is first. Instead it runs the whole algorithm for all of the options and it decides that fr-DE is a better match because of regionality (Germany is closer to France than Great Britain is to the US). Docs about assumption 2.

desired, _, _ = language.ParseAcceptLanguage("en-GB, fr-DE, en-DE")
tag, i, conf = m.Match(desired...)
fmt.Println("case B", tag, i, conf) // fr-u-rg-dezzzz

// Because of the ordering of dialects, the algorithm assumes this speaker
// knows English and French well. Returns French because Germany is closer to
// France then Great Britain is to the US.

🎉 I may not have made a code contribution, but at least I helped!