Day 49: Well… just a little more language


🔧 Submitting fix for language subtags

This morning my goal was to clean up my hacky code that matches on all subtags (including private ones), in order to fix this bug about artificial languages not being sorted correctly. While reviewing the RFC for the millionth time, I realized that my algorithm, which matches subtags exactly, isn’t what the RFC says to do. The RFC uses a truncating algorithm to find the best match.

From RFC 4647, “Matching of Language Tags”, section 3.4, “Lookup”:

In the lookup scheme, the language range is progressively truncated from the end until a matching language tag is located. Single letter or digit subtags (including both the letter ‘x’, which introduces private-use sequences, and the subtags that introduce extensions) are removed at the same time as their closest trailing subtag. For example, starting with the range “zh-Hant-CN-x-private1-private2” (Chinese, Traditional script, China, two private-use tags) the lookup progressively searches for content as shown below:

Example of a Lookup Fallback Pattern

Range to match: zh-Hant-CN-x-private1-private2

zh-Hant-CN-x-private1-private2

zh-Hant-CN-x-private1

zh-Hant-CN

zh-Hant

zh

(default)
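
To make the truncation rule concrete for myself, here is a minimal sketch in Go. It is not the code I submitted, and it does not use any real library API: the names lookupFallbacks and lookup are made up for illustration, and the sketch assumes a well-formed range with no "*" wildcard handling. It drops the last subtag on each step, and also drops any single-character subtag (like "x") that the step leaves hanging at the end.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupFallbacks is a hypothetical helper. It returns the ranges that the
// RFC 4647 section 3.4 lookup scheme would try, in order, for languageRange.
// The final "(default)" step is left to the caller.
func lookupFallbacks(languageRange string) []string {
	subtags := strings.Split(languageRange, "-")
	var fallbacks []string
	for len(subtags) > 0 {
		fallbacks = append(fallbacks, strings.Join(subtags, "-"))
		// Truncate from the end: remove the last subtag.
		subtags = subtags[:len(subtags)-1]
		// A single letter or digit subtag (such as "x" or an extension
		// singleton) is removed together with its closest trailing subtag.
		for len(subtags) > 0 && len(subtags[len(subtags)-1]) == 1 {
			subtags = subtags[:len(subtags)-1]
		}
	}
	return fallbacks
}

// lookup is a hypothetical helper. It returns the first available tag that
// matches one of the fallbacks (ignoring case), or def if nothing matches.
func lookup(languageRange string, available []string, def string) string {
	for _, candidate := range lookupFallbacks(languageRange) {
		for _, tag := range available {
			if strings.EqualFold(candidate, tag) {
				return tag
			}
		}
	}
	return def
}

func main() {
	// Prints each step of the fallback pattern, one per line.
	for _, r := range lookupFallbacks("zh-Hant-CN-x-private1-private2") {
		fmt.Println(r)
	}
	fmt.Println(lookup("zh-Hant-CN-x-private1-private2", []string{"en", "zh-Hant"}, "en"))
}
```

For the range in the RFC example, the loop walks through the same steps as the list above, and the lookup call settles on zh-Hant because nothing more specific is available.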

So then I had to write a little more code, which broke things, which I fixed, and so on. Then I reviewed everything with a fine-tooth comb to make sure the style, variable names, comments, etc. match the Go style (inevitably I will miss something obvious). This ended up taking all day, but I finally submitted a fix. In Gerrit I added a lot of annotations to explain my changes.

What a fun day of coding for my penultimate sabbatical day.

🔮 What’s Next

I have only one more working day until the end of my sabbatical.

My goals for tomorrow are:

Write a wrap-up blog post about my time in this experiment
Write a presentation to give to my company about how my sabbatical went
