Day 49: Well... just a little more language
🔧 Submitting fix for language subtags
This morning my goal was to clean up my hacky code to match based on all subtags (including private ones) in order to fix this bug about artificial languages not being sorted correctly. While reviewing the rfc for the millionth time, I realized that my algorithm that exactly matches subtags isn’t what the rfc says to do. The rfc does a truncating algorithm to find the best match.
From rfc4647 “Matching of Language Tags” section 3.4 “Lookup”:
In the lookup scheme, the language range is progressively truncated from the end until a matching language tag is located. Single letter or digit subtags (including both the letter ‘x’, which introduces private-use sequences, and the subtags that introduce extensions) are removed at the same time as their closest trailing subtag. For example, starting with the range “zh-Hant-CN-x-private1-private2” (Chinese, Traditional script, China, two private-use tags) the lookup progressively searches for content as shown below:
Example of a Lookup Fallback Pattern
Range to match: zh-Hant-CN-x-private1-private2
- zh-Hant-CN-x-private1-private2
- zh-Hant-CN-x-private1
- zh-Hant-CN
- zh-Hant
- zh
- (default)
So then I had to write a little more code, which broke things, which I fixed, etc etc. Then reviewed everything with a fine tooth comb to make sure the style, variable names, comments, etc match the go style (inevitably I will miss something obvious). This ended up taking all day, but I finally submitted a fix. In gerrit I added a lot of annotations to explain my changes.
What a fun day of coding for my penultimate sabbatical day.
🔮 What’s Next
I only have 1 more working day until the end of my sabbatical.
My goals for tomorrow are:
- Write a blog post wrap up of my time in this experiment
- Write a presentation to give to my company about how my sabbatical went