πŸ‘– Mood: Feeling fly in the pants I made this weekend.

🎡 Soundtrack: Bossa Nova Playlist

πŸ“š Reading Github Issues

I started today by reading 13 new issues – not including the two that I wrote yesterday.

🌎 Language Matching Bug Update

Yesterday I submitted an issue and a fix for a broken Example in the text/language package.

Today I was greeted by a very kind and thorough code review!

Screenshot of the gerrit code review with comments from reviewer

One of the comments was:

It would also be interesting to understand why this no longer matches Croatian,
I see it prints "en 0 no". Was the example wrong in the first place?  Or it's a
regression in Match's behaviour?

I dug into this for half the day before I came up with the following response:

tl;dr - I don't think it is a bug.

Aug 24 09:29:01 2017 - "language: remove manual hr -> sr mapping"
 * https://go-review.googlesource.com/c/text/+/55331/
 * The ExampleMatcher function does not have the problematic "todo" in it
 * You would think that this is the commit that changed the functionality, but
   no.
 * The example was testing sr -> hr. This commit only removed the manual hr ->
   sr.
 * With this commit the code matches Serbian to Croatian with "high" confidence.

Aug 24 09:44:20 2017 - "language: improve compliance of matcher"
 * https://go-review.googlesource.com/c/text/+/55750
 * This commit "remove[s] region distance".
 * This commit adds the problematic "todo" to ExampleMatcher.
 * With this commit the code matches Serbian to Croatian with "no" confidence.

The current CLDR Language Matching chart v39.0
(https://unicode-org.github.io/cldr-staging/charts/latest/supplemental/language_matching.html)
indicates that Serbian and Croatian are no longer matches for each other. This
seems in line with the first commit I referenced here removing hr -> sr for
geopolitical reasons.

Conclusion: I think Serbian should NOT match Croatian.

🐞 Example Test Bug Update

With the other half of my day I looked more into this issue that I also opened yesterday. One of the commenters suggested a fix:

@ameowlia I agree that this is behavior that is subtle and hard to catch. What
if the testing tool's heuristic was changed to only process CommentGroups that
begin with Output:? It would be able to ignore that TODO comment, and the output
comment could be placed anywhere in the function, without affecting the test
result.

Thanks to all of my digging yeserday, I actually understand exactly what Commentgroups are! It is fun to see my knowledge building.

I wanted to understand what it would take to make this come true. I poked around in the testing pkg where everything is run, but you can see here that the examples and the output is already parsed by that point.

Turns out they aren’t parsed in testing pkg. they are parsed in docs, very close to where I was looking yesterday. Here is where it only looks at the last comment group and it sees if that comment has an output. If the output is not in the last block, then that test is considered to have no output and there is nothing to match against.

I’m not convinced that allowing output anywhere is a good solution. What happens if there are two blocks with β€œoutput”? Then one will be ignored? This could result in a very similar bug.