Interview test: Analysis taster

Previously:


Deliberately excludes some outliers
Red indicates failed test
It's the weekend and I hope to complete my analysis of the more than 100 routines people have sent. As a quick taster, here is a diagram I've generated that attempts to show which routines are similar to which other routines.

The ones in red failed the tests (possibly for completely trivial reasons that would be excused - possibly not!)

The idea is to get a similarity or distance measurement, and then to layout routines so that similar routines are close to each other.

The diagram has been generated by connecting the two closest routines, then finding the next closest pair without a connecting path and joining them, and so on. That guarantees a tree, but we may get two routines that are actually quite close, but on the diagram get separated by a long way. For example, suppose A--B--C--D--E--F with all distances being 1. It's nonetheless possible that d(A,F)=2, but A and F will end up a long way apart.

Anyway, I'm using this to find clusters of similar routines, and then finding the feature set(s).

Interesting.

More later.


UPDATE:

Here's the next phase of the analysis