An Unexpected Interaction of Features

I've been dealing with some data, following my usual technique of playing with it using command-line tools for a while before writing a program to do the full analysis.

But something was wrong, and it took me a while to work it out.

I was sorting a file:

which
aerodynamically
electroencephalogram
exotically
aerodynamically
a
differentiation
->
a
aerodynamically
aerodynamically
differentiation
electroencephalogram
exotically
which

But my file has a count as the first field:

5 which
15 aerodynamically
20 electroencephalogram
10 exotically
15 aerodynamically
1 a
15 differentiation
->
10 exotically
15 aerodynamically
15 aerodynamically
15 differentiation
1 a
20 electroencephalogram
5 which

That's not what I wanted, but this was a game I'd played before. The sort utility is treating the data as text, so the order is alphabetical. I need to use -n to get it to sort numerically:

5 which
15 aerodynamically
20 electroencephalogram
10 exotically
15 aerodynamically
1 a
15 differentiation
->
1 a
5 which
10 exotically
15 aerodynamically
15 aerodynamically
15 differentiation
20 electroencephalogram
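
The two orderings so far can be reproduced with a short shell sketch. The variable names (data, as_text, as_numbers) are my own labels, not from the original session, and the exact text ordering depends on the locale, so treat this as an illustration rather than a transcript.

```shell
# Sample data from the post, held in a variable so nothing touches disk.
data='5 which
15 aerodynamically
20 electroencephalogram
10 exotically
15 aerodynamically
1 a
15 differentiation'

# Default: lines are compared as text, so "10 ..." lands before "5 ...".
# (The exact text order of the other lines depends on the locale.)
as_text=$(printf '%s\n' "$data" | sort)

# -n: the leading number is compared by value instead.
as_numbers=$(printf '%s\n' "$data" | sort -n)

printf 'as text:\n%s\n\n' "$as_text"
printf 'as numbers:\n%s\n' "$as_numbers"
```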

Excellent, but now I realise there are repeated lines, and I need to de-duplicate. So I use sort -u to do that:

5 which
15 aerodynamically
20 electroencephalogram
10 exotically
15 aerodynamically
1 a
15 differentiation
->
10 exotically
15 aerodynamically
15 differentiation
1 a
20 electroencephalogram
5 which

The duplication is gone, but the screwy ordering is back, because I forgot the "numerical" flag, so sort -nu is what I need:

5 which
15 aerodynamically
20 electroencephalogram
10 exotically
15 aerodynamically
1 a
15 differentiation
->
1 a
5 which
10 exotically
15 aerodynamically
20 electroencephalogram

Spot the difference.

Yes, the "differentiation" line has gone, and I can only assume that when both the -n and -u flags are set, sort only takes the numbers into account when deciding whether lines are duplicates. I haven't explored whether, for a given number, it (a) sorts and keeps the first, (b) sorts and keeps the last, (c) keeps the first in the input then sorts, (d) keeps the last in the input then sorts, or (e) something else.

But it's certainly not what I expected.
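
The open question above, which line survives for a given number, can be probed directly. This is a minimal sketch with made-up test words (zebra, apple, mango are my own, not from the data). The GNU manual describes -u as outputting only the first of a sequence of lines that compare equal, which fits the guess that only the numbers are compared, but other implementations may behave differently.

```shell
# Three lines that share a count but differ in the word.
same_count='15 zebra
15 apple
15 mango'

# -n and -u together: lines whose numbers match count as duplicates.
survivors=$(printf '%s\n' "$same_count" | sort -nu)

# Whole-line comparison for contrast: nothing here is a duplicate.
all_kept=$(printf '%s\n' "$same_count" | sort -n | uniq)

printf 'sort -nu keeps:\n%s\n\n' "$survivors"
printf 'sort -n | uniq keeps:\n%s\n' "$all_kept"
```

Reordering the three input lines and running it again shows whether the survivor is chosen by input position or by the text of the line.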

So now it's back to using "sort -n | uniq" rather than "sort -nu".
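
As a sketch of that workaround on the sample data, alongside one alternative that is my own suggestion rather than anything from the original session: giving sort explicit keys (-k1,1n for the count, -k2 for the rest of the line) so that -u compares the word as well as the number. The variable names are hypothetical.

```shell
# Sample data from the post.
data='5 which
15 aerodynamically
20 electroencephalogram
10 exotically
15 aerodynamically
1 a
15 differentiation'

# The workaround: sort numerically, then let uniq drop adjacent
# lines that are identical as whole lines.
via_uniq=$(printf '%s\n' "$data" | sort -n | uniq)

# Alternative (an assumption, not from the post): field 1 as a numeric
# key, field 2 onwards as a text key, so -u only drops lines that
# match on both.
via_keys=$(printf '%s\n' "$data" | sort -u -k1,1n -k2)

printf 'sort -n | uniq:\n%s\n\n' "$via_uniq"
printf 'sort -u -k1,1n -k2:\n%s\n' "$via_keys"
```

Both forms should keep the "15 differentiation" line while removing the repeated "15 aerodynamically".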

For reference: "sort --version" returns "sort (GNU coreutils) 8.21"


You can follow me on Mathstodon: https://mathstodon.xyz/@ColinTheMathmo


