Waiting In Vain


This page has been
Tagged As Software

Over the past few years I've occasionally been contacted by my ISP[0] to say that one of my scripts had gone rogue and was using 100% CPU, and would it be OK to kill it. This has always been a surprise to me because I'm usually pretty careful about my code, especially when it's running "out there" on someone else's hardware.

The answer was always "Of course! Kill it!" but the problem then was that I never knew where it had got stuck, and what it thought it was doing.

The problem was compounded by several factors:

Firstly, I don't have shell access to the machines, so any debugging would have to be done effectively blind;
Secondly, I was having trouble provoking the scripts into giving me a core dump, so when they were killed I got no information at all;
Thirdly, I don't usually have a lot of time to spare, so I'd have to do this efficiently; and
Fourthly, the hangs were sporadic, and I couldn't provoke them, so I'd have to make changes to the code and then wait for it to go wrong again at an unknowable time.

So that was the context. What to do?

The first thing to do was to identify the program, but that was easy. I wrote a web script which, when called, would run "top" and "ps" and return their output. I set it up to be called every hour from my home machine, and thus I could see for myself when a script went to 100% CPU.
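A minimal sketch of that kind of monitoring script, assuming Python 3 on the server. It is not the script that actually ran; the names run_command and report, and the idea of returning a plain-text CGI response, are assumptions for illustration.

```python
import subprocess

def run_command(args, timeout=10):
    """Run one command and return its output as text, never raising."""
    try:
        result = subprocess.run(
            args, capture_output=True, text=True, timeout=timeout
        )
        return result.stdout + result.stderr
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A missing or stuck command is reported in the page, not raised.
        return "could not run %r: %s\n" % (args, exc)

def report(commands):
    """Build a plain-text CGI response holding each command's output."""
    parts = ["Content-Type: text/plain\n\n"]
    for args in commands:
        parts.append("$ " + " ".join(args) + "\n")
        parts.append(run_command(args))
        parts.append("\n")
    return "".join(parts)
```

On a Linux host the script might print report([["top", "-b", "-n", "1"], ["ps", "aux"]]), where those particular flags are again an assumption. Fetching the resulting URL from an hourly job elsewhere gives a record of what was running when the CPU was pegged.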

Then I waited.

Finally I captured the output, and found that it was one of my wiki edit scripts. OK, perhaps not surprising, especially since it was trying to edit a non-existent page. But clearly someone was trying to hack into a non-existent account, and somehow it was provoking the script into behaviour I hadn't anticipated.

: Time used  : 01:34:38
: Executable : XXXX_edit
: Parameter  : DOMAINxmlrpc.php

First thing to note is that the parameter is in no way sensible, so this is someone trying to hack in. OK, we expect that on a public facing website.

But now we know which binary to instrument. I inserted some debugging and logging, then left it to run. It took 24 hours, but it went rogue again, and I had some information. Problem was, it was hanging somewhere where there were apparently no loops.

The code here is not intended to be a good example, and is not, in truth, the code that's actually running. If you want to comment on the code then by all means, but you need to know that your comments, while no doubt well-intentioned, might be misplaced.
More instrumentation, more waiting, lather, rinse, repeat, and finally I got to one specific place in the code. Here it is, lightly edited for clarity and to remove irrelevant sections, and with some functions in-lined to get everything in one place.

 
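:import os
:from sys import stdin
:
:def get_post_data():
:
:    # How much data the server says the request body contains
:    try: data_len = int(os.environ['CONTENT_LENGTH'])
:    except (KeyError, ValueError): data_len = 0
:
:    # Keep reading until the promised amount has arrived
:    data = ''
:    while len(data) < data_len:
:        data += stdin.read(data_len-len(data))
:
:    return data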

So we ask the environment how much data there is to read, and then repeatedly ask to read data until we have as much as we were promised.

The problem is, as I found, sometimes we are told that there is a non-zero amount to be read, but when you try to read it, there is nothing there. More details are available on request, but that's the nub of it.
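To see why that loop never ends, here is a small sketch, assuming Python 3 and an in-memory stream standing in for stdin. The function read_promised and its max_reads cap are hypothetical, added only so that the demonstration terminates.

```python
import io

def read_promised(stream, promised, max_reads=5):
    """Mimic the original loop, but cap the reads so the demo terminates."""
    data = ''
    reads = 0
    while len(data) < promised and reads < max_reads:
        # At end of input, read() returns '' immediately instead of blocking.
        data += stream.read(promised - len(data))
        reads += 1
    return data, reads

# An empty stream stands in for a request whose promised body never arrives.
data, reads = read_promised(io.StringIO(''), promised=62)
print(repr(data), reads)   # '' 5
```

Every read returns the empty string at once, so the length never grows. Without the cap the loop would spin at full speed forever, which is exactly what 100% CPU looks like from the outside.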

Again, this isn't the actual code, but it shows the new method of reading stdin.
So here's the new version. Now I have three attempts at reading from stdin and if none of them work, the code gives up. So the main loop runs for as long as it's making progress, and gives up if it looks like it's getting stuck.

 
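:import os
:from sys import stdin
:
:def get_post_data():
:
:    try: data_len = int(os.environ['CONTENT_LENGTH'])
:    except (KeyError, ValueError): data_len = 0
:
:    data = ''
:    while len(data) < data_len:
:        # Up to three attempts to get something from stdin
:        new_data = ''
:        for n in [1,2,3]:
:            new_data = stdin.read(data_len-len(data))
:            if new_data != '': break
:        # Nothing after three attempts: stop waiting
:        if new_data == '': break
:        data = data + new_data
:
:    # write_log is the script's own logging helper, not shown here
:    if data_len > 0:
:
:        write_log( 'Data: len=%d, read=%d\n' % (data_len,len(data)) )
:        if len(data) > 0:
:            write_log( 'Data: "%s"\n' % repr(data) )
:
:    return data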
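The same idea can be tried out without a web server. This is a sketch, assuming Python 3, with the environment and the input stream passed in as parameters so that a short or empty body can be simulated. The parameter names environ, stream and attempts are assumptions, not part of the running code.

```python
import io

def get_post_data(environ, stream, attempts=3):
    """Read up to CONTENT_LENGTH characters, giving up when reads stall."""
    try:
        data_len = int(environ.get('CONTENT_LENGTH', 0))
    except (TypeError, ValueError):
        data_len = 0

    data = ''
    while len(data) < data_len:
        new_data = ''
        for _ in range(attempts):
            new_data = stream.read(data_len - len(data))
            if new_data:
                break
        if not new_data:
            break          # no progress after several tries: stop waiting
        data += new_data
    return data

# A promise of 62 characters with nothing behind it now returns promptly.
print(repr(get_post_data({'CONTENT_LENGTH': '62'}, io.StringIO(''))))
```

In a real Python 3 CGI script CONTENT_LENGTH counts bytes, so the stream would be sys.stdin.buffer and data would start as b''; text is used here only to keep the sketch close to the code above.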

More details on the attack and the context ...

Let's put this in context and explain more about the attack. Much of this I've deduced, so it might be wrong. I have no experience of WordPress, so it's very likely I've got the details wrong - feel free to send me a comment to let me know.

However ...

There's a website where I run a wiki. Let's call it WEBSITE.org.uk to avoid getting into specifics. Since it's a wiki, each page has a link you can click on to take you to an edit page. The link is made up of the website URL and the name of the script, which I'll call wiki.

To read a page on the wiki you invoke the website URL and the wiki script name, and you give it the desired pagename as a parameter:

https://WEBSITE.org.uk/wiki?pagename

If you then want to edit a page, or you try to invoke a pagename that doesn't exist, you get sent to the editing script:

https://WEBSITE.org.uk/wiki_edit?pagename

So far, so good.

Now let's suppose there is a nefarious actor out there who chooses to launch a dictionary attack. In essence, this consists of finding a web page that requires a login, and then attempting to log in using a range of possible usernames and passwords. Attacks like this are depressingly common and even more depressingly successful - there is a worryingly large number of sites with username ADMIN and password one of password, admin, or 12345.

So our nefarious actor writes a script to wander over the web, find websites, and try to log in. For simplicity, they decide to focus on WordPress sites. So they wander about trying to find WordPress sites, and then trying a range of usernames and passwords to log in.

But in truth, doing this costs effectively nothing, so why would our nefarious actor bother to write code that tests whether or not a site is actually a WordPress site? Just try to log in anyway! It doesn't cost anything, and it saves the programming effort, as well as the time it would take to fetch the page and examine it to see whether it is a WordPress site. Just try to log in.

So our nefarious actor has written a script to find pages on the web, and try to log in on the assumption that each is a WordPress site.

To log in to a WordPress site you need three parameters: pwd, wp-submit, and log.

So what's happening? The nefarious actor's script is trying to log in, and so is invoking a URL that, to them, looks like a WordPress login page. In this case they are trying to access:

WEBSITEwp-login.php

But the URL is not a WordPress script, it's my wiki script, so the script interprets this as an attempt to read the page with that name.

But that page doesn't exist, so the "edit" script is invoked.
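The routing just described amounts to very little code. A sketch, assuming Python 3: the script names wiki and wiki_edit are the ones used above, while choose_script and the page name FrontPage are hypothetical.

```python
def choose_script(pagename, existing_pages):
    """Return which script ends up handling a request for pagename."""
    if pagename in existing_pages:
        return 'wiki'        # the page exists, so just show it
    return 'wiki_edit'       # unknown page: hand over to the edit script

pages = {'FrontPage'}
print(choose_script('FrontPage', pages))      # wiki
print(choose_script('wp-login.php', pages))   # wiki_edit
```

So any request for a name the wiki has never heard of, including a WordPress login script, lands in the edit script along with whatever POST data came with it.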

The attacker's script passes what the nefarious actor thinks are the necessary parameters for logging on to a WordPress site, but they don't know the username or password to use. And that's where the "dictionary attack" part comes into it. They just try lots of things.

 
pwd=WEBSITE.org.uk1
wp-submit=Log+In
log=WEBSITE.org

 
pwd=WEBSITE.org.uk1234
wp-submit=Log+In
log=WEBSITE.org

 
pwd=WEBSITE.org.uktest
wp-submit=Log+In
log=WEBSITE.org

Here's an actual attempt from the log file:

 
: =====================
: Data: len=62, read=62
: Data: "'pwd=WEBSITE.org.uk1&wp-submit=Log%2BIn&log=WEBSITE.org'"
: Data: len=62, read=0
: =====================
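The logged body is ordinary form encoding, so the standard library can pull it apart. A sketch, assuming Python 3, using the body from the log entry with its separators written as plain ampersands:

```python
from urllib.parse import parse_qs

body = 'pwd=WEBSITE.org.uk1&wp-submit=Log%2BIn&log=WEBSITE.org'

# parse_qs splits on '&' and '=', and decodes %2B back to '+'.
fields = parse_qs(body)
for name, values in fields.items():
    print(name, values)
```

This gives the three WordPress login parameters, with pwd holding the guessed password and log holding the guessed username.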

As you can see, the passwords being tried are simply variants on the website name.

And this is common:

Many people when choosing a password choose a simple variant of the site's name.

You can see why. If you have dozens of passwords to remember, you need a scheme to help you, and what better than simply taking the site's name and warping it somehow.

But hackers know that, and they've collected hundreds of rules to take site names, people's names, words from the dictionary, and warp them to create possible passwords, and here is the evidence that they do this on a grand scale. This site wasn't attacked because it was valuable, or prominent.

It was attacked because it was on the web.
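From the defender's side, a site could at least refuse passwords of this kind. The following is a rough sketch, assuming Python 3; the function name and the rule it applies (reject any password containing the site name or one of its longer dot-separated parts) are assumptions, and a real password checker would need to be far more thorough.

```python
def looks_like_site_variant(password, site_name):
    """Return True if password is built on the site name or a part of it."""
    pw = password.lower()
    site = site_name.lower()
    # Try the full name and its dot-separated pieces of useful length.
    stems = [site] + [part for part in site.split('.') if len(part) >= 4]
    return any(stem in pw for stem in stems)

print(looks_like_site_variant('WEBSITE.org.uk1234', 'WEBSITE.org.uk'))  # True
print(looks_like_site_variant('Website2024!', 'WEBSITE.org.uk'))        # True
print(looks_like_site_variant('kettle-orbit-93', 'WEBSITE.org.uk'))     # False
```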

So what lessons can we learn?

One is that we can't trust promises from the system. It might tell you that there's data to read, and then not actually have any data for you. In fact, the real lesson here might be not to read from the system directly in this context, but instead to use the excellent libraries provided that bundle up the data and give it to you in a suitable format. That may be true, but there is more background to this code than I've explained here, and I'll let that slide.

But the main lesson is that you really, really need to have strong passwords everywhere.