Sieve Of Eratosthenes In Python

   
Recent changes
Table of contents
Links to this page
FRONT PAGE / INDEX

Subscribe!
@ColinTheMathmo

My latest posts can be found here:
Previous blog posts:
Additionally, some earlier writings:

The Sieve of Eratosthenes
in Python - 2015/05/24

http://upload.wikimedia.org/wikipedia/commons/b/b9/Sieve_of_Eratosthenes_animation.gif
Sieve of Eratosthenes animation
Source: Wikipedia[0]
One of the things we need to do when finding Perrin Pseudo-Primes is to recognise prime numbers so we can see if the numbers predicted by the Perrin test to be prime, are. So we need to generate primes. For small primes (for some definition of "small") this can be done quickly and efficiently by using the Sieve of Eratosthenes. In our case, instead of using a bit map of flags, we will use a dynamically generated collection of filters, one for each prime, and run down the list of all numbers, filtering as we go.

The basic algorithm

So how does the Sieve of Eratosthenes work? We start by writing a flag for every number from 2 up to the limit we're interesting in.

We set these flags as follows:

  • True $\rightarrow$ Prime as far as we know
  • False $\rightarrow$ Definitely Composite.

We initialise all the flags to True.

 
#!/usr/bin/python

limit = 121

flags = {}
for i in range(2,limit+1):
    flags[i] = True

p = 2
while p <= limit:

    if flags[p] == True:
        print p
        m = p*p
        while m <= limit:
            flags[m] = False
            m += p

    p += 1
Now we look for the first number not known to be composite. In other words, we look for the first flag that's True. To start with we know this will be 2 - there's no mystery there. This is definitely prime, so we call it $p,$ and print it. Then we start at $p^2$ and move forward in steps of size $p,$ setting flags to False. The point is that we are stepping through the multiples of $p,$ which are therefore definitely composite.

Now we move to the next number and go again. If its flag is True then the number is prime, so we print it, and step forward setting the flags for its multiples.

At each stage we know that every composite number up to $p^2$ has definitely been marked as such. So the next unmarked number is always going to be prime, so we call that $p,$ print it, mark all its multiples, and so on.

An important feature of this algorithm is that it doesn't need us to do any division. Until comparatively recently routines called the "Sieve of Eratosthenes" actually used trial division, and so weren't the original algorithm at all. We can see here though that division is not necessary.

The difficulty, however, is that we need to start by initialising a flag for every number up to the limit we're interesting in. If we don't know how far we're going to go, or if we're limited on space, this is a problem. However, we can overcome both these problems by using a more dynamic version.

A more "Dynamic" version

 
#!/usr/bin/python

from collections import defaultdict

# "cmps" is our dictionary
# of known composites

def create_list(): return []

cmps = defaultdict(create_list)

def generate_primes():

  v = 2
  while True:
    yield v
    cmps[v*v].append(v)
    v += 1
    while v in cmps:
      for f in cmps[v]:
        cmps[v+f].append(f)
      del cmps[v]
      v += 1
The essence is that having decided a number $p$ is prime, we print it (or in this case, return or "yield" it) and suspend the flagging until we need it. We remember that $p^2$ is a composite and has $p$ as a factor, but we don't run forward from there until we need to. So we maintain a "dictionary of composites" which is indexed by the numbers we know to be composite. Each entry in the dictionary holds a list of known factors for the number. Thus in the entry for $v^2$ we store $v.$ Then we increment $v.$

When we look at the next candidate, $v,$ we see if it is in our dictionary of composites. If so then we do not return it, because we know it's not prime. Instead, we look at all the factors $f$ we know of it, and for each we insert into our dictionary that $v+f$ has $f$ as a factor. Having done so we can delete $v$ from our dictionary, thus saving space (and time for subsequent lookups).

We continue until we find a number that is not in our dictionary of composites, and realise that it must be prime. Thus we return it.

Treating 2 as a special case

 
def generate_primes():

  yield 2
  v = 3
  while True:
    yield v
    cmps[v*v].append(v)
    v += 2
    while v in cmps:
      for f in cmps[v]:
        cmps[v+f+f].append(f)
      del cmps[v]
      v += 2
We can improve this routine by treating "2" as a special case. The changes are small. We yield "2" at the beginning, and then set our next number to examine as "3". We proceed as before, except that when we step forward we do so by 2, and when we set the flags we don't step forward by the factor, but by twice the factor. This small improvement doubles the speed, and that's definitely worth having.

We could go further and also treat "3" as a special case - we can save that for later in case we need it. It's definitely more convoluted to do so, and may not be worth the effort.

 
n = 2

for p in generate_primes():

    while n < p:

        if is_perrin_q(n):
            print n
        n += 1

    if clock() >= 100:
        break
    n += 1
This code works remarkably well in our context. We can now re-write our main routine as follows. We set $n$ to be the number we are considering, and then loop over all primes. For as long as $n$ is less than $p$ we test to see if $n$ passes the Perrin test. If it does then it's a Pseudo-Prime and we print it.

When $n$ gets to equal $p$ we don't need to perform a test, because we know that it will pass (because the Perrin test is always true for prime numbers). So we step to the next potential number, and go round the loop with the next prime.

Timing

So, timing?

Previous
Now
n
6594026
12825557
Prime
22.61 s
5.59 s
Perrin
66.93 s
88.38 s

As previously, we're running our Perrin search for 100 seconds and looking at how much time is spent checking primes versus checking the Perrin numbers. Let's compare with the previous results.

As you can see we've got nearly twice as far, and now we're spending only about 6% of our time in the prime generation routine. So once again it's the Perrin testing that's the slowest part, so once again we ask, how can we make that faster.

That's for next time.


Some references


<<<< Prev <<<<
Fast Perrin Test
:
>>>> Next >>>>
The Point Of The Banach Tarski Theorem ...


You should follow me on twitter @ColinTheMathmo

Comments

I've decided no longer to include comments directly via the Disqus (or any other) system. Instead, I'd be more than delighted to get emails from people who wish to make comments or engage in discussion. Comments will then be integrated into the page as and when they are appropriate.

If the number of emails/comments gets too large to handle then I might return to a semi-automated system. That's looking increasingly unlikely.


Contents

 

Links on this page

 
Site hosted by Colin and Rachel Wright:
  • Maths, Design, Juggling, Computing,
  • Embroidery, Proof-reading,
  • and other clever stuff.

Suggest a change ( <-- What does this mean?) / Send me email
Front Page / All pages by date / Site overview / Top of page

Universally Browser Friendly     Quotation from
Tim Berners-Lee
    Valid HTML 3.2!