C or Python? Comparison of execution time for Mandelbrot image generation

Because I’ve been generating a lot of fractal images recently, I’ve found myself spending quite a bit of time sitting around waiting for my programs to finish long repetitive calculations. In particular, I’ve generated several sequences of hundreds of images each, which I subsequently combined into animations.

One of the many great things about teaching electrical engineering is that I get to dip in and out of different programming languages all the time, but C remains my primary language and it’s still what I’m most likely to reach for when I’m writing something for my own curiosity. However, over the last couple of months I’ve been using Python for most of my fractal programming because I find its complex number syntax elegant. Furthermore, although complex numbers have been natively supported in C since the C99 standard, I’ve just never really embraced that language feature (until now).
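To illustrate what I mean about the syntax: complex numbers are a built-in type in Python, so the core Mandelbrot calculation needs no imports or special function calls at all. For example:

z = 0 + 0j       # complex literals are part of the language
c = -0.4 + 0.6j
z = z*z + c      # the arithmetic operators work on complex values
print(abs(z))    # abs() gives the modulus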

Anyway, because I’ve been spending all this time waiting while numbers are getting crunched, I decided to do a quick comparison of execution time for the same task between Python and C. Naturally, since C is compiled and Python is interpreted, it comes as no surprise that C is significantly faster. Nevertheless, I was still surprised by how much faster it is.

The task I chose for this experiment was the generation of a Mandelbrot Set image of resolution 4000×4000 pixels, spanning a square region of the complex plane from -2 to +2 on the real axis and from -2j to +2j on the imaginary axis. This is the image produced:

[Image: the resulting 4000×4000 Mandelbrot Set rendering]

I implemented the same algorithm in C and Python and then performed a rough measurement of the execution time of each program by running the “date” command line tool immediately before and after it, as shown below…

[Screenshot: terminal session showing the C program being compiled with "gcc mandelbrot.c -o mandelbrot -lm", and each program run with the "date" command immediately before and after it]

As you can see, the approximate execution times were as follows:

  • Python implementation: 105 seconds
  • C implementation: 4 seconds

In other words, the C implementation runs more than 26 times faster than the Python implementation! Compiling the program with gcc (also shown in the above screen shot) takes only a fraction of a second too. I’m sure a better programmer could make either of these implementations run a bit faster, but unless the Python version can be made dramatically faster than this, the difference is more than enough to persuade me to embrace complex maths in C. (A vectorised numpy version, sketched after the two listings below, would be the obvious thing to try on the Python side.)

This is the full C code:

//
// mandelbrot.c - written by Ted Burke - 2-Feb-2016
//
// To compile:
//      gcc mandelbrot.c -o mandelbrot -lm
//

#include <complex.h>
#include <stdio.h>

// Create an array of bytes to store pixel values
unsigned char p[4000][4000];

int main(void)
{
    double complex c, z;
    int w=4000, h=4000, x, y;
    unsigned char px;
    
    // Open output file
    FILE* f = fopen("mandelbrot_c.pgm","wb");
    fprintf(f, "P5\n%d %d\n255\n", w, h);
    
    // Calculate pixel values
    for (y=0 ; y<h ; ++y)
    {
        for (x=0 ; x<w ; ++x)
        {
            // Initialise complex values c and z for next
            // iterative pixel value calculation
            c = (-2.0 + x*4.0/w) + (2.0 - y*4.0/h)*I;
            z = 0;
            
            // Iterate complex function while darkening pixel
            px = 255;
            while(px >= 10 && cabs(z) < 4.0)
            {
                z = z*z + c;
                px -= 10;
            }
            
            // Store latest pixel value in byte array
            p[y][x] = px;
        }
    }
    
    // Write the byte array to the output file
    fwrite(p, 1, w*h, f);
    fclose(f);
    
    return 0;
}

This is the full Python code:

#
# mandelbrot.py - written by Ted Burke - 2-Feb-2016
#

import numpy

# Create an array of bytes to store pixel values
w,h = 4000,4000
p = numpy.zeros((h,w),dtype=numpy.uint8)

# Open output file
f = open('mandelbrot_python.pgm','wb')
f.write('P5\n{:d} {:d}\n255\n'.format(w, h))

# Calculate pixel values
for y in range(h):
    for x in range(w):
        # Initialise complex values c and z for next
        # iterative pixel value calculation
        c = (-2.0 + x*4.0/w) + (2.0 - y*4.0/h)*(0+1j)
        z = 0 + 0j
        # Iterate complex function while darkening pixel
        px = 255
        while px >= 10 and abs(z) < 4.0:
            z = z*z + c
            px -= 10
        # Store latest pixel value in byte array
        p[y][x] = px

# Write the byte array to the output file
f.write(bytearray(p))
f.close()
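
Incidentally, the usual trick for making this kind of Python code run much faster without leaving Python is to vectorise the calculation with numpy, so that every pixel’s z value is updated in lockstep by compiled array operations rather than one pixel at a time in an interpreted loop. Here’s a rough sketch of that idea (same algorithm and output format as above, but note that I haven’t benchmarked this version):

#
# mandelbrot_numpy.py - vectorised sketch (not benchmarked)
#

import numpy

w,h = 4000,4000

# Build the full grid of c values at once, using the same
# pixel-to-plane mapping as the loop version above
x = numpy.arange(w)
y = numpy.arange(h).reshape(h,1)
c = (-2.0 + x*4.0/w) + (2.0 - y*4.0/h)*1j

# Iterate all pixels in lockstep, darkening by 10 per pass.
# 25 passes takes a never-escaping pixel from 255 down to 5,
# which matches the "while px >= 10" condition above.
z = numpy.zeros((h,w), dtype=numpy.complex128)
p = numpy.full((h,w), 255, dtype=numpy.uint8)
for n in range(25):
    active = numpy.abs(z) < 4.0           # pixels still iterating
    z[active] = z[active]**2 + c[active]  # update only those pixels
    p[active] -= 10

# Write the byte array to the output file
f = open('mandelbrot_numpy.pgm','wb')
f.write('P5\n{:d} {:d}\n255\n'.format(w, h))
f.write(bytearray(p))
f.close()

The speed-up comes from moving the inner loop into numpy’s compiled routines, at the cost of holding the whole z array in memory (a 4000×4000 complex128 array is roughly 256 MB).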

8 Responses to C or Python? Comparison of execution time for Mandelbrot image generation

  1. Thái says:

    Amazing!
    Never thought C would be that much faster compared to Python.
    Thanks for the post.

  2. fduignan says:

    How about using optimisation with the compiler? Also, did you run the Python code more than once? Python compiles to bytecode the first time a program is run.

    • batchloaf says:

      Hi Frank,

      Interesting points! I actually hadn’t tried changing the default optimization, but I’ve just done a little test and it is quite significant in this case. Basically, I tried compiling the C code with each of the following: “-O0” (the default, I think), “-O1”, “-O2” and “-O3” to produce executables “mandelbrot0”, “mandelbrot1”, “mandelbrot2” and “mandelbrot3” respectively. As you can see in the screen shot below, “mandelbrot2” and “mandelbrot3” turned out to be completely identical. I then used the “time” command to measure and compare the execution times of versions 0, 1 and 2 respectively.

      I don’t really have any experience using optimization in gcc and I’m still trying to get my head around what each optimization level actually enables, but I was surprised to observe that the execution time for “mandelbrot2” was actually longer than that for “mandelbrot1”. In case this was just due to measurement variability, I ran “time” a few times on each version to verify that “mandelbrot1” is consistently faster (which it seems to be).

      Regarding the Python bytecode, I did wonder about that myself and I have a vague recollection of seeing .pyc files knocking around after running Python scripts previously. However, no such files appeared in the folder in this case, and the execution time doesn’t seem to change significantly from one run to the next. Maybe the .pyc files are only produced when you import a module into a Python program or something like that? In any case, for a heavy number-crunching task like this I would expect the cost of compiling to bytecode to be relatively insignificant in the greater scheme of things, because it would presumably happen at the beginning, before the time-consuming calculation commences. That’s speculation on my part, but the fact that execution time seems to scale with image width and height (and hence the number of pixels) lends weight to that argument.
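
      A quick way to test that theory would be to force the bytecode step explicitly using the standard py_compile module and see whether subsequent runs get any faster. I haven’t tried this, but something like the following should do it:

      import py_compile

      # Compile mandelbrot.py to bytecode without running it;
      # in Python 2 this writes mandelbrot.pyc beside the script.
      py_compile.compile('mandelbrot.py')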

      At any rate, I did a short test by making an edit to the Python code and then running it three times (without making any further edits). As you can see below, for whatever reason it actually ended up taking a few seconds longer on the second trial.

      Anyway, there wasn’t much change after the first execution and of course all three times were more than 20 times slower than the compiled C version.

      Going back to the gcc optimization, do you have any idea why I’m getting a faster version with “-O1” than I do with “-O2” (or “-O3”)? I had been thinking simplistically of the numbered optimization levels as “higher number” equals “more optimized”, but reading the gcc man page reminded me that of course you can optimize for very different things: executable size, memory size, execution speed, etc. To execute a number-crunching program like this as fast as possible, what would you suggest in terms of optimization options? In this case, the compile time, memory footprint and executable size don’t really matter at all (within reason).

      Ted

  3. James says:

    Hey Ted,

    Using the ‘pypy’ JIT compiler can offer a significant speed-up compared to the standard cpython interpreter. For example, on my Celeron B830, the C program (gcc -O3) has a runtime of approximately 3.2~3.3 seconds. The cpython interpreter takes 104~113 seconds, so C is about 32-ish times faster than Python, much the same result as yours.

    However, using ‘pypy’ gave me runtimes of 7.7~7.8 seconds, only 2x slower than C. I find this to be incredible.

    If you’d like to explore this further, I recommend installing ‘pypy’, and ‘numpypy’, and seeing this for yourself. (I followed the ‘virtualenv’ steps on the numpypy bitbucket page and used git to check out the older branch, so it would work with the latest-yet-outdated ‘pypy’ version included in Xubuntu)

    (Also worth checking out is the Julia language – there’s even a mandelbrot benchmark on http://julialang.org/ , and the torch7 framework for Lua – these languages have incredible JIT compilers and are used in the industry by those needing the dynamics of something like Python, but the speed of C)

    James

    • batchloaf says:

      Thanks for the tip, James! I had no idea the speed increase with pypy would be that dramatic. As it happens, I’ve been burying my head in C again for the last month or so of fractal generating, but I’ll have to keep pypy in mind for future Python computations.

      By coincidence (or is it!), my colleague Ruairi de Frein was recommending Julia to me the other day and I looked it up to learn a bit about it. I haven’t had a chance to actually program anything in it yet though, so I probably won’t really get it until I do.

      Anyway, thanks again.

      Ted

  4. anonymous says:

    I found this website to be very useful, thank you for your hard work!
    My website has simple, cool Python projects. If you are interested, visit my website.
