Processing a long list efficiently in Python

This post was inspired by Brandon Rhodes‘ excellent talk at Pycon 2014,  All Your Ducks In A Row: Data Structures in the Standard Library and Beyond.

Python’s lists are dynamic.  You do not need to declare their length ahead of time, and you can add to them at any time. How does Python manage memory for a list when it doesn’t know how long your list is going to be? It allocates a little extra room at the end of the list, in case you end up adding some more items. Most of the time this makes your list operations fast, but it can catch you off-guard if you don’t think through some operations carefully.

Consider an example where you want to work through a long list, popping items until the list is empty:

numbers = [x for x in range(0, 100000)]

while len(numbers) > 0:
    number = numbers.pop()
    print number

This code makes a simple list of 100,000 numbers. It then removes the numbers one at a time from the end of the list, and prints each number as it is removed. When I time this program on my computer, it takes roughly 1.3 seconds.

Now let’s try the exact same program, except each item will be popped from the beginning of the list instead of the end of the list:

numbers = [x for x in range(0, 100000)]

while len(numbers) > 0:
    number = numbers.pop(0)
    print number

This program takes twice as long! The difference is even more significant if you use a longer list.

Why does it take so long? Python is constantly moving all of the items in the list, since lists are more optimized for adjustments at the end of the list than they are at the beginning.

An efficient solution

How can you do this more efficiently? Since the goal is to work through each item from the beginning of the list, you can reverse the list before popping its items. Then you can pop from the end of the list, which is efficient, and still work through items in the order you intended to:

numbers = [x for x in range(0, 100000)]
numbers.reverse()

while len(numbers) > 0:
    number = numbers.pop()
    print number

The reverse() operation takes a negligible amount of time to run, so this version takes rougly 1.3 seconds again.

The takeaway? If you are looking at performance issues, understanding how Python manages memory can be really helpful in making optimizations.

About these ads

About ehmatthes

Teacher, hacker, new dad, outdoor guy
This entry was posted in programming and tagged , , , . Bookmark the permalink.

3 Responses to Processing a long list efficiently in Python

  1. Anonymous says:

    You can also use a deque and have your long list be a generator statement.

  2. Nice post!

    A thought:
    reverse() also take o(n) time to run. It is optimized and no doubt better than rewriting a function which does reverse (as you do in your example), but if you want to continually be popping from the beginning, then deque is great.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s