This post was inspired by Brandon Rhodes‘ excellent talk at Pycon 2014, All Your Ducks In A Row: Data Structures in the Standard Library and Beyond.
Python’s lists are dynamic. You do not need to declare their length ahead of time, and you can add to them at any time. How does Python manage memory for a list when it doesn’t know how long your list is going to be? It allocates a little extra room at the end of the list, in case you end up adding some more items. Most of the time this makes your list operations fast, but it can catch you off-guard if you don’t think through some operations carefully.
Consider an example where you want to work through a long list, popping items until the list is empty:
numbers = [x for x in range(0, 100000)] while len(numbers) > 0: number = numbers.pop() print number
This code makes a simple list of 100,000 numbers. It then removes the numbers one at a time from the end of the list, and prints each number as it is removed. When I time this program on my computer, it takes roughly 1.3 seconds.
Now let’s try the exact same program, except each item will be popped from the beginning of the list instead of the end of the list:
numbers = [x for x in range(0, 100000)] while len(numbers) > 0: number = numbers.pop(0) print number
This program takes twice as long! The difference is even more significant if you use a longer list.
Why does it take so long? Python is constantly moving all of the items in the list, since lists are more optimized for adjustments at the end of the list than they are at the beginning.
An efficient solution
How can you do this more efficiently? Since the goal is to work through each item from the beginning of the list, you can reverse the list before popping its items. Then you can pop from the end of the list, which is efficient, and still work through items in the order you intended to:
numbers = [x for x in range(0, 100000)] numbers.reverse() while len(numbers) > 0: number = numbers.pop() print number
The reverse() operation takes a negligible amount of time to run, so this version takes rougly 1.3 seconds again.
The takeaway? If you are looking at performance issues, understanding how Python manages memory can be really helpful in making optimizations.
You can also use a deque and have your long list be a generator statement.
Nice post!
A thought:
reverse() also take o(n) time to run. It is optimized and no doubt better than rewriting a function which does reverse (as you do in your example), but if you want to continually be popping from the beginning, then deque is great.
PS. For a great example of this to use in class — compare deque to a normal list when implementing merge-sort!