Python extend() without a list comprehension

Comprehensions are a common replacement for loops when we have to populate a Python container. Large lists, sets, and dictionaries can usually be constructed more quickly with a comprehension than with a corresponding loop over an append or add operation, and the same holds for a tuple built from a generator expression.

For a loop such as

the_list = []
for item in container:
    the_list.append(object)

we write a comprehension as

the_list = [object for item in container]
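As a concrete illustration of that rewrite, here is a minimal sketch. The sample data and the squaring expression are assumptions chosen only for the example; they stand in for "container" and "object" above.

```python
# Hypothetical sample data; any iterable would do.
container = [1, 2, 3, 4]

# Loop version: append one computed value per item.
squares_loop = []
for item in container:
    squares_loop.append(item * item)

# Comprehension version: the appended expression moves to the front.
squares_comp = [item * item for item in container]

# Both approaches build the same list.
print(squares_loop == squares_comp)
```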

Aside: More generally, a group of nested for statements in a nested loop undergoes only minimal change in order when replaced with a comprehension; we replace

the_list = []
for item in container: # outermost for statement
    for subitem in item: # next outermost for statement
        for subsubitem in subitem: # next outermost
            # ...
                the_list.append(object)

with

the_list = [object for item in container for subitem
        in item for subsubitem in subitem ...]
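To make the ordering concrete, the following sketch flattens a small three-level structure both ways. The nested sample data is hypothetical; the point is only that the for clauses appear in the comprehension in the same order as in the nested loop, outermost first.

```python
# Hypothetical three-level nested data.
container = [[[1, 2], [3]], [[4], [5, 6]]]

# Nested loop: outermost for statement first.
flat_loop = []
for item in container:
    for subitem in item:
        for subsubitem in subitem:
            flat_loop.append(subsubitem)

# Comprehension: the for clauses keep the same outer-to-inner order.
flat_comp = [subsubitem
             for item in container
             for subitem in item
             for subsubitem in subitem]

print(flat_loop == flat_comp)
```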

The improvement in speed is considerable (Python 3.3):

$ python -m timeit -s """
from random import random as R
list = []""" """
for item in range(10000):
    list.append(R())"""
1000 loops, best of 3: 1.08 msec per loop
$
$ python -m timeit -s """
from random import random as R""" """
[item for item in range(10000)]"""
1000 loops, best of 3: 334 usec per loop
$

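(Note the primitive use of "u" for μ in "usec" (μsec). Note also that the comprehension timed here collects item rather than calling R(), so part of the measured gain comes from skipping the call to random() and not from the comprehension itself. The savings become smaller for deeply nested loops.)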

Similarly, for a set:

the_set = set()
for item in container:
    the_set.add(object)

we write

the_set = {object for item in container}

and for a dictionary (mapping) we replace

mapping = {}
for key in container:
    mapping[key] = value

with

mapping = {key : value for key in container}
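The set and dictionary forms can be checked side by side in a short sketch. The sample strings and the use of len() as the computed value are assumptions made for the example only.

```python
# Hypothetical sample data with a duplicate entry.
container = ["a", "bb", "a", "ccc"]

# Set: loop with add() versus a set comprehension (curly braces, no colon).
the_set = set()
for item in container:
    the_set.add(len(item))
set_comp = {len(item) for item in container}

# Dictionary: loop with item assignment versus a dict comprehension.
mapping = {}
for key in container:
    mapping[key] = len(key)
dict_comp = {key: len(key) for key in container}

print(the_set == set_comp, mapping == dict_comp)
```

Curly braces with a bare expression give a set; curly braces with a key : value pair give a dictionary.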

For some tasks, such as converting a list of lists to a tuple of tuples, the improvement in speed is quite striking:

$ python -m timeit -s """
from random import random as R
a_list = [[R(), R()] for i in range(1000)]""" """
new_tuple = ()
for i in a_list:
    new_tuple += (tuple(i),)"""
1000 loops, best of 3: 1.82 msec per loop
$
$ python -m timeit -s """
from random import random as R
a_list = [[R(), R()] for i in range(1000)]""" """
new_tuple = ()
tuple(tuple(b_list) for b_list in a_list)"""
1000 loops, best of 3: 224 usec per loop
$

(I included the unnecessary 'new_tuple = ()' in the timeit expression for the comprehension so that it would be more exactly comparable to the loop.)
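Stripped of the timing harness, the two conversions look like the sketch below. The small integer data is an assumption used in place of the random values. Strictly speaking, the second form is a generator expression passed to tuple(), since Python has no tuple comprehension syntax.

```python
# Hypothetical sample data standing in for the random pairs.
a_list = [[1, 2], [3, 4], [5, 6]]

# Loop version: tuples are immutable, so each += builds a new tuple.
new_tuple = ()
for pair in a_list:
    new_tuple += (tuple(pair),)

# Generator-expression version: tuple() consumes the items in one pass.
converted = tuple(tuple(pair) for pair in a_list)

print(new_tuple == converted)
```

The repeated copying in the loop version is a plausible reason for the large gap in the timings above, though the measurements there do not isolate it.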

A comprehension recalls the set-builder notation of mathematics. Using one to create a list, rather than a loop with append, is one of the signs of an experienced Python programmer. But for concatenating lists (or other sequences) a loop can actually be faster if it uses the extend operation. Here the task is to concatenate a list of lists into a single list:

$ python -m timeit -s """
from random import random as R
a_list = [[R(), R()] for i in range(100)]
b_list = []""" """
for i in a_list:
    b_list.extend(i)"""
100000 loops, best of 3: 7.04 usec per loop
$
$ python -m timeit -s """
from random import random as R
a_list = [[R(), R()] for i in range(100)]""" """
[subitem for item in a_list for subitem in item]"""
100000 loops, best of 3: 12 usec per loop
$

This is plainly a matter of continuing optimization by the developers of the language; compare those Python 3.3 timings with Python 2.7:

$ python -m timeit -s """
from random import random as R
a_list = [[R(), R()] for i in range(100)]
b_list = []""" """
for i in a_list:
    b_list.extend(i)"""
100000 loops, best of 3: 11 usec per loop
$ python -m timeit -s """
from random import random as R
a_list = [[R(), R()] for i in range(100)]""" """
[subitem for item in a_list for subitem in item]"""
100000 loops, best of 3: 13.5 usec per loop
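Readers who want to repeat the comparison on their own interpreter could use the timeit module directly, as in this rough sketch. The function names flatten_extend and flatten_comprehension are hypothetical helpers, not part of the original measurements. Unlike the command-line runs above, each call here builds a fresh result list and pays function-call overhead, so the absolute numbers will not match.

```python
import timeit
from random import random


def flatten_extend(list_of_lists):
    """Concatenate sublists with a loop over list.extend()."""
    result = []
    for sub in list_of_lists:
        result.extend(sub)
    return result


def flatten_comprehension(list_of_lists):
    """Concatenate sublists with a nested list comprehension."""
    return [x for sub in list_of_lists for x in sub]


# Same shape of data as in the timings above: 100 two-element lists.
a_list = [[random(), random()] for _ in range(100)]

# Sanity check: both approaches must produce the same list.
assert flatten_extend(a_list) == flatten_comprehension(a_list)

number = 10000
for func in (flatten_extend, flatten_comprehension):
    # Take the best of three runs, as the timeit command line does.
    best = min(timeit.repeat(lambda: func(a_list), number=number, repeat=3))
    print("{}: {:.2f} usec per call".format(func.__name__, best / number * 1e6))
```

Results will vary with the Python version and the machine, which is the point of the comparison between 3.3 and 2.7 above.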
