Comprehensions are a common replacement for loops when we have to
populate a Python container. The language is organized so that large
lists, sets, dictionaries, and tuples can be constructed more
efficiently, and so more quickly, with a comprehension than with a
corresponding loop over an append or add operation.
For a loop such as
the_list = []
for item in container:
    the_list.append(object)
we write a comprehension as
the_list = [object for item in container]
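As a concrete instance of the pattern above, here is a minimal sketch in which the placeholder object is replaced by an actual expression. The sample data and the names squares_loop and squares_comp are made up for illustration.

```python
# Hypothetical sample data; any iterable would do.
container = [1, 2, 3, 4]

# Loop version: start with an empty list and append one result per item.
squares_loop = []
for item in container:
    squares_loop.append(item * item)

# Comprehension version: the same expression followed by the same for clause.
squares_comp = [item * item for item in container]

# Both approaches build the same list.
print(squares_loop == squares_comp)
```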
Aside: More generally, a group of nested for statements keeps its order when replaced with a comprehension: the for clauses appear left to right in the same order as the loops, outermost first. We replace
the_list = []
for item in container:                # outermost for statement
    for subitem in item:              # next outermost for statement
        for subsubitem in subitem:    # next outermost
            # ...
            the_list.append(object)
with
the_list = [object for item in container for subitem
            in item for subsubitem in subitem ...]
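The ordering rule can be checked on a small example. The following is a sketch with made-up data (a list of lists of lists); the names flat_loop and flat_comp are hypothetical.

```python
# Hypothetical three-level nesting: a list of lists of lists.
container = [[[1, 2], [3]], [[4], [5, 6]]]

# Nested loop version.
flat_loop = []
for item in container:            # outermost for statement
    for subitem in item:          # next outermost
        for value in subitem:     # innermost
            flat_loop.append(value)

# Comprehension version: the for clauses are written left to right
# in the same order as the loops above, outermost first.
flat_comp = [value
             for item in container
             for subitem in item
             for value in subitem]

print(flat_loop == flat_comp)
```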
The improvement in speed is considerable (Python 3.3):
$ python -m timeit -s """
from random import random as R
list = []""" """
for item in range(10000):
    list.append(R())"""
1000 loops, best of 3: 1.08 msec per loop
$
$ python -m timeit -s """
from random import random as R""" """
[item for item in range(10000)]"""
1000 loops, best of 3: 334 usec per loop
$
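(Note the plain-ASCII "u" standing in for μ in "usec", that is, μsec. The two runs are not strictly comparable: the loop calls R() for every element, while the comprehension as timed only collects item. The saving also becomes smaller for deeply nested loops.)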
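A comparison in which both versions do the same work can also be set up from within Python using the standard timeit module. The sketch below is an illustration only, not the measurement reported above; the function names with_loop and with_comprehension are hypothetical, and the numbers it prints depend on the machine and the interpreter version.

```python
import timeit
from random import random


def with_loop(n):
    """Build a list of n random numbers with an explicit append loop."""
    result = []
    for _ in range(n):
        result.append(random())
    return result


def with_comprehension(n):
    """Build the same kind of list with a list comprehension."""
    return [random() for _ in range(n)]


# Each call builds a fresh list, so nothing accumulates between runs.
loop_time = timeit.timeit(lambda: with_loop(10000), number=50)
comp_time = timeit.timeit(lambda: with_comprehension(10000), number=50)

print("loop:          %.4f s for 50 runs" % loop_time)
print("comprehension: %.4f s for 50 runs" % comp_time)
```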
Similarly, for a set, instead of
the_set = set()
for item in container:
    the_set.add(object)
we write
the_set = {object for item in container}
and for a dictionary (mapping) we replace
mapping = {}
for key in container:
    mapping[key] = value
with
mapping = {key: value for key in container}
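The set and dictionary forms can be tried on a small input. This is a sketch with made-up data; the names words, lengths_set and length_by_word are hypothetical.

```python
# Hypothetical sample data containing a duplicate.
words = ["apple", "bob", "apple", "kiwi"]

# Set comprehension: braces around a single expression.
# Duplicate results collapse, as with repeated calls to add().
lengths_set = {len(word) for word in words}

# Dictionary comprehension: braces around a key: value pair.
# A repeated key simply overwrites the earlier entry.
length_by_word = {word: len(word) for word in words}

print(lengths_set)
print(length_by_word)
```

Note that an empty pair of braces, {}, creates a dictionary rather than a set, which is why the loop version for a set has to start from set().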
For some tasks, such as converting a list of lists to a tuple of tuples, the improvement in speed is quite striking:
$ python -m timeit -s """
from random import random as R
a_list = [[R(), R()] for i in range(1000)]""" """
new_tuple = ()
for i in a_list:
    new_tuple += (tuple(i),)"""
1000 loops, best of 3: 1.82 msec per loop
$
$ python -m timeit -s """
from random import random as R
a_list = [[R(), R()] for i in range(1000)]""" """
new_tuple = ()
tuple(tuple(b_list) for b_list in a_list)"""
1000 loops, best of 3: 224 usec per loop
$
(I included the unnecessary new_tuple = () in the timeit expression
for the comprehension so that it would be more exactly comparable to the
loop.)
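The two conversions can be compared for correctness on a tiny input. This is a sketch with made-up data; the name converted is hypothetical. Strictly speaking, the expression inside tuple() is a generator expression rather than a comprehension, since Python has no tuple comprehension syntax.

```python
# Hypothetical small list of lists.
a_list = [[1, 2], [3, 4], [5, 6]]

# Loop version: tuples are immutable, so each += builds a brand-new
# tuple that copies everything accumulated so far.
new_tuple = ()
for pair in a_list:
    new_tuple += (tuple(pair),)

# Generator expression passed to tuple(): the outer tuple is built once
# from the stream of inner tuples.
converted = tuple(tuple(pair) for pair in a_list)

print(new_tuple == converted)
```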
A comprehension recalls the set-builder notation of mathematics. Using
one to create a list, rather than a loop with append, is one of the
signs of an experienced Python programmer. But for concatenating lists
(or other sequences) it can actually be faster to use a loop if the
extend operation is involved. Here the task is to concatenate a list of
lists into a single list:
$ python -m timeit -s """
from random import random as R
a_list = [[R(), R()] for i in range(100)]
b_list = []""" """
for i in a_list:
    b_list.extend(i)"""
100000 loops, best of 3: 7.04 usec per loop
$
$ python -m timeit -s """
from random import random as R
a_list = [[R(), R()] for i in range(100)]""" """
[subitem for item in a_list for subitem in item]"""
100000 loops, best of 3: 12 usec per loop
$
The size of the gap depends on the interpreter version, which suggests it is a matter of continuing optimization by the developers of the language; compare the Python 3.3 timings above with Python 2.7:
$ python -m timeit -s """
from random import random as R
a_list = [[R(), R()] for i in range(100)]
b_list = []""" """
for i in a_list:
    b_list.extend(i)"""
100000 loops, best of 3: 11 usec per loop
$ python -m timeit -s """
from random import random as R
a_list = [[R(), R()] for i in range(100)]""" """
[subitem for item in a_list for subitem in item]"""
100000 loops, best of 3: 13.5 usec per loop
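That the two concatenation approaches give the same result can be checked on a small input. This is a sketch with made-up data; the name flattened is hypothetical. One plausible reason for the timing difference, offered here as an assumption rather than something the measurements establish, is that extend adds a whole sublist per call while the comprehension handles elements one at a time.

```python
# Hypothetical small list of lists.
a_list = [[1, 2], [3, 4], [5, 6]]

# Loop with extend: one method call per sublist, not per element.
b_list = []
for sublist in a_list:
    b_list.extend(sublist)

# Nested comprehension: visits every element individually.
flattened = [subitem for item in a_list for subitem in item]

print(b_list == flattened)
```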