When working with Python’s multiprocessing module, the pool.imap and pool.imap_unordered methods are commonly used to apply a function to each item in an iterable in parallel. However, they differ in the order in which they hand back the results. Let’s explore the distinctions between imap and imap_unordered to understand when to use each one.
pool.imap
The imap method applies a function to each item in an iterable and returns an iterator that yields the results in the same order as the input iterable, regardless of the order in which the worker processes complete their tasks.
Use imap when the order of results matters, for example when each output has to be matched to its input by position.
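To make the ordering guarantee concrete, here is a minimal sketch. The names slow_square and squares_in_input_order are hypothetical helpers invented for this illustration, and the sleep times are arbitrary. Larger inputs take longer, so the tasks finish out of order, yet imap still yields the results in input order.

```python
from multiprocessing import Pool
import time


def slow_square(n):
    # Hypothetical task: larger inputs sleep longer, so small inputs finish first.
    time.sleep(0.05 * n)
    return n * n


def squares_in_input_order(nums, workers=4):
    # Hypothetical helper: imap yields results in the order of nums,
    # even when later items finish before earlier ones.
    with Pool(workers) as pool:
        return list(pool.imap(slow_square, nums))


if __name__ == "__main__":
    print(squares_in_input_order([3, 1, 2, 0]))  # prints [9, 1, 4, 0]
```

Because the position of each result matches the position of its input, the output can be paired with the input using zip if needed.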
pool.imap_unordered
On the other hand, the imap_unordered method also applies a function to each item in an iterable, but it does not guarantee that the order of the results will match the order of the input iterable. Instead, it yields each result as soon as it becomes available, so the results arrive in the order in which the worker processes complete their tasks.
Use imap_unordered when the order of results is not important, for example when items are processed independently and each result can be handled on its own.
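If you still need to know which input produced which result, one common approach is to have the worker return the input alongside the output. The sketch below assumes hypothetical helpers named tagged_square and squares_by_input; it is one possible pattern, not the only one.

```python
from multiprocessing import Pool
import time


def tagged_square(n):
    # Hypothetical task: return the input together with its result,
    # so the pairing survives the loss of ordering.
    time.sleep(0.05 * n)
    return n, n * n


def squares_by_input(nums, workers=4):
    # Hypothetical helper: results arrive in completion order,
    # but each (input, result) pair can be stored by key.
    with Pool(workers) as pool:
        return dict(pool.imap_unordered(tagged_square, nums))


if __name__ == "__main__":
    print(squares_by_input([3, 1, 2, 0]))
```

The order in which the pairs arrive depends on timing, so the code does not rely on it. This pattern assumes the inputs are unique and hashable; otherwise, tagging each item with its index works the same way.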
Speed Considerations
Using pool.imap_unordered instead of pool.imap usually has little effect on the total running time. It might be slightly faster, but not by much.
What it may do, however, is make the interval between values becoming available in your iteration more even. That is, if your operations can take very different amounts of time (rather than, say, a consistent 0.01 seconds each), imap_unordered can smooth things out by yielding faster-calculated values ahead of slower-calculated ones. The regular imap holds back the faster results until the slower ones ahead of them have been computed. This does not stop the worker processes from moving on to more calculations; it only delays the moment at which you see the results.
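The delay described above can be measured directly. The following sketch assumes hypothetical helpers sleep_for and first_result and an arbitrary set of delays with one slow task placed ahead of three fast ones. It times how long the first value takes to appear with each method.

```python
from multiprocessing import Pool
import time


def sleep_for(seconds):
    # Hypothetical task: sleep for the given time, then return it.
    time.sleep(seconds)
    return seconds


def first_result(delays, ordered, workers=4):
    # Hypothetical helper: return the first yielded value and the time
    # it took to arrive. Leaving the with block stops the remaining tasks.
    with Pool(workers) as pool:
        start = time.perf_counter()
        if ordered:
            results = pool.imap(sleep_for, delays)
        else:
            results = pool.imap_unordered(sleep_for, delays)
        first = next(results)
        return first, time.perf_counter() - start


if __name__ == "__main__":
    delays = [0.5, 0.05, 0.05, 0.05]  # one slow task ahead of three fast ones
    print("imap:", first_result(delays, ordered=True))
    print("imap_unordered:", first_result(delays, ordered=False))
```

With imap, the first value is the slow one, so nothing appears until it is done. With imap_unordered, one of the fast values should appear first. Exact timings vary by machine.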
Try making your work function sleep for i * 0.1 seconds, shuffling your input list, and printing i in your loops. You’ll be able to see the difference between the two imap versions. Here’s an example:
from multiprocessing import Pool
import random
import time


def work(i):
    # Simulate a task whose duration grows with its input.
    time.sleep(0.1 * i)
    return i


def main():
    nums = list(range(50))
    random.shuffle(nums)

    # The with block shuts down the worker processes on exit.
    with Pool(4) as p:
        start = time.time()
        print('Using imap')
        # Results are yielded in the (shuffled) input order.
        for i in p.imap(work, nums):
            print(i)
        print('Time elapsed: %s' % (time.time() - start))

        start = time.time()
        print('Using imap_unordered')
        # Results are yielded as each task finishes.
        for i in p.imap_unordered(work, nums):
            print(i)
        print('Time elapsed: %s' % (time.time() - start))


if __name__ == "__main__":
    main()