
08-Feb-2024 ~ 3 min read

Difference between pool.imap and pool.imap_unordered in Python Multiprocessing


When working with Python’s multiprocessing module, the pool.imap and pool.imap_unordered methods are commonly used to apply a function to each item in an iterable in parallel. Both return an iterator that yields results lazily; they differ in the order in which those results come back. Let’s explore the distinctions between imap and imap_unordered to understand when to use each one.

pool.imap

The imap method applies a function to each item in an iterable and yields the results in the same order as the input, regardless of the order in which the worker processes finish. A result that is ready early is held back until every result ahead of it has been yielded.

Use imap when preserving the order of results is important and when you need to maintain the relationship between input and output items.
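As a small illustration of the ordering guarantee, here is a minimal sketch. The names slow_square and ordered_squares are made up for this example; the worker deliberately sleeps longer for smaller inputs so that later items finish first.

```python
from multiprocessing import Pool
import time


def slow_square(n):
    # Hypothetical worker: smaller inputs sleep longer, so later items finish first.
    time.sleep(max(0.0, 0.05 * (4 - n)))
    return n * n


def ordered_squares(values, processes=4):
    # imap yields results in input order, whatever order the workers finish in.
    with Pool(processes) as pool:
        return list(pool.imap(slow_square, values))


if __name__ == "__main__":
    print(ordered_squares([1, 2, 3, 4]))  # [1, 4, 9, 16]
```

Even though the worker handling 4 finishes first, the list still lines up with the input.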

pool.imap_unordered

On the other hand, the imap_unordered method also applies a function to each item in an iterable, but it does not guarantee that the order of the results will match the order of the input iterable. Instead, it yields each result as soon as it becomes available, so the results arrive in the order in which the worker processes complete their tasks.

Use imap_unordered when the order of results is not important or when processing items independently, without any dependencies on the order of the input iterable.
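If you want the early results of imap_unordered but still need to know which input produced which output, one option is to have the worker return the input along with the result. This is only a sketch of that idea; timed_square and squares_by_input are hypothetical names, not part of multiprocessing.

```python
from multiprocessing import Pool
import time


def timed_square(n):
    # Hypothetical worker: returns the input alongside the result so the pair
    # stays identifiable even when results arrive out of order.
    time.sleep(0.02 * n)
    return n, n * n


def squares_by_input(values, processes=4):
    # Collect results as they arrive, keyed by the input that produced them.
    results = {}
    with Pool(processes) as pool:
        for n, squared in pool.imap_unordered(timed_square, values):
            results[n] = squared
    return results


if __name__ == "__main__":
    print(squares_by_input([3, 1, 2]))
```

The arrival order can vary from run to run, but the mapping from input to result does not.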

Speed Considerations

Switching from pool.imap to pool.imap_unordered usually has little effect on total running time. It might be a little faster, but not by much.

What it may do, however, is make the interval between values becoming available in your iteration more even. That is, if your operations can take very different amounts of time (rather than a consistent 0.01 seconds each, say), imap_unordered can smooth things out by yielding faster-calculated values ahead of slower-calculated ones. The regular imap will delay yielding the faster ones until the slower ones ahead of them have been computed. This does not stop the worker processes from moving on to more calculations; it only delays when you get to see the results.

To see the difference between the two imap versions, make the work function sleep for i * 0.1 seconds, shuffle the input list, and print i in each loop. With 50 items and 4 workers, expect each loop to take around half a minute. Here’s an example:

from multiprocessing import Pool
import time
import random


def work(i):
    time.sleep(0.1 * i)
    return i


def main():
    nums = list(range(50))
    random.shuffle(nums)

    # The with-block shuts the pool down once both loops have finished.
    with Pool(4) as p:
        start = time.time()
        print('Using imap')
        # Results are printed in the shuffled input order.
        for i in p.imap(work, nums):
            print(i)
        print('Time elapsed: %s' % (time.time() - start))

        start = time.time()
        print('Using imap_unordered')
        # Results are printed as the workers finish them.
        for i in p.imap_unordered(work, nums):
            print(i)
        print('Time elapsed: %s' % (time.time() - start))


if __name__ == "__main__":
    main()
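
To put rough numbers on the smoothing effect described under Speed Considerations, you could also record when each result arrives. The sketch below is one way to do that, not a benchmark; sleep_then_return and arrival_times are hypothetical helpers, and the delays are chosen so that one slow item sits ahead of three fast ones.

```python
from multiprocessing import Pool
import time


def sleep_then_return(delay):
    # Hypothetical worker: sleeps for `delay` seconds, then echoes it back.
    time.sleep(delay)
    return delay


def arrival_times(delays, ordered, processes=2):
    """Return (seconds_since_start, value) for each result as it is yielded."""
    arrivals = []
    with Pool(processes) as pool:
        method = pool.imap if ordered else pool.imap_unordered
        start = time.perf_counter()
        for value in method(sleep_then_return, delays):
            arrivals.append((time.perf_counter() - start, value))
    return arrivals


if __name__ == "__main__":
    delays = [0.3, 0.0, 0.0, 0.0]  # one slow item ahead of three fast ones
    for ordered in (True, False):
        print("imap" if ordered else "imap_unordered")
        for elapsed, value in arrival_times(delays, ordered):
            print(f"  {elapsed:.2f}s -> {value}")
```

With imap, nothing can be printed until the 0.3 second item is done, so all four results show up together at the end. With imap_unordered, the fast items would typically appear almost immediately and the slow one last, while the total time stays about the same.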