After the performance improvement from yesterday, I wanted to try some more things,
because the speed was still not satisfactory (processing 2000 images took me an hour).
So I’ve whipped up line_profiler again:
This gave me the following trace:
Hmmm, 87% of the time is spent in unwrapIris(). Let's take a look at what it looks like:
So we can see that the load here is distributed among 3 lines: 2 trigonometric calculations
and one array access with some arithmetic. But the main problem is that these 3 lines are run for
each pixel: for the 10 images processed in this sample, that is 475 200 hits. If I have learned anything
about Python performance in the past year, it is to move all loops to an external library if possible.
In this case we'll look at how this could be moved over to either Numpy or OpenCV. Since these
libraries are written in C or C++, they can perform the same loop much faster than plain Python.
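To see why the hit count grows so quickly, here is a hypothetical reconstruction of a per-pixel unwrap loop. The function name unwrap_iris_loop, the (x, y) form of iris_center and the nearest-neighbour rounding are my assumptions, not the original unwrapIris(). The inner body runs iris_radius × nsamples times per image, so with roughly 130 × 359 output pixels that is tens of thousands of interpreted iterations for every single image.

```python
import math

import numpy as np


def unwrap_iris_loop(img, iris_center, iris_radius, nsamples):
    """Hypothetical per-pixel polar unwrap: one Python-level iteration per output pixel.

    Assumes iris_center is an (x, y) pair of ints and that the whole circle
    lies inside the image (there is no bounds checking).
    """
    cx, cy = iris_center
    polar = np.zeros((iris_radius, nsamples), dtype=img.dtype)
    for r in range(iris_radius):        # one output row per radius
        for t in range(nsamples):       # one output column per angle
            theta = 2 * math.pi * t / nsamples
            # The three hot lines: two trig calls, then an array access plus arithmetic.
            x = cx + r * math.cos(theta)
            y = cy + r * math.sin(theta)
            polar[r, t] = img[round(y), round(x)]  # nearest-neighbour lookup
    return polar
```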
Thanks to Numpy's operator overloading, it is possible to write code that looks really weird at
first glance if you're used to loops from other languages. Look at lines 6 and 7: magnitude
is an array, angle is an array, and iris_center is an int as it is used here. Numpy broadcasts the int
across every element of the arrays, calculates this correctly, and the loops are now in the library, yay!
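A toy illustration of that broadcasting, with made-up values (magnitude, angle, center_x and center_y are stand-ins, not the post's actual variables): adding a plain int to an array applies it to every element, and np.cos/np.sin work elementwise, so a single expression replaces a nested loop.

```python
import math

import numpy as np

# Toy stand-ins: every row of `angle` is the same, every column of `magnitude` is the same.
magnitude = np.array([[0.0, 0.0, 0.0],
                      [1.0, 1.0, 1.0],
                      [2.0, 2.0, 2.0]])
angle = np.array([[0.0, np.pi / 2, np.pi]] * 3)
center_x, center_y = 5, 7  # plain ints, like the iris centre coordinate in the post

# Vectorised: np.cos/np.sin act elementwise and the int is broadcast to every element.
x = center_x + magnitude * np.cos(angle)
y = center_y + magnitude * np.sin(angle)

# The same x computed with an explicit Python loop, for comparison.
x_loop = np.empty_like(magnitude)
for i in range(magnitude.shape[0]):
    for j in range(magnitude.shape[1]):
        x_loop[i, j] = center_x + magnitude[i, j] * math.cos(angle[i, j])
```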
So what changed? Line 4 makes two arrays, one with angles and one with magnitudes, both of dimensions
iris_radius×nsamples, which is about 130×359 for most of my images. The angle array is
basically just a single row of radian values from 0 to 2π repeated 130 times, and magnitude is
a column from 0 to 129 repeated horizontally 359 times.
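One way to build these two grids is np.meshgrid. This is a sketch assuming the angles are evenly spaced over [0, 2π) (endpoint excluded, so the first and last columns don't sample the same direction twice); the helper name make_polar_grids is hypothetical.

```python
import numpy as np


def make_polar_grids(iris_radius, nsamples):
    """Sketch: angle and magnitude grids, both of shape (iris_radius, nsamples)."""
    # One row of angles in [0, 2*pi); endpoint=False is an assumption.
    angles = np.linspace(0, 2 * np.pi, nsamples, endpoint=False)
    # One column of radii 0 .. iris_radius - 1.
    radii = np.arange(iris_radius)
    # With the default 'xy' indexing, meshgrid repeats the angle row once per radius
    # and the radius column once per angle, giving (len(radii), len(angles)) arrays.
    angle, magnitude = np.meshgrid(angles, radii)
    return angle, magnitude
```

For example, make_polar_grids(130, 359) gives two 130×359 arrays matching the shapes described above.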
Lines 6 and 7 convert these two arrays to X,Y coordinates in the image, from which we will be
sampling later. convertMaps() converts them further to improve the performance of cv2.remap().
And finally, cv2.remap() uses these coordinate maps to sample the original image into the polar image.
And now let's take a look at what that did to the performance:
Nice! Now unwrapIris() takes just 2.3% of the total time! And we have reduced the time needed to process
10 images from ~34s to about 4s, that's roughly an order of magnitude improvement! Now findIris() is the
slowest, so maybe next time we'll look at that.