Performance tests
This example tests the performance of the overlay_alpha* functions with different sizes of random images. Ctypes is used to dynamically load the alpha-lib library in Python.
SIMD intrinsics
The following two graphs show the results of four experiments that compare the performance of overlaying one image onto another: code compiled with GCC's -O3 optimization level versus code that uses hand-crafted NEON intrinsics. The NEON version is much faster, especially for small images. For larger images, memory throughput and caching effects become more important than raw processing power, but the NEON version is still significantly faster than the version without intrinsics.
Rounding methods
The differences between the scaling and rounding methods are small. As expected, an exact rounding division by 255 is slowest. The approximate rounding division is slightly faster, because it eliminates the vector load instruction for the rounding constant. An exact flooring division by 255 is a tiny bit faster still.
The fastest option is to divide by 256 instead of 255, as both the rounding and flooring divisions by powers of two can be implemented using a single bit shift instruction.
This does result in a small error in the output image. Most notably, combining two white pixels with color values 0xFF will result in a slightly less white pixel, with color value 0xFE.
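The trade-off between the rescaling methods can be reproduced with plain integer arithmetic. The sketch below is a scalar Python model, not the library's NEON code: it assumes the usual "over" blend fg * alpha + bg * (255 - alpha) per 8-bit channel, and the helper names only mirror the --rescale choices for illustration. The approximate rounding variant is left out because its exact formula is not given here.

```python
# Scalar model of the rescaling step (illustrative, not the library code).

def div255_round(x):
    """Exact division by 255, rounded to the nearest integer."""
    return (x + 127) // 255


def div255_floor(x):
    """Exact division by 255, rounded down."""
    return x // 255


def div256_round(x):
    """Division by 256 with rounding: add half, then a single shift."""
    return (x + 128) >> 8


def div256_floor(x):
    """Division by 256, rounded down: a single shift."""
    return x >> 8


def blend(fg, bg, alpha, rescale):
    """Assumed 'over' blend of one 8-bit channel (an assumption, see above)."""
    return rescale(fg * alpha + bg * (255 - alpha))


white = 0xFF
for rescale in (div255_round, div255_floor, div256_round, div256_floor):
    print(rescale.__name__, hex(blend(white, white, 128, rescale)))
```

In this model, both divisions by 255 keep white at 0xFF, while both divisions by 256 produce 0xFE, regardless of the alpha value.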
This graph also clearly shows the slightly better performance when the image size is a multiple of eight. The reason is the size of the NEON registers, which is four words, or eight 16-bit integers. When the number of columns of the foreground image is not a multiple of eight, extra code is needed to process the last pixels of each row, resulting in lower performance.
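The control flow behind this overhead can be sketched in Python. This is only a model of the loop structure: process_row, halve_block and halve_one are hypothetical names, and the stand-in operation (halving each value) is unrelated to the actual blend.

```python
LANES = 8  # block size of eight, as described in the text above


def process_row(row, vector_op, scalar_op):
    """Process a row in blocks of LANES, then handle the leftover values."""
    full = len(row) - len(row) % LANES
    out = []
    # Main loop: whole blocks of LANES values at a time.
    for start in range(0, full, LANES):
        out.extend(vector_op(row[start:start + LANES]))
    # Tail: extra code path, only needed when the width is not a multiple of LANES.
    for value in row[full:]:
        out.append(scalar_op(value))
    return out


def halve_block(block):
    """Stand-in for an operation on one full block."""
    return [v >> 1 for v in block]


def halve_one(value):
    """Stand-in for the same operation on a single leftover value."""
    return value >> 1


row = list(range(19))
result = process_row(row, halve_block, halve_one)
blocks, tail = divmod(len(row), LANES)
print(blocks, "full blocks,", tail, "leftover values")
```

A row of 19 values needs two full blocks plus three values on the tail path, whereas a row of 16 values never enters the tail loop.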
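import argparse
import ctypes
import os.path as path
import platform
import time

import numpy as np

# Directory that contains this script
dir = path.dirname(path.realpath(__file__))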
# Command-line options
parser = argparse.ArgumentParser(description='Benchmark for overlay_alpha')
parser.add_argument('--no-simd', dest='SIMD', action='store_false',
                    help='Disable SIMD intrinsics')
parser.add_argument('--N', dest='N', type=int, default=25,
                    help='The number of different sizes to test')
parser.add_argument('--min', dest='min_size', type=int, default=10,
                    help='The size in pixels of the smallest image in the test')
parser.add_argument('--max', dest='max_size', type=int, default=2000,
                    help='The size in pixels of the largest image in the test')
parser.add_argument('--it', dest='max_iterations', type=int, default=10,
                    help='The number of test iterations for the largest images')
parser.add_argument('--rescale', dest='rescale',
                    choices=['div255_round', 'div255_round_approx',
                             'div255_floor', 'div256_round', 'div256_floor'],
                    default='div255_round',
                    help='The scaling and rounding method to use')
args = parser.parse_args()
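# C types used in the overlay_alpha signature
uint8_t_p = ctypes.POINTER(ctypes.c_uint8)
size_t = ctypes.c_size_t

# Select the shared library for this machine and SIMD setting.
# (The line that initializes `so` with the base file name is missing from
# this listing.)
so += platform.machine()
if not args.SIMD:
    so += "-no-simd"

# Load the library and look up the requested rescaling variant
dll = ctypes.cdll.LoadLibrary(path.join(dir, so))
overlay_alpha = dll['overlay_alpha_stride_' + args.rescale]
overlay_alpha.argtypes = [
    # The entries are missing from this listing; these are inferred from the
    # call below: three image pointers followed by four sizes.
    uint8_t_p, uint8_t_p, uint8_t_p,
    size_t, size_t, size_t, size_t,
]
overlay_alpha.restype = None  # the C function returns void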
# Image sizes to test, and the average time per call for each size.
# (np.int was removed from NumPy; the built-in int is equivalent here.)
sizes = np.linspace(args.min_size, args.max_size, args.N, dtype=int)
times = np.zeros((args.N, ))

for i, size in enumerate(sizes):
    print(i + 1, '/', args.N, ':', size)

    # Random RGBA background and foreground images, and an output buffer
    # (the upper bound of randint is exclusive, so 256 covers 0..255)
    bg_img = np.random.randint(256, size=(size, size, 4), dtype=np.uint8)
    fg_img = np.random.randint(256, size=(size, size, 4), dtype=np.uint8)
    out_img = np.zeros((size, size, 4), dtype=np.uint8)
    bg_img_p = bg_img.ctypes.data_as(uint8_t_p)
    fg_img_p = fg_img.ctypes.data_as(uint8_t_p)
    out_img_p = out_img.ctypes.data_as(uint8_t_p)

    # Use more iterations for smaller images, then average the time per call
    iterations = int(round(args.max_size * args.max_iterations / size))
    start_time = time.perf_counter()
    for _ in range(iterations):
        overlay_alpha(bg_img_p, fg_img_p, out_img_p, size, size, size, size)
    end_time = time.perf_counter()
    times[i] = (end_time - start_time) / iterations
# Save the results to a CSV file named after the time, settings and machine
results = np.column_stack((sizes, times))
simd = (' simd ' if args.SIMD else ' no-simd ')
name = str(time.asctime()) + simd + args.rescale + ' ' + platform.machine()
np.savetxt(name + '.csv', results, delimiter=',')

# Plot the time per call as a function of the image size
import matplotlib.pyplot as plt
plt.plot(sizes, times, '.-')
plt.xlabel('Image size [pixels]')
plt.ylabel('Time [s]')
plt.savefig(name + '.svg')
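The iteration count in the timing loop is chosen so that small images are timed over many more calls than large ones. A short sketch of that arithmetic, using the script's default values for --max and --it; the helper name iterations_for is introduced here for illustration only:

```python
def iterations_for(size, max_size=2000, max_iterations=10):
    """Number of timed calls for one image size (same formula as the script)."""
    return int(round(max_size * max_iterations / size))


for size in (10, 100, 1000, 2000):
    print(size, iterations_for(size))
```

With the defaults, the smallest image (10 pixels) is timed over 2000 calls and the largest (2000 pixels) over 10. Because the number of pixels per call grows with the square of the size, the total work per measurement still grows roughly linearly with the image size.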