ARM NEON Compositor  master
Fast SIMD alpha overlay and blending for ARM

ARM/Raspberry Pi NEON Compositor

This library uses NEON SIMD instructions to quickly overlay a foreground image with an alpha channel (transparency) onto a background image.
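As a reminder of what overlaying with an alpha channel means, the usual per-channel blend can be written as below. This is a sketch under stated assumptions (8-bit channels, straight rather than premultiplied alpha, an opaque background). How this library rounds the result or treats the output alpha channel is not stated here.

```latex
% Assumed conventions: \alpha is the foreground alpha in [0, 255];
% C_fg and C_bg are the foreground and background values of one color channel.
\[
  C_{\text{out}} = \frac{\alpha \, C_{\text{fg}} + (255 - \alpha) \, C_{\text{bg}}}{255},
  \qquad C \in \{R, G, B\}
\]
```

With this convention, alpha = 0 leaves the background unchanged and alpha = 255 replaces it with the foreground.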

For small images, it is up to 3.5 times faster than an implementation without NEON intrinsics. For very large images, it is around 1.4 times faster.

Documentation

The modules page is the best place to start. The main function is overlay_alpha_stride.
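To illustrate the kind of operation a strided overlay performs, here is a minimal scalar sketch without NEON. It is not the library's overlay_alpha_stride: the name overlay_alpha_scalar, the parameter list, the pixel layout (4 bytes per pixel, alpha in the last byte), the rounding, and the choice to keep the background's alpha are all assumptions made for illustration. Check the modules page for the real signature.

```cpp
#include <cstddef>
#include <cstdint>

// Blend one 8-bit channel with rounding:
// round((alpha * fg + (255 - alpha) * bg) / 255).
static inline std::uint8_t blend_channel(std::uint8_t bg, std::uint8_t fg,
                                         std::uint8_t alpha) {
    unsigned a = alpha;
    unsigned v = a * fg + (255u - a) * bg + 127u;
    return static_cast<std::uint8_t>(v / 255u);
}

// Hypothetical scalar reference (NOT the library's API).
// Overlays an fg_rows x fg_cols foreground onto the top-left corner of the
// region pointed to by bg, writing the result to out.
// Strides are given in pixels: the distance between the starts of
// consecutive rows. out is assumed to use the same stride as bg.
void overlay_alpha_scalar(const std::uint8_t *bg, const std::uint8_t *fg,
                          std::uint8_t *out, std::size_t bg_stride_px,
                          std::size_t fg_rows, std::size_t fg_cols,
                          std::size_t fg_stride_px) {
    for (std::size_t r = 0; r < fg_rows; ++r) {
        const std::uint8_t *bg_row = bg + 4 * r * bg_stride_px;
        const std::uint8_t *fg_row = fg + 4 * r * fg_stride_px;
        std::uint8_t *out_row = out + 4 * r * bg_stride_px;
        for (std::size_t c = 0; c < fg_cols; ++c) {
            const std::uint8_t *b = bg_row + 4 * c;
            const std::uint8_t *f = fg_row + 4 * c;
            std::uint8_t *o = out_row + 4 * c;
            std::uint8_t alpha = f[3];  // assumed: alpha is the 4th byte
            o[0] = blend_channel(b[0], f[0], alpha);
            o[1] = blend_channel(b[1], f[1], alpha);
            o[2] = blend_channel(b[2], f[2], alpha);
            o[3] = b[3];  // assumed: background alpha is kept as-is
        }
    }
}
```

Passing a stride separately from the width is what allows a small foreground to be placed inside a larger background: offset the bg and out pointers to the target position and keep the background's full row length as the stride.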

You can find more in-depth explanations of the NEON intrinsics used by this library here: Raspberry-Pi/NEON.

Examples

The overlay_alpha example overlays a foreground image with an alpha channel onto a background image, as illustrated here:

[Images: background, foreground, result]

Performance

The following two graphs show the results of four experiments that compare two ways of overlaying one image onto another: plain code optimized by GCC at the -O3 level, and hand-crafted NEON intrinsics. The NEON version is much faster, especially for small images. For larger images, memory throughput and caching effects matter more than raw processing power, but the NEON version is still significantly faster than the version without intrinsics.

[Graphs: small images, large images]

The experiments were carried out on a Raspberry Pi 3B+ running Ubuntu 20.04 (64-bit).
More performance tests can be found in the perf_test example.
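
For readers who want to run a comparable measurement themselves, a timing helper along the following lines could be used. This is only a sketch: average_time_us is a hypothetical name, and it is not taken from the perf_test example, whose actual methodology may differ.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: average wall-clock time per call of f, in
// microseconds. Requires iterations > 0.
template <class F>
double average_time_us(F &&f, std::size_t iterations) {
    using clock = std::chrono::steady_clock;
    f();  // one untimed warm-up call, so the first timed call is not an outlier
    auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i)
        f();
    auto stop = clock::now();
    std::chrono::duration<double, std::micro> total = stop - start;
    return total.count() / static_cast<double>(iterations);
}
```

Timing the same overlay call for several image sizes, once per implementation, would give the kind of small-image versus large-image comparison described above. Note that the warm-up call also means the timed runs start with warm caches, which matters most for images small enough to fit in cache.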