It slipped my mind that using the dot product yields only the absolute value of the angle difference, and we care very much which vector leads or lags the other. No matter, cross product to the rescue! Because the cross product is anti-commutative, the sign of the resultant vector will tell us if the angle difference is positive or negative. We also know that the angle difference should remain constant over a reasonably short period of time, so we'll only need to compute the cross product for a single pair of vectors to determine the sign.
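Here is a minimal sketch of that idea, written in Python for readability rather than as the actual ARM code. It assumes 2-D vectors stored as (x, y) tuples; the function names `unsigned_angle` and `signed_angle` are made up for illustration. For 2-D vectors the cross product has only a z component, so its sign is all we need.

```python
import math

def unsigned_angle(a, b):
    """Angle between a and b from the dot product alone: always >= 0."""
    dot = a[0] * b[0] + a[1] * b[1]
    norm = math.hypot(a[0], a[1]) * math.hypot(b[0], b[1])
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def signed_angle(a, b):
    """Same magnitude, with the sign taken from the z component of a x b."""
    cross_z = a[0] * b[1] - a[1] * b[0]
    # Swapping a and b flips cross_z (anti-commutativity), so the sign flips too.
    return math.copysign(unsigned_angle(a, b), cross_z)
```

With this convention a positive result means the second vector is rotated counterclockwise from the first. Which of those counts as "leading" depends on how the vectors are defined, so treat the sign convention here as an assumption.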

Update: 3:05 PM
Currently my buffer sizes and number of taps are divisible by 6. That makes optimal use of the registers on the ARM, but it's causing me a bit of a headache because 6 is not a power of two (2^n), so I'm going to make everything divisible by 8 to ease some of the math involved.
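To illustrate why 8 is friendlier than 6, here is a small sketch, again in Python and not the real firmware. The helper names are hypothetical. Because 8 is 2^3, rounding a size up to a multiple of 8 and counting blocks of 8 both reduce to a mask and a shift, with no division needed.

```python
def round_up_to_8(n):
    """Round a non-negative size up to the next multiple of 8 using a bit mask."""
    return (n + 7) & ~7

def blocks_of_8(n):
    """Number of whole 8-sample blocks in n; the shift replaces a divide by 8."""
    return n >> 3

def is_multiple_of_8(n):
    """True when the low three bits are clear."""
    return (n & 7) == 0
```

The same tricks do not work for 6, which is why sizes divisible by 6 force real division or modulo operations.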