Just figured out what was wrong with my dot product calculation. I was multiplying two 16-bit integers together and they were rolling over. A simple cast to int32 solved the problem. Onward!
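Here’s a minimal C sketch of that kind of rollover. In C, both int16 operands are promoted to int before the multiply, so on a 32-bit ARM the product itself is fine. The damage happens when the result lands in a 16-bit variable. That is an assumption about where the bug was, since the post doesn’t say. The function names are hypothetical.

```c
#include <stdint.h>

/* Buggy (hypothetical helper): the full product is squeezed back into
   16 bits. Converting an out-of-range value to int16_t is
   implementation-defined; GCC keeps the low 16 bits, which is the rollover. */
int16_t mul_q_bad(int16_t a, int16_t b)
{
    int16_t p = a * b;
    return p;
}

/* Fixed (hypothetical helper): widen to 32 bits and keep the whole result. */
int32_t mul_q_good(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}
```

With GCC, which keeps the low 16 bits on the narrowing conversion, mul_q_bad(300, 300) returns 24464 instead of 90000.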

Update: 4:14 PM
Here’s an efficient implementation of the dot product that I wrote for the ARMv5E:

; void dot_product(int32 *product, int16 *vector1, int16 *vector2, uint32 N)
; R0 = product, R1 = vector1, R2 = vector2, R3 = N (nonzero multiple of 3)
; Each 32-bit word holds one IQ pair: I in the bottom halfword, Q in the top
; (little-endian, I stored first)
dot_product
        STMDB   SP!, {R4-R11, LR}   ; Save registers to the stack
next_sample
        LDMIA   R1!, {R5, R6, R7}   ; Load three IQ pairs from vector 1
        LDMIA   R2!, {R8, R9, R10}  ; Load three IQ pairs from vector 2
        SMULTT  R11, R5, R8         ; temp = Q1*Q2 (top halfwords)
        SMLABB  R11, R5, R8, R11    ; temp += I1*I2 (bottom halfwords)
        SMULTT  R12, R6, R9         ; Repeat for the other two pairs
        SMLABB  R12, R6, R9, R12
        SMULTT  R14, R7, R10        ; LR is free to use once it is on the stack
        SMLABB  R14, R7, R10, R14
        STMIA   R0!, {R11, R12, R14} ; Save the three dot products
        SUBS    R3, R3, #3          ; Subtract 3 from N
        BGT     next_sample         ; Loop while N > 0
        LDMIA   SP!, {R4-R11, PC}   ; Restore registers and return

This code is functionally identical to the following C code, except that the assembly version requires the number of vectors, N, to be a multiple of 3.

void dot_product(int32 *product, int16 *vector1, int16 *vector2, uint32 N)
{
    int16 I1, Q1, I2, Q2;

    while (N--)
    {
        I1 = *vector1++;
        Q1 = *vector1++;
        I2 = *vector2++;
        Q2 = *vector2++;

        /* Widen before multiplying so the products don't roll over */
        *product++ = (int32)I1*I2 + (int32)Q1*Q2;
    }
}
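To tie the assembly back to this C: each LDMIA or LDR pulls in one 32-bit word per IQ pair, and on a little-endian ARM the first int16 (I) lands in the bottom halfword and the second (Q) in the top. SMULTT multiplies the two top halfwords; SMLABB multiplies the two bottom halfwords and adds the accumulator. Here’s a rough C model of that pair, handy for checking results on a PC. The helper names are hypothetical, and the model ignores the sticky Q flag that SMLABB sets when its addition overflows.

```c
#include <stdint.h>

/* Signed top halfword (bits 31..16) and bottom halfword (bits 15..0).
   The uint16_t -> int16_t conversion is implementation-defined; GCC wraps. */
static int32_t top16(uint32_t w)    { return (int16_t)(uint16_t)(w >> 16); }
static int32_t bottom16(uint32_t w) { return (int16_t)(uint16_t)(w & 0xFFFFu); }

/* Pack an IQ pair the way it sits in memory on a little-endian ARM:
   I (stored first) in the bottom halfword, Q in the top. */
uint32_t pack_iq(int16_t i, int16_t q)
{
    return ((uint32_t)(uint16_t)q << 16) | (uint16_t)i;
}

/* SMULTT Rd, Rm, Rs: Rd = top(Rm) * top(Rs) */
int32_t smultt(uint32_t rm, uint32_t rs)
{
    return top16(rm) * top16(rs);
}

/* SMLABB Rd, Rm, Rs, Rn: Rd = bottom(Rm) * bottom(Rs) + Rn, wrapping mod 2^32 */
int32_t smlabb(uint32_t rm, uint32_t rs, int32_t rn)
{
    uint32_t sum = (uint32_t)(bottom16(rm) * bottom16(rs)) + (uint32_t)rn;
    return (int32_t)sum;
}

/* One loop iteration's worth of work for a single packed IQ pair. */
int32_t packed_dot(uint32_t a, uint32_t b)
{
    return smlabb(a, b, smultt(a, b));
}
```

For example, packed_dot(pack_iq(3, 4), pack_iq(5, -6)) gives 3*5 + 4*(-6) = -9, the same as the C loop body.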

The assembly version is about 25% faster than the C. If your number of samples isn’t a multiple of 3, either use an oversized buffer or use the following code, which handles arbitrary N:

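; void dot_product(int32 *product, int16 *vector1, int16 *vector2, uint32 N)
; R0 = product, R1 = vector1, R2 = vector2, R3 = N (any N > 0)
; Each 32-bit word holds one IQ pair: I in the bottom halfword, Q in the top
dot_product
        STMDB   SP!, {R4-R6}        ; Save registers to the stack
next_sample
        LDR     R5, [R1], #4        ; Load one IQ pair from vector 1
        LDR     R6, [R2], #4        ; Load one IQ pair from vector 2
        SUBS    R3, R3, #1          ; Subtract 1 from N (sets flags for BGT)
        SMULTT  R12, R5, R6         ; temp = Q1*Q2 (top halfwords)
        SMLABB  R12, R5, R6, R12    ; temp += I1*I2 (bottom halfwords)
        STR     R12, [R0], #4       ; Save the dot product
        BGT     next_sample         ; Loop while N > 0 (multiplies/STR leave N, Z, C, V alone)
        LDMIA   SP!, {R4-R6}        ; Restore registers
        BX      R14                 ; Return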

This version is about 15% faster than the C.
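For the oversized-buffer route, here’s a minimal sketch of the bookkeeping. It rounds N up to the next multiple of 3 and zero-fills the extra IQ pairs so the last loop iteration just computes zeros. The product buffer needs room for the padded count too. The helper names are hypothetical, and the heap copy is only for illustration. In practice you’d more likely size the original buffers this way up front.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round N up to the next multiple of 3 (assumes n <= UINT32_MAX - 2). */
uint32_t padded_length(uint32_t n)
{
    return (n + 2u) / 3u * 3u;
}

/* Copy n interleaved IQ pairs (2*n int16 values) into a new buffer sized
   for padded_length(n) pairs. calloc zero-fills the extra pairs.
   Returns NULL on allocation failure; the caller frees the result. */
int16_t *pad_iq_buffer(const int16_t *src, uint32_t n)
{
    uint32_t padded = padded_length(n);
    int16_t *dst = calloc((size_t)padded * 2u, sizeof *dst);
    if (dst != NULL)
        memcpy(dst, src, (size_t)n * 2u * sizeof *src);
    return dst;
}
```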

Update: 5:59 PM
Well, with IQ demod and the dot product computation, my processing time is up to 2.8 ms. It’s going to be tight!


Now that IQ demodulation is working, it’s on to CORDIC. The trick will be getting it done in a reasonable amount of time. If that’s not possible, I’ll have to explore other options for computing atan2. I only need accuracy to 0.1 degrees, so that may help a bit.
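For reference, here’s a minimal fixed-point sketch of CORDIC in vectoring mode, which is the usual way to get atan2 out of CORDIC. It’s written in C rather than ARM assembly and returns the angle in millidegrees. The function name, the 2^14 input scaling, and the 16-iteration count are assumptions, chosen so the result should stay well inside a 0.1 degree target.

```c
#include <stdint.h>

#define CORDIC_ITERATIONS 16

/* atan(2^-i) in millidegrees (1/1000 degree), rounded to the nearest integer. */
static const int32_t cordic_atan_mdeg[CORDIC_ITERATIONS] = {
    45000, 26565, 14036, 7125, 3576, 1790, 895, 448,
    224, 112, 56, 28, 14, 7, 3, 2
};

/* Hypothetical helper: atan2(y, x) in millidegrees, roughly -180000..180000. */
int32_t cordic_atan2_mdeg(int16_t y, int16_t x)
{
    /* Scale by 2^14 so the right shifts keep some fraction bits. The vector
       grows by about 1.65x, so the worst case (about 1.65 * sqrt(2) * 2^29)
       still fits in an int32. */
    int32_t xs = (int32_t)x * 16384;
    int32_t ys = (int32_t)y * 16384;
    int32_t z = 0;

    /* CORDIC only converges for angles within about +/-99.9 degrees, so
       first rotate left-half-plane vectors by 90 degrees into the right half. */
    if (xs < 0) {
        int32_t t = xs;
        if (ys >= 0) {          /* rotate by -90: (x, y) -> (y, -x) */
            xs = ys;
            ys = -t;
            z = 90000;
        } else {                /* rotate by +90: (x, y) -> (-y, x) */
            xs = -ys;
            ys = t;
            z = -90000;
        }
    }

    /* Vectoring mode: each step rotates by -/+atan(2^-i) to push ys toward
       zero and adds that rotation to z. Assumes >> on negative values is an
       arithmetic shift, as it is with GCC. */
    for (int i = 0; i < CORDIC_ITERATIONS; i++) {
        int32_t dx = ys >> i;
        int32_t dy = xs >> i;
        if (ys > 0) {
            xs += dx;
            ys -= dy;
            z += cordic_atan_mdeg[i];
        } else {
            xs -= dx;
            ys += dy;
            z -= cordic_atan_mdeg[i];
        }
    }
    return z;
}
```

Each iteration is just shifts, adds, and a table lookup. The worst-case residual angle after n iterations is roughly atan(2^-(n-1)), so around 11 iterations is the floor for 0.1 degree before rounding effects. Fewer iterations trade accuracy for speed.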

Update: 2:05 PM
It’s looking like using the dot product to get the angle between the vectors might be a more fruitful path. It gives the angle between the vectors directly, which is what we want, and looks like it might be quicker to calculate. The only issue is that I really need the two vectors to be unit vectors, because I don’t want to have to compute the magnitudes and do a division. Hmm, simulation time…

Update: 6:00 PM
The good news is that computing the magnitudes and doing the division will be much faster than I thought. The bad news is that it’s not computing the angle properly. The angle should be:

$$\theta=\cos^{-1}\left(\frac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{a}|\,|\mathbf{b}|}\right)$$

but the answer is different when I compute atan2 of each vector and subtract the results. I’m sure I’m doing something silly, but it’s one of those things you have to put down for a little while, so off home I go.


Today I’m going to try to figure out why my FIR filter doesn’t seem to work properly. I’m doing IQ demodulation for a radio location system we’re developing, but the output of my new FIR filter looks really weird. Hopefully it’s just the filter taps that are wrong. I’ll update this post as I go along, and I’ll eventually post the ARM FIR code, which is based on code from the excellent ARM System Developer’s Guide but modified to take interleaved IQ data.
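Until that code is posted, here’s a plain C sketch of the general idea. It is not the routine from the book. It runs a direct-form FIR separately over the I and Q channels of an interleaved buffer. The Q15 tap format, the function name, and the "valid outputs only" convention are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical direct-form FIR over interleaved IQ data.
   in:   2*n_in int16 values, laid out I0 Q0 I1 Q1 ...
   taps: n_taps Q15 coefficients (16384 = 0.5)
   out:  2*(n_in - n_taps + 1) int16 values, same interleaving.
   Only "valid" outputs are produced (no history before in[0]).
   Assumes n_in >= n_taps, that the sums fit in int32, and that >> on
   negative values is an arithmetic shift. */
void fir_iq_q15(const int16_t *in, size_t n_in,
                const int16_t *taps, size_t n_taps,
                int16_t *out)
{
    for (size_t k = 0; k + n_taps <= n_in; k++) {
        int32_t acc_i = 0, acc_q = 0;
        for (size_t t = 0; t < n_taps; t++) {
            /* taps[t] pairs with IQ sample k + n_taps - 1 - t (newest first). */
            size_t s = 2 * (k + n_taps - 1 - t);
            acc_i += (int32_t)in[s]     * taps[t];
            acc_q += (int32_t)in[s + 1] * taps[t];
        }
        out[2 * k]     = (int16_t)(acc_i >> 15);
        out[2 * k + 1] = (int16_t)(acc_q >> 15);
    }
}
```

With taps {16384, 16384} (two taps of 0.5), this is a two-point moving average applied to I and Q independently. Interleaving only changes the stride to 2 and adds a second accumulator.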

Update: 2:02 PM
Note to self: an array of int16s looks weird when you treat it like an array of int32s. OK, so the FIR filters work fine; it was my data display that was screwy.
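That’s easy to reproduce. On a little-endian machine (ARM in its usual configuration, or x86), each int32 "view" of an interleaved IQ buffer packs one whole I/Q pair, with I in the low halfword and Q in the high halfword. The printed numbers look like garbage as a result. Here’s a small sketch with a hypothetical helper name. It uses memcpy instead of a pointer cast to stay clear of C’s aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the 32-bit word holding IQ pair k of an int16
   buffer, as a display routine expecting int32 data would see it. */
uint32_t iq_pair_as_word(const int16_t *iq, size_t k)
{
    uint32_t w;
    memcpy(&w, &iq[2 * k], sizeof w);
    return w;
}
```

For buf = {1, 2, -1, 0} on a little-endian machine, pair 0 reads as 0x00020001 (131073) rather than 1 and 2. Pair 1 reads as 0x0000FFFF, so an I of -1 shows up as 65535.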

Update: 5:03 PM
IQ demodulation is working great. My goal is to do all our mathematics within 5 milliseconds so that our acquisition time is greater than our processing time. That way we can process continuously and won’t have to deal with synchronizing bursts of packets. IQ demod is taking 1.8 ms, so I have 3.2 ms to do everything else. Should be fun! 🙂

Making My Own Space

So I’ve been looking for a place where employees at Harbor Branch can blog, and it’s been frustrating.  I wanted a work-related site that people outside the institution could read to keep up with the latest engineering and research here, but so far I’ve been out of luck.  Since the merger with FAU, our site is no longer hosted locally, and there isn’t really an infrastructure in place to let us edit pages directly.  FAU does provide personal web space, but it can only be accessed from the main campus and isn’t available at HBOI.  I’ve had so many cool things I wanted to share but haven’t been able to.

Now of course I’ve had this site lying fallow for quite a while.  I’ve occasionally posted job-related stuff here in the past but have tended to keep this site separate from work.  I’ve decided to change that.  I want to use this site to post some of my daily activities to keep coworkers, friends, and other interested parties informed about what we’re doing here at HBOI Engineering.

It won’t all be work-related, naturally; this is my site and I’ll post what I please.  But since a large part of it will be work-related, let me state for the record:

The views expressed on this website do not necessarily reflect the views of Florida Atlantic University, Harbor Branch Oceanographic Institute, or anyone other than the author.


New site!

Well I’m on a new server so it’s time for a brand new website!  I’m adding the old content back but it’s a slow process.  Stay tuned.