## Fast Fourier Transform in C++

To download:

1 | bzr branch http://bazaar.enseed.com/lib/Generic/ look under Math/Fourier.h |

**This class is a work in progress**. It works fine as it is, for most transforms I do are real->complex, and only half of the transform needs to be computed in such a case. So I plan on modifying this code eventually to deal with this case and save half of the computation time.

This started when I needed a Fast Fourier Transform that was not GPL’ed. If you can deal with a GPL license, by all means useFFTW instead. In my tests, FFTW is about 35% faster than the current code for vectors size between 1 and 65536. There is still room for improvement in the current code though, as I mentioned above, it’s a work in progress…

## Forward Transform

Performing a forward transform is done like this:

1 2 3 | std::complex<double> data[SIZE]; Fourier::forwardTransform(data, SIZE); // if you don't know SIZE at compile time Fourier::forwardTransform<double, SIZE>(data); // if you know SIZE at compile time |

Expect the second version (where you know SIZE at compile time) to be 2 to 10 times faster than the other (don’t foget to turn on compiler optimizations – they will make a huge difference, especially code inlining). The transform is performed in-place

## Reverse Transform

Performing a reverse transform is done like this:

1 2 3 | std::complex<double> data[SIZE]; Fourier::reverseTransform(data, SIZE); // if you don't know SIZE at compile time Fourier::reverseTransform<double, SIZE>(data); // if you know SIZE at compile time |

## Scaling

**This is important.** The discrete Fourier transform reads like this:

And the reverse discrete fourier transform:

I don’t compute the one-over-square-root-of-n term () because often, we don’t really care about actual values, but about the distribution. This means however, that the data will be scaled when you do a forward-reverse transform and you have to scale down it yourself:

1 2 3 4 5 6 7 8 9 10 11 12 13 | double factor = 1.0 /= sqrt(SIZE); std::complex<double> data[SIZE]; Fourier::forwardTransform<double, SIZE>(data); ***> for (int i = 0; i < SIZE; ++i) data[i] *= factor; Fourier::reverseTransform<double, SIZE>(data); ***> for (int i = 0; i < SIZE; ++i) data[i] *= factor; // Or, sometimes you can wait until the end to scale double factor2 = 1.0 /= SIZE; Fourier::forwardTransform<double, SIZE>(data); Fourier::reverseTransform<double, SIZE>(data); ***> for (int i = 0; i < SIZE; ++i) data[i] *= factor2; |

If you think my recursive templates are overkill and you’re a pure C fan, think again. Making a template-recursive LU decomposition or Fourier transform is not only fun, it’s fast. The Fourier transform here is based on a standard Cooley-Tukey algorithm and it comes in two flavours. You can either call:

1 2 3 | std::complex<double> *myData = new std::complex<double>[size]; // ... Fourier::forwardTransform(myData, size); |

or, if you know the size of your vector at compile time (which is often the case), for example, say size = 1024:

1 2 3 | std::complex<double> *myData = new std::complex<double>[1024]; // ... Fourier::forwardTransform<double, 1024>(myData); |

Cooley-Tukey is recursive – meaning it will find coefficients for the transform for a vector of size N by solving for a vector of size N/2. The naive implementation could use recursive functions, but get a lot of overhead if you don’t inline. If you inline, or if you build a single function which loops, you have less overhead, but you still end up with loops that will have very small iterations (e.g, for (i = 0; i < 4; ++i) …) and there is still a lot of overhead in the loop. Still, this is the most common implementation you’ll find of the Cooley-Tukey algorithm.

One possible way to gain speed is to unroll the loop, but it’s not easy since the repetition of the loop might be as small as 1 or 2.

My solution was to have the size of the loop as a template parameter. The compiler **knows** how many iterations there are, and it will unroll the loop for you. Additionally, you can unroll small transforms and remove the loop completely. It’s the same algorithm in the end, you just have far less overhead

If you’re not convinced with the theory, here are the timings to compute various Fourier transforms. One version uses the naive loop approach you’ll find all over the internet, the other uses recursive templates:

The speed gain decreases as the size of the vector grows (it’s normal since the overhead becomes less and less important). In my case, I work mostly with images and I’ll rarely use vectors larger than 2048. For that size, the recursive-template implementation is more than 4 times faster…

However, with Zhang Ziyi inside film and actual, the less

appropriate, its image begun to decline, starring Zhang Ziyi of “geisha”, since the film’s plot touches some sensitive Sino-Japanese sentiments among generated dissatisfaction users.

However may very well not need this particular

service at least to the first year. Soon, my buddy knocked

on my bedroom door, awakening me.