Fast Fourier Transform in C++
To download:
1 | bzr branch http://bazaar.enseed.com/lib/Generic/ look under Math/Fourier.h |
This class is a work in progress. It works fine as it is, for most transforms I do are real->complex, and only half of the transform needs to be computed in such a case. So I plan on modifying this code eventually to deal with this case and save half of the computation time.
This started when I needed a Fast Fourier Transform that was not GPL’ed. If you can deal with a GPL license, by all means useFFTW instead. In my tests, FFTW is about 35% faster than the current code for vectors size between 1 and 65536. There is still room for improvement in the current code though, as I mentioned above, it’s a work in progress…
Forward Transform
Performing a forward transform is done like this:
1 2 3 | std::complex<double> data[SIZE]; Fourier::forwardTransform(data, SIZE); // if you don't know SIZE at compile time Fourier::forwardTransform<double, SIZE>(data); // if you know SIZE at compile time |
Expect the second version (where you know SIZE at compile time) to be 2 to 10 times faster than the other (don’t foget to turn on compiler optimizations – they will make a huge difference, especially code inlining). The transform is performed in-place
Reverse Transform
Performing a reverse transform is done like this:
1 2 3 | std::complex<double> data[SIZE]; Fourier::reverseTransform(data, SIZE); // if you don't know SIZE at compile time Fourier::reverseTransform<double, SIZE>(data); // if you know SIZE at compile time |
Scaling
This is important. The discrete Fourier transform reads like this:

And the reverse discrete fourier transform:

I don’t compute the one-over-square-root-of-n term (
) because often, we don’t really care about actual values, but about the distribution. This means however, that the data will be scaled when you do a forward-reverse transform and you have to scale down it yourself:
1 2 3 4 5 6 7 8 9 10 11 12 13 | double factor = 1.0 /= sqrt(SIZE); std::complex<double> data[SIZE]; Fourier::forwardTransform<double, SIZE>(data); ***> for (int i = 0; i < SIZE; ++i) data[i] *= factor; Fourier::reverseTransform<double, SIZE>(data); ***> for (int i = 0; i < SIZE; ++i) data[i] *= factor; // Or, sometimes you can wait until the end to scale double factor2 = 1.0 /= SIZE; Fourier::forwardTransform<double, SIZE>(data); Fourier::reverseTransform<double, SIZE>(data); ***> for (int i = 0; i < SIZE; ++i) data[i] *= factor2; |
If you think my recursive templates are overkill and you’re a pure C fan, think again. Making a template-recursive LU decomposition or Fourier transform is not only fun, it’s fast. The Fourier transform here is based on a standard Cooley-Tukey algorithm and it comes in two flavours. You can either call:
1 2 3 | std::complex<double> *myData = new std::complex<double>[size]; // ... Fourier::forwardTransform(myData, size); |
or, if you know the size of your vector at compile time (which is often the case), for example, say size = 1024:
1 2 3 | std::complex<double> *myData = new std::complex<double>[1024]; // ... Fourier::forwardTransform<double, 1024>(myData); |
Cooley-Tukey is recursive – meaning it will find coefficients for the transform for a vector of size N by solving for a vector of size N/2. The naive implementation could use recursive functions, but get a lot of overhead if you don’t inline. If you inline, or if you build a single function which loops, you have less overhead, but you still end up with loops that will have very small iterations (e.g, for (i = 0; i < 4; ++i) …) and there is still a lot of overhead in the loop. Still, this is the most common implementation you’ll find of the Cooley-Tukey algorithm.
One possible way to gain speed is to unroll the loop, but it’s not easy since the repetition of the loop might be as small as 1 or 2.
My solution was to have the size of the loop as a template parameter. The compiler knows how many iterations there are, and it will unroll the loop for you. Additionally, you can unroll small transforms and remove the loop completely. It’s the same algorithm in the end, you just have far less overhead
If you’re not convinced with the theory, here are the timings to compute various Fourier transforms. One version uses the naive loop approach you’ll find all over the internet, the other uses recursive templates:

The speed gain decreases as the size of the vector grows (it’s normal since the overhead becomes less and less important). In my case, I work mostly with images and I’ll rarely use vectors larger than 2048. For that size, the recursive-template implementation is more than 4 times faster…
