File this under “things that should be obvious but I just found out about”. GCC will tell you the optimal flags for your processor. To wit:
echo "" | gcc -march=native -v -E - 2>&1 | grep cc1
The output is the full cc1 invocation; the parts you want are the -march, -mtune and other -m flags. Stick those into your makefile or command-line call to GCC and your executable should be as optimized for your processor as GCC can make it.
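If you'd rather not pick the flags out of that line by eye, something like the following might do as a rough sketch. The function name extract_machine_flags is my own invention, not anything GCC ships, and it assumes the flags you care about are the space-separated tokens starting with -m; it deliberately ignores any --param cache settings that may also appear on the line.

```shell
# Hypothetical helper (not part of GCC): read a cc1 command line on stdin
# and print only the tokens that start with -m, joined on a single line.
# Assumption: flags are separated by spaces and contain no embedded spaces.
extract_machine_flags() {
  tr -s ' ' '\n' |   # one token per line
    grep -E '^-m' |  # keep -march=..., -mtune=..., -mno-..., etc.
    tr '\n' ' ' |    # join the tokens back onto one line
    sed 's/ $//'     # drop the trailing space
}
```

Assuming gcc is installed on the machine whose processor you're targeting, you'd append it to the pipeline above: echo "" | gcc -march=native -v -E - 2>&1 | grep cc1 | extract_machine_flags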
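You could, of course, always use -march=native (single dash) and forget all that, but that doesn’t work so well if you’re cross-compiling: native describes the machine running the compiler, not the one that will run the executable.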