On the right is an example overview of a state-of-the-art CPU. To optimize code for the CPU, it is important to know how the CPU works in detail.
The CPU must be fed with data at a very high speed, and regular RAM cannot keep up with that speed. Therefore, a small, fast memory called cache is used as a buffer (temporary storage) between the CPU and the RAM. To get top performance from the CPU, the number of transfers leaving the CPU must be minimized: the more data traffic that can be contained inside the CPU, the better the performance. For this reason, the AMD Phenom II was equipped with built-in L1, L2 and L3 caches. These caches help minimize the data flow into and out of the CPU.
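As a rough illustration of why keeping traffic inside the caches matters, the sketch below sums the same matrix in two traversal orders. It assumes a matrix stored row by row in one flat array; the function names `sum_row_major` and `sum_column_major` are hypothetical and chosen only for this example. Both functions return the same result, but the first touches neighbouring memory addresses and so tends to reuse each cache line it loads, while the second jumps a whole row ahead on every step and is likely to cause far more transfers from RAM on large matrices.

```c
#include <stddef.h>

/* Cache-friendly traversal: visits elements in the order they are
 * stored in memory (row by row), so consecutive accesses usually
 * fall into a cache line that has already been loaded. */
long sum_row_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += m[r * cols + c];
    return total;
}

/* Cache-unfriendly traversal: visits elements column by column, so
 * each access lies cols * sizeof(int) bytes after the previous one
 * and may need a different cache line every time. */
long sum_column_major(const int *m, size_t rows, size_t cols)
{
    long total = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            total += m[r * cols + c];
    return total;
}
```

How large the difference is depends on the matrix size and on the cache sizes of the particular CPU, so it should be measured rather than assumed.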
To write a speed-optimized algorithm, it is necessary to minimize both RAM accesses and communication between the cores.
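One common way to reduce RAM accesses is to process data in small tiles that fit into the cache. The following is a minimal sketch of that idea for a square matrix transpose, not a tuned implementation. The function name `transpose_tiled` and the tile size `TILE` are assumptions made for this example; a suitable tile size depends on the cache sizes of the target CPU. The sketch covers only the RAM-access part of the statement above and does not address communication between cores.

```c
#include <stddef.h>

/* Hypothetical tile edge length; tune it for the target cache. */
#define TILE 32

/* Transposes the n x n matrix src into dst, one TILE x TILE block at
 * a time. Within a block, both the rows read from src and the rows
 * written to dst stay small enough to remain in the cache, instead
 * of streaming whole rows and columns through it. */
void transpose_tiled(const double *src, double *dst, size_t n)
{
    for (size_t i0 = 0; i0 < n; i0 += TILE) {
        for (size_t j0 = 0; j0 < n; j0 += TILE) {
            /* Clamp the block at the matrix border when n is not a
             * multiple of TILE. */
            size_t i_end = (i0 + TILE < n) ? i0 + TILE : n;
            size_t j_end = (j0 + TILE < n) ? j0 + TILE : n;

            for (size_t i = i0; i < i_end; i++)
                for (size_t j = j0; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

The result is identical to a plain two-loop transpose; only the order in which memory is visited changes.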