The compiler that fights you back
Every lesson on this site is compiled at -O4,p, the most aggressive setting MWCC GC/2.0 offers. The 4 is the optimization level, the highest MWCC accepts (full inlining, common-subexpression elimination, strength reduction, and loop optimizations). The ,p tells the optimizer to favor speed over code size (its counterpart is ,s), and that speed-oriented code includes instructions scheduled around the Gekko's pipeline and execution units. (The ,p is an MWCC-specific sub-flag; GCC's -O levels have no equivalent comma syntax.)
This changes the whole game. At -O0 the assembly tracks your C nearly line by line. At -O4,p the compiler may reorder, fuse, and delete instructions, or rematerialize values (recompute them instead of keeping them live), as long as the observable result is identical. Two things follow:
- The instruction order in the binary often does not match the source order. The optimizer hoists independent work to hide latency.
- To byte-match, you don't fight the optimizer: you feed it C whose optimized output equals the target. That is the entire craft of this chapter.
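To see why reordering is legal, here is a minimal C sketch using two hypothetical functions, diff_xor and diff_xor_loads_first, invented for illustration and deliberately different from this lesson's exercise. One finishes the first pair before reading the second; the other reads all four ints up front, the way a scheduled listing does. Because nothing in between stores to memory, both must return the same value for every input. Whether MWCC emits byte-identical code for the two is an assumption you would need to confirm by compiling them at -O4,p.

```c
#include <assert.h>

/* Hypothetical example (not the lesson's combine): source order
   finishes the first difference before reading p[2] and p[3]. */
int diff_xor(const int *p)
{
    int a = p[0] - p[1];   /* first pair */
    int b = p[2] - p[3];   /* second pair */
    return a ^ b;          /* depends on both differences */
}

/* Same computation written in the order a scheduled listing might
   use: every load first, then the independent subtracts, then the
   dependent xor. */
int diff_xor_loads_first(const int *p)
{
    int w = p[0], x = p[1], y = p[2], z = p[3];  /* all loads up front */
    int a = w - x;
    int b = y - z;
    return a ^ b;
}
```

No caller can tell the two apart, and that indistinguishability is exactly the freedom the optimizer uses when it hoists loads.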
Here is the target for two independent load-and-add pairs, followed by a multiply of the two sums:
lwz r6, 0(r3) # all four loads hoisted to the top...
lwz r5, 4(r3)
lwz r4, 8(r3)
lwz r0, 12(r3)
add r3, r6, r5 # ...then the two adds...
add r0, r4, r0
mullw r3, r3, r0 # ...then the dependent multiply
blr
Notice that all four lwz instructions are batched at the front, even though the source finishes the first sum (a) before touching the second (b). That batching is ,p scheduling at work; we dissect it in the next lesson.
Your task
Write int combine(int *p), which receives four consecutive integers through a pointer and returns its result in r3. Read which array slots feed each add and how the two sums are combined, then write the natural C and let -O4,p schedule it.