The compiler that fights you back
Everything on this site is built at -O4,p, which is about as aggressive as MWCC GC/2.0 gets. The 4 is the optimization level, and on its own it buys you full inlining, common-subexpression elimination, strength reduction, and a pile of loop work. The ,p is the part that bites here. It tells the compiler to favor speed over size (,s is the size-oriented counterpart), so it schedules instructions around the Gekko's pipeline and execution units instead of just packing them small. Coming from GCC? The comma suffix is an MWCC quirk. GCC's -O flags have nothing like it, and the closest analogue is picking -O2 over -Os.
That changes the rules completely. Under -O0 the assembly tracks your C line for line. Under -O4,p the compiler treats your code as a statement of intent rather than a layout: it is free to reorder, fuse, delete, and rematerialize instructions however it likes, as long as the observable result stays identical. Two consequences follow.
- The instruction order in the binary often won't match your source order at all. Independent work gets hoisted upward to hide latency.
- Byte-matching isn't about wrestling the optimizer down. You feed it C whose optimized shape already equals the target, and that knack is what this whole chapter drills.
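To see why the instruction order is up for grabs, here is a minimal sketch in C. The functions blend_source_order, blend_hoisted, and check_blend are hypothetical names made up for illustration (deliberately not the exercise below). Both functions compute the same value, with the loads arranged in two different orders. Because nothing observable differs between them, the C standard's as-if rule (C11 §5.1.2.3) lets a compiler emit either instruction order from either source.

```c
#include <assert.h>

/* Hypothetical example: the sum of two independent products,
 * written the way a person might naturally type it. */
int blend_source_order(const int *p)
{
    int a = p[0] * p[1];   /* finish the first product... */
    int b = p[2] * p[3];   /* ...before touching the second */
    return a + b;
}

/* The same computation with every load hoisted to the top, mirroring
 * the kind of order a scheduler may emit. Observable behavior is
 * identical, so the compiler is free to pick either shape. */
int blend_hoisted(const int *p)
{
    int x0 = p[0];
    int x1 = p[1];
    int x2 = p[2];
    int x3 = p[3];
    int a = x0 * x1;
    int b = x2 * x3;
    return a + b;
}

/* Self-check: both shapes must agree on any input that avoids overflow. */
void check_blend(const int *p)
{
    assert(blend_source_order(p) == blend_hoisted(p));
}
```

Calling check_blend on any array whose products and sum fit in an int passes. Signed overflow is undefined behavior in C, which is one more thing the optimizer is allowed to assume never happens.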
In the listing below, two independent sums, each built from two loads and an add, feed a single dependent multiply.
lwz r6, 0(r3) # all four loads hoisted to the top...
lwz r5, 4(r3)
lwz r4, 8(r3)
lwz r0, 12(r3)
add r3, r6, r5 # ...then the two adds...
add r0, r4, r0
mullw r3, r3, r0 # ...then the dependent multiply
blr
Notice that all four lwz instructions were pulled to the front, even though the source finishes computing the first sum (a) before it ever touches the second (b). That reshuffle is ,p scheduling in action, and the next lesson takes it apart.
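Why hoist the loads at all? A toy model makes the latency argument concrete. This is a minimal sketch, assuming a single-issue, in-order pipeline with made-up latencies: LOAD_LAT, ADD_LAT, and MUL_LAT are hypothetical round numbers, not measured Gekko figures. It reports the cycle on which the final product becomes available, once for the source order and once for the hoisted order from the listing. It uses hypothetical virtual registers instead of the real r0 to r6.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical latencies for a toy pipeline, NOT real Gekko timings. */
enum { LOAD_LAT = 2, ADD_LAT = 1, MUL_LAT = 2 };

/* Hypothetical virtual registers (not PowerPC register names). */
enum { BASE, X0, X1, X2, X3, A, B, R, NUM_REGS };

typedef struct {
    int dst;
    int src1;    /* -1 when unused */
    int src2;    /* -1 when unused */
    int latency; /* cycles until dst can be read */
} Insn;

/* Single-issue, in-order model: each instruction issues no earlier than
 * one cycle after its predecessor, and stalls until both sources are
 * ready. Returns the cycle on which the last result becomes available. */
int schedule_cycles(const Insn *code, size_t n)
{
    int ready[NUM_REGS] = {0}; /* BASE (the pointer) is ready at cycle 0 */
    int issue = -1;
    int finish = 0;
    for (size_t i = 0; i < n; i++) {
        assert(code[i].dst >= 0 && code[i].dst < NUM_REGS);
        int t = issue + 1;
        if (code[i].src1 >= 0 && ready[code[i].src1] > t) t = ready[code[i].src1];
        if (code[i].src2 >= 0 && ready[code[i].src2] > t) t = ready[code[i].src2];
        issue = t;
        ready[code[i].dst] = t + code[i].latency;
        if (ready[code[i].dst] > finish) finish = ready[code[i].dst];
    }
    return finish;
}

/* Source order: finish the first sum before starting the second. */
const Insn source_order[] = {
    {X0, BASE, -1, LOAD_LAT}, {X1, BASE, -1, LOAD_LAT}, {A, X0, X1, ADD_LAT},
    {X2, BASE, -1, LOAD_LAT}, {X3, BASE, -1, LOAD_LAT}, {B, X2, X3, ADD_LAT},
    {R, A, B, MUL_LAT},
};

/* Hoisted order: all loads first, matching the shape of the listing. */
const Insn hoisted_order[] = {
    {X0, BASE, -1, LOAD_LAT}, {X1, BASE, -1, LOAD_LAT},
    {X2, BASE, -1, LOAD_LAT}, {X3, BASE, -1, LOAD_LAT},
    {A, X0, X1, ADD_LAT}, {B, X2, X3, ADD_LAT},
    {R, A, B, MUL_LAT},
};
```

With these made-up numbers, schedule_cycles(source_order, 7) returns 10 and schedule_cycles(hoisted_order, 7) returns 8. In source order, each add stalls one cycle waiting on its second load. In the hoisted order, every load has landed by the time its add issues. The real Gekko's latencies and issue rules differ, so treat the gap as illustrative rather than a prediction.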
Your task
Write int combine(int *p), which reads four consecutive integers through a pointer and returns the result. Work out from the listing which array slots feed each add and how the two sums are combined, then write the natural C and let -O4,p schedule it.