What -O4,p Actually Does

Optimization & Scheduling

optimization-level, scheduling, mental-model

The compiler that fights you back

Every lesson on this site is compiled at -O4,p, the most aggressive setting MWCC GC/2.0 has. The 4 is the optimization level (full inlining, common-subexpression elimination, strength reduction, loop optimizations). The ,p tells the optimizer to favor speed over size (its counterpart is ,s), which in practice means scheduling instructions for the Gekko's pipeline and execution units instead of just minimizing code size. (The comma sub-flag syntax is MWCC-specific; if you know GCC's -O levels, GCC has no equivalent.)

This changes the whole game. At -O0 the assembly mirrors your C line by line. At -O4,p the compiler is allowed to reorder, fuse, delete, and rematerialize instructions as long as the observable result is identical. Two things follow:

The instruction order in the binary often does not match the source order. The optimizer hoists independent work to hide latency.
To byte-match, you don't fight the optimizer — you feed it C whose optimized form equals the target. That is the entire craft of this chapter.
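Why is reordering legal at all? A minimal sketch in plain C, assuming nothing about MWCC's actual output: two hypothetical functions (diff_xor_source_order and diff_xor_scheduled_order, deliberately not the lesson's target) compute the same value, one finishing the first pair before touching the second, the other reading all four slots first. Because the loads are independent, both orders give identical results, and that is the freedom -O4,p exploits.

```c
/* Hypothetical example, NOT the lesson's target function.
 * "Natural" order: finish the first pair before starting the second. */
int diff_xor_source_order(const int *p)
{
    int a = p[0] - p[1];   /* first pair, completed...            */
    int b = p[2] - p[3];   /* ...before the second pair is touched */
    return a ^ b;
}

/* Same computation with every independent read written first, the way a
 * pipeline scheduler tends to arrange the instructions. Writing it this way
 * in C does not force any instruction order; the point is only that both
 * orders compute the same value. */
int diff_xor_scheduled_order(const int *p)
{
    int x0 = p[0];
    int x1 = p[1];
    int x2 = p[2];
    int x3 = p[3];
    int a = x0 - x1;
    int b = x2 - x3;
    return a ^ b;
}
```

Called on {10, 3, 9, 1}, both return 15 (7 ^ 8). Since C source order does not pin down instruction order, matching a target is about choosing C whose optimized form is the target, not about ordering statements by hand.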

Here is what -O4,p emits for two independent load-and-add pairs whose sums are then multiplied:

lwz   r6, 0(r3)    # all four loads hoisted to the top...
lwz   r5, 4(r3)
lwz   r4, 8(r3)
lwz   r0, 12(r3)
add   r3, r6, r5   # ...then the two adds...
add   r0, r4, r0
mullw r3, r3, r0   # ...then the dependent multiply
blr

Notice that the four lwz instructions are batched at the front, even though the source computes the first sum, a, completely before touching the second, b. That batching is ,p scheduling at work; we dissect it next lesson.
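To make "hoisting to hide latency" concrete, here is a toy issue model in C. It is a deliberate simplification, not a Gekko simulator: it assumes single issue in program order (the real Gekko is superscalar), that an instruction waits until all its source registers are ready, and made-up latencies (2 cycles for lwz, 1 for add and mullw) chosen only for illustration. The names Insn, last_issue_cycle, batched, and interleaved are hypothetical, and the interleaved schedule (using r7) is an invented unscheduled ordering, not compiler output.

```c
#include <stddef.h>

/* ASSUMED latencies for illustration only, not measured Gekko numbers. */
enum { LAT_LOAD = 2, LAT_ALU = 1 };

typedef struct {
    int dst;      /* destination register number                */
    int src[2];   /* source registers, -1 if unused             */
    int latency;  /* cycles until dst is usable by a consumer   */
} Insn;

/* Toy model: single-issue, in-order. Returns the cycle on which the
 * last instruction issues. */
int last_issue_cycle(const Insn *code, size_t n)
{
    int ready[32] = {0};   /* cycle at which each GPR becomes usable  */
    int cycle = -1;        /* issue cycle of the previous instruction */
    for (size_t i = 0; i < n; i++) {
        int t = cycle + 1;                  /* at most one issue per cycle */
        for (int s = 0; s < 2; s++) {
            int r = code[i].src[s];
            if (r >= 0 && ready[r] > t)
                t = ready[r];               /* stall until the operand is ready */
        }
        ready[code[i].dst] = t + code[i].latency;
        cycle = t;
    }
    return cycle;
}

/* The -O4,p listing from this lesson: all four loads first. */
const Insn batched[] = {
    {6, {3, -1}, LAT_LOAD},  /* lwz   r6, 0(r3)  */
    {5, {3, -1}, LAT_LOAD},  /* lwz   r5, 4(r3)  */
    {4, {3, -1}, LAT_LOAD},  /* lwz   r4, 8(r3)  */
    {0, {3, -1}, LAT_LOAD},  /* lwz   r0, 12(r3) */
    {3, {6,  5}, LAT_ALU},   /* add   r3, r6, r5 */
    {0, {4,  0}, LAT_ALU},   /* add   r0, r4, r0 */
    {3, {3,  0}, LAT_ALU},   /* mullw r3, r3, r0 */
};

/* Hypothetical unscheduled order: finish the first sum before loading the
 * second pair. r7 holds the first sum so r3 (the pointer) survives. */
const Insn interleaved[] = {
    {6, {3, -1}, LAT_LOAD},  /* lwz   r6, 0(r3)  */
    {5, {3, -1}, LAT_LOAD},  /* lwz   r5, 4(r3)  */
    {7, {6,  5}, LAT_ALU},   /* add   r7, r6, r5 */
    {4, {3, -1}, LAT_LOAD},  /* lwz   r4, 8(r3)  */
    {0, {3, -1}, LAT_LOAD},  /* lwz   r0, 12(r3) */
    {0, {4,  0}, LAT_ALU},   /* add   r0, r4, r0 */
    {3, {7,  0}, LAT_ALU},   /* mullw r3, r7, r0 */
};
```

Under these assumptions the batched listing issues its last instruction on cycle 6 and the interleaved one on cycle 8: in the batched order neither add has to wait on a load issued the cycle before, so the two stall cycles disappear.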

Your task

Write int combine(int *p), where p points to four consecutive integers. Read which array slots feed each add and how the two sums are combined, then write the natural C and let -O4,p do the scheduling.


Match target: combine (mwcceppc.exe -O4,p)
