Putting the instructions back in source order
The other half of the SFA pragma pair is #pragma scheduling off. It tells MWCC to emit instructions in (essentially) source order instead of reordering them to hide latency. When a target was built this way, the scheduled "loads-first" shape you saw earlier does not appear: the loads sit right next to the work that consumes them.
Here is the same body as in lesson 1 (two sums, then a multiply), this time with scheduling disabled:
lwz r4, 0(r3)
lwz r0, 4(r3)
add r5, r4, r0 # a computed immediately after its loads
lwz r4, 8(r3)
lwz r0, 12(r3)
add r0, r4, r0 # b computed immediately after its loads
mullw r3, r5, r0
The four loads are not batched; each pair sits with its add. That ordering, together with the different register coloring it produces, is the fingerprint of scheduling off. As with peephole, you bracket the region, and every off is paired with a reset. In real decomp the two pragmas are frequently used together around a whole function.
The #pragma scheduling off / reset lines are supplied in both the starter and the solution, and they match how the target was built, so concentrate on the body.
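For reference, the bracketing could look roughly like the sketch below. The function sub_then_add is hypothetical and is not the lesson's combine3. The __MWERKS__ guard is also an assumption added for illustration, so that compilers other than MWCC skip the pragmas; the lesson's own files may simply write the two pragma lines unguarded.

```c
/* Hypothetical example, not part of the lesson's starter or solution. */

/* Assumption for illustration: __MWERKS__ is the macro Metrowerks
   compilers predefine, so other compilers skip the MWCC-only pragmas. */
#ifdef __MWERKS__
#pragma scheduling off      /* open the bracketed region */
#endif

int sub_then_add(int *p)
{
    int d = p[0] - p[1];    /* first statement in source order */
    return d + p[2];        /* second statement in source order */
}

#ifdef __MWERKS__
#pragma scheduling reset    /* close the region: every off gets a reset */
#endif
```

Only the function between the two pragma lines is affected; code after the reset line is compiled with whatever scheduling setting was in effect before the off.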
Your task
Write the body of combine3(int *p) to match the un-batched, source-order assembly above. Read which array slots pair with which add, and what the final mullw combines. With scheduling off the compiler emits in source order, so the layout of the loads tells you the layout of the C.