Capstone: Scheduling Meets fp_contract

Optimization & Scheduling

schedulingfp_contractcapstone

Everything at once

This is where the chapter comes together. Linear interpolation, the trick of landing a value some fraction of the way between two endpoints, runs through game code everywhere (SFA's lighting lerps are built on it). Write a pair of them, add the results, and three of the passes you've met all go off at once:

fp_contract collapses the multiply-then-add in each lerp into a single fmadds, once the fsubs for the difference is out of the way.
scheduling drags all four lfs loads up front and interleaves the two lerps so one's latency hides behind the other's work.
-O4 keeps the result branch-free and packs it into a tight set of registers.

To see the same shape at a different arity, here's a three-component take, mix2, where fp_contract and scheduling do their thing on three lerps:

lfs    f5, 0(r3)       # p[0]
lfs    f2, 0(r4)       # q[0]
lfs    f4, 4(r3)       # p[1]   — all six loads hoisted
lfs    f0, 4(r4)       # q[1]
fsubs  f2, f2, f5      # lerp0: q[0]-p[0]
lfs    f6, 8(r3)       # p[2]
fsubs  f0, f0, f4      # lerp1: q[1]-p[1]   interleaved
lfs    f3, 8(r4)       # q[2]
fmadds f2, f1, f2, f5  # lerp0 fused
fsubs  f3, f3, f6      # lerp2: q[2]-p[2]
fmadds f0, f1, f0, f4  # lerp1 fused
fmadds f1, f1, f3, f6  # lerp2 fused
fadds  f0, f2, f0
fadds  f1, f1, f0
blr

See how each lerp's fsubs and fmadds are braided into the next rather than sitting in their own block? That's the scheduler. Compile the same unit with #pragma scheduling off and every lerp would stand alone, finished before the following one starts.

For your blend, tally the lfs instructions in the target asm to learn how many arrays show up and how many elements each one reaches; every fsubs/fmadds pair marks one lerp.

Your task

Write blend(f32 *a, f32 *b, f32 t) to reproduce the assembly above. Write the lerps in the natural form and let the optimizer fuse and interleave.

Hints

match blendmwcceppc.exe -O4,p

Loading editor…

Hit “Compile & Check” to diff your code against the target.