Why a float function stores 64 + 32 bits per register
The Gekko's floating-point registers f14–f31 are callee-saved: a function that uses any of them, typically to keep float values live across a call, must save their old contents in its prologue and restore them before returning. What's surprising is how MWCC saves each one. It uses not one store but two:
stwu r1, -96(r1)
mflr r0
stw r0, 100(r1)
stfd f31, 80(r1) # save the 64-bit double view of f31...
psq_st f31, 88(r1), 0, 0 # ...AND the paired-single (two 32-bit) view
stfd f30, 64(r1)
psq_st f30, 72(r1), 0, 0
...
Each callee-saved FPR gets an stfd (store floating-point double) paired with a psq_st (paired-single quantized store). The Gekko can hold two packed 32-bit floats in one FPR, and stfd alone would only preserve the double-precision lane, so MWCC emits the psq_st to guarantee both paired-single halves survive too. The epilogue mirrors each pair in reverse order: psq_l then lfd for each register, highest-numbered first.
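The slot arithmetic in the listing above can be sketched in C. This is a minimal sketch that extrapolates from the two registers shown (f31 at 80/88 and f30 at 64/72 in a 96-byte frame); the names FprSaveSlots and fpr_save_slots are hypothetical, and the assumption that every further register follows the same 16-byte stride is not confirmed by the listing.

```c
#include <stdio.h>

/* Hypothetical helper type: the two stack offsets used to save one FPR. */
typedef struct {
    int stfd_offset;   /* where stfd writes the 64-bit double view */
    int psq_st_offset; /* where psq_st writes the paired-single view */
} FprSaveSlots;

/* Assumption (extrapolated from the listing): save slots are packed
   downward from the top of the frame, 16 bytes per register, starting
   with f31. The psq_st slot sits 8 bytes above the stfd slot. */
FprSaveSlots fpr_save_slots(int frame_size, int fpr)
{
    FprSaveSlots slots;
    int index = 31 - fpr; /* f31 -> 0, f30 -> 1, ... */

    slots.stfd_offset = frame_size - 16 * (index + 1);
    slots.psq_st_offset = slots.stfd_offset + 8;
    return slots;
}

/* Print the assumed save area for f31 down to lowest_fpr. */
void print_fpr_save_area(int frame_size, int lowest_fpr)
{
    int fpr;

    for (fpr = 31; fpr >= lowest_fpr; fpr--) {
        FprSaveSlots slots = fpr_save_slots(frame_size, fpr);
        printf("stfd   f%d, %d(r1)\n", fpr, slots.stfd_offset);
        printf("psq_st f%d, %d(r1), 0, 0\n", fpr, slots.psq_st_offset);
    }
}
```

Calling print_fpr_save_area(96, 30) reproduces the four save instructions shown in the listing, which is a quick way to check a frame size against the number of saved FPRs.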
When you see a prologue full of stfd/psq_st pairs counting down f31, f30, f29..., that's just callee-saved float registers being saved: the function keeps enough float values live across calls that it needs them. You match it by writing C with enough live f32 values across calls; the saves fall out automatically.
Your task
Write mix(f32 *p): call the provided transform on p[0]..p[5] and store the results in six locals (skip the name f so it isn't confused with the f32 type), then combine those locals into a return value. Holding six float results live across six calls forces several callee-saved FPRs into use; watch the stfd/psq_st pairs appear in the prologue. Use the fmadds/fmuls/fadds instructions in the function body, between the prologue and the epilogue, to reconstruct which products are added together.
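A minimal C sketch of the shape such a function might take. Two things here are assumptions, not the exercise's answer: the body of transform is a stand-in so the block compiles on its own (the real one is provided, and its signature is assumed to be f32 transform(f32)), and the return expression is a placeholder. The actual combination still has to be read from the fmuls/fmadds/fadds sequence in the target assembly.

```c
typedef float f32;

/* Stand-in for the provided transform. The real function comes with
   the exercise; this body exists only so the sketch is self-contained. */
f32 transform(f32 x)
{
    return x * 2.0f;
}

f32 mix(f32 *p)
{
    /* Six locals, each holding a call result. The earlier results must
       stay live while the later calls run, which is what pushes them
       into callee-saved FPRs. The name f is skipped on purpose. */
    f32 a = transform(p[0]);
    f32 b = transform(p[1]);
    f32 c = transform(p[2]);
    f32 d = transform(p[3]);
    f32 e = transform(p[4]);
    f32 g = transform(p[5]);

    /* Placeholder combination: replace it with the products and sums
       recovered from the target's arithmetic instructions. */
    return a * b + c * d + e * g;
}
```

In a real matching setup transform would be declared but defined elsewhere, so the compiler has to emit genuine calls rather than inlining the stand-in.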