The multiply burst
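Unlike division, a 64-bit multiply is done inline, but it expands into a distinctive cluster of instructions. Only the low 64 bits of the product are kept, so the compiler needs three 32-bit partial products (a_lo * b_lo, a_hi * b_lo and a_lo * b_hi) plus the high half of a_lo * b_lo. The a_hi * b_hi term lies entirely above bit 63 and is never computed. Here a arrives in r3:r4 and b in r5:r6, high word first: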
mulhwu r7, r4, r6 # high 32 bits of (a_lo * b_lo)
mullw r3, r3, r6 # a_hi * b_lo
mullw r0, r4, r5 # a_lo * b_hi
add r3, r7, r3 # high(a_lo * b_lo) + a_hi * b_lo
mullw r4, r4, r6 # a_lo * b_lo -> result low word
add r3, r3, r0 # result high word
blr
The key player is mulhwu ("multiply high word unsigned"), which returns the upper 32 bits of the full 64-bit product of two 32-bit operands. mullw keeps only the lower 32 bits, so without mulhwu the overflow from a_lo * b_lo would be lost instead of being added into the high word. Seeing one mulhwu next to three mullws, all feeding into the same register pair, is the signature of a 64-bit multiply.
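To make the arithmetic concrete, here is a minimal C sketch that mirrors the instruction sequence one step at a time. The helper names mullw and mulhwu are hypothetical stand-ins for the instructions, and mul_64_by_halves is an illustrative name; this models what the burst computes and is not meant as the source that produces the target.

```c
#include <stdint.h>

/* Hypothetical helper modelling mullw: low 32 bits of a 32x32 product. */
static uint32_t mullw(uint32_t x, uint32_t y) {
    return (uint32_t)((uint64_t)x * y);
}

/* Hypothetical helper modelling mulhwu: high 32 bits of the unsigned product. */
static uint32_t mulhwu(uint32_t x, uint32_t y) {
    return (uint32_t)(((uint64_t)x * y) >> 32);
}

/* Illustrative name: the same steps as the assembly listing, in the same order. */
uint64_t mul_64_by_halves(uint64_t a, uint64_t b) {
    uint32_t a_hi = (uint32_t)(a >> 32), a_lo = (uint32_t)a;
    uint32_t b_hi = (uint32_t)(b >> 32), b_lo = (uint32_t)b;

    uint32_t hi = mulhwu(a_lo, b_lo);   /* overflow of a_lo * b_lo            */
    hi += mullw(a_hi, b_lo);            /* cross term, wraps modulo 2^32      */
    hi += mullw(a_lo, b_hi);            /* other cross term                   */
    uint32_t lo = mullw(a_lo, b_lo);    /* result low word                    */

    return ((uint64_t)hi << 32) | lo;   /* a_hi * b_hi is never needed        */
}
```

For any inputs this should agree with a plain a * b on uint64_t, which is a quick way to convince yourself that the fourth partial product really is unnecessary.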
Note the asymmetry with downcasting: when the product is truncated, as in u32 = u64 * u64, the whole thing collapses to a single mullw, indistinguishable from an ordinary 32-bit multiply. The low 32 bits of the product depend only on the low words of the operands, so unlike addition, multiplication leaves no fingerprint once the result is truncated. The full burst above only appears when the result is genuinely 64 bits wide.
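The reason the truncated case can collapse is purely arithmetic, and a short C sketch shows it. Both function names below are illustrative. The sketch only demonstrates that the two expressions are equal for all inputs, which is what allows a compiler to emit one mullw; it does not claim any particular compiler output.

```c
#include <stdint.h>

/* Illustrative: full 64-bit multiply, then truncate to 32 bits. */
uint32_t mul_64_truncated(uint64_t a, uint64_t b) {
    return (uint32_t)(a * b);
}

/* Illustrative: truncate first, then do one 32-bit multiply. */
uint32_t mul_32_low_words(uint64_t a, uint64_t b) {
    return (uint32_t)((uint64_t)(uint32_t)a * (uint32_t)b);
}
```

Every term involving a_hi or b_hi is shifted left by at least 32 bits, so none of them can influence the low word, and the two functions return the same value.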
Your task
Write mul_64 to match the target.