Sign lives in the load, width lives in the store
Push the read-modify-write up to halfword width and let the value be signed. Same three beats as before, load then modify then store, but the type drags two of the instructions along with it. A s16 is signed, so the load has to carry the sign upward, and that makes it lha, the algebraic load, rather than lhz. On the way out it is sth, the halfword store, truncating the way every store does.
Say quad(p) takes the signed halfword at p[0] and quadruples it. Since 4 is a power of two, the compiler trades the multiply for a left shift, slwi:
lha r0, 0(r3) # load p[0], sign-extended (s16 is signed)
slwi r0, r0, 2 # * 4 (shift left by 2)
sth r0, 0(r3) # truncate back to a halfword
blr
Worth burning into memory is where each responsibility sits. Signedness rides on the load, lha against lhz; the arithmetic plays out in a roomy 32-bit register; and width is the store's only concern, since sth simply keeps the bottom 16 bits and could not care less about the sign. Read the same location back later through a signed pointer and that later load sign-extends all over again. The store never did.
Your target reaches for a different operation than the worked example does. Find whatever sits between the lha and the sth, and read its operand straight off.
Your task
Write rmw_half, taking an s16*, to match the target assembly. The load must be lha and the store sth.