When the optimizer is forbidden to remember
At -O4,p MWCC is happy to fold repeated memory reads through CSE, common-subexpression elimination. Load the same global twice with nothing writing to it in between and the compiler loads it once, then reuses the value. Here is a plain global g_frame added to itself.
lwz r0, g_frame@sda21(r2) # ONE load...
add r3, r0, r0 # ...reused for both operands
blr
Now make g_frame a volatile and everything you just saw goes backwards. Because the standard treats each access to a volatile as observable behaviour, the compiler isn't free to merge two reads into one, to leave the value parked in a register between uses, or to slide the accesses around in the schedule, so reading the variable twice in the source genuinely costs you two loads in the output.
lwz r3, g_frame@sda21(r2) # first read
lwz r0, g_frame@sda21(r2) # second read — not CSE'd
add r3, r3, r0
blr
Why does that matter for matching? Because a value that comes back from memory more often than the arithmetic could possibly need, with nothing writing to it in between, is about the most dependable sign of volatile you will ever get. When your attempt collapses down to one load and the target stubbornly reloads, the original was almost certainly volatile, and the same reasoning runs in the other direction just as well. There is more to it than the load count, though, because volatile also pins the instruction order that ,p scheduling would otherwise be free to rearrange, and that ordering guarantee is the whole reason hardware-register code depends on it.
Your task
Write twice_vol using the provided volatile int g_counter to reproduce the assembly above.