Nesting Over a 2-D Array

Loops

nested-loops, arrays, row-major

A grid in memory is one flat array

Real code rarely nests loops just to multiply counters; it nests them to walk a 2-D array. A rows × cols grid is almost always stored flat in row-major order, so element (i, j) lives at flat index i * cols + j. The inner body therefore has to compute that index, which the compiler builds with a mullw (the row part, i * cols) and an add (the column, j). The familiar slwi+lwzx pair then scales the index by 4, the size of one word element, and loads the cell.
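As a rough illustration of the index arithmetic, here is a minimal C sketch. The helper names flat_index and cell_at are hypothetical and do not appear in the lesson; the grid is assumed to hold int elements. For a 2 × 3 grid, cell (1, 2) sits at flat index 1 * 3 + 2 = 5.

```c
/* Hypothetical helper: flat index of cell (i, j) in a row-major grid
 * whose rows are cols elements long. */
static int flat_index(int i, int j, int cols)
{
    return i * cols + j;    /* row part first, then the column */
}

/* Hypothetical helper: read cell (i, j) of a row-major grid of ints. */
static int cell_at(const int *a, int i, int j, int cols)
{
    return a[flat_index(i, j, cols)];
}
```

Indexing a with the flat index is where the scale-by-4 step comes from: the C subscript hides the multiplication by the element size that slwi performs explicitly.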

Consider gnz(a, rows, cols), which counts how many cells of the grid are non-zero. The arguments arrive in r3 (a), r4 (rows), and r5 (cols):

    li    r8, 0          # c = 0
    li    r6, 0          # i = 0
    b     otest
obody:
    li    r7, 0          # j = 0  (reset each outer pass)
    b     itest
ibody:
    mullw r0, r6, r5     # i * cols
    add   r0, r7, r0     # + j   ->  flat index
    slwi  r0, r0, 2      # index * 4
    lwzx  r0, r3, r0     # load cell (i, j)
    cmpwi r0, 0
    beq-  iskip          # zero -> don't count
    addi  r8, r8, 1      # c++
iskip:
    addi  r7, r7, 1      # j++
itest:
    cmpw  r7, r5         # j < cols ?
    blt+  ibody
    addi  r6, r6, 1      # i++
otest:
    cmpw  r6, r4         # i < rows ?
    blt+  obody
    mr    r3, r8         # return value = c
    blr

The two nested loop skeletons are exactly the ones from the previous lesson; what is new is the address computation in the inner body. Spot the mullw/add pair feeding the slwi/lwzx and you have found a flattened 2-D access. The cols operand of that mullw is the row stride: the number of elements in one row, which is the second dimension of the array.
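For orientation, C source along the following lines would plausibly correspond to the gnz listing. The lesson does not show the original source, so the parameter types and the exact loop form here are assumptions, and this sketch has not been checked for a byte-exact match against the compiler.

```c
/* Assumed reconstruction: count the non-zero cells of a rows x cols
 * grid stored flat in row-major order. */
int gnz(const int *a, int rows, int cols)
{
    int c = 0;                          /* r8 in the listing */
    int i, j;                           /* r6 and r7 in the listing */

    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++) {
            if (a[i * cols + j] != 0) { /* mullw/add, then slwi/lwzx */
                c++;
            }
        }
    }
    return c;
}
```

The subscript a[i * cols + j] is the only place the grid's shape enters the function, which is why the mullw operand identifies the row length.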

Your gsum uses the same nested skeleton and the same flattened addressing, but its inner body is simpler than this example — no compare, no branch. Read the target's inner body to see what it does with each element it loads.

#pragma optimization_level 1 keeps both loops rolled.

Your task

Write gsum, returning the sum of all rows * cols elements of the row-major grid a.

Hints

match gsum, mwcceppc.exe -O4,p


Hit “Compile & Check” to diff your code against the target.