Preparation
- Simulator: Spike
- Debugger: Spike/GDB
Assignment: Program Execution
Objective: understand how machine instruction is executed, how data is stored, and how the processor stack helps to make that happen.
Task: write and compare iterative and recursive program, learn about the calling convention, local variable storage and how does PC affect instruction flow.
- Get familiar with calling convention (convert a simple example function)
- Before calling/After returning from a function
- Parameter passing
- Get return value
- After entering/Before returning from a function
- Save/Restore register states
- Before calling/After returning from a function
- Convert the exponentiation implementation
- Pass given test cases
- Calculations
- Stack frame size and count?
- Figure out stack context with given PC and parameters
Expected learning results:
- The purpose of the stack
- We cannot do function call if there is no place to save the current state (PC)
- The purpose of calling convention
- It's a standard way and also easier to save state (e.g. PC) and pass variables
- Stack is also helpful for reducing programming complexity
int f(int x) { return (x >> 16) | (x << 16); }
int pow_recursive(int base, int exponent) {
if (exponent == 0) {
return 1;
} else if (exponent % 2 == 0) {
int half_power = pow_recursive(base, exponent / 2);
return half_power * half_power;
} else {
int half_power = pow_recursive(base, (exponent - 1) / 2);
return base * half_power * half_power;
}
}
int pow_iterative(int base, int exponent) {
int result = 1;
while (exponent > 0) {
if (exponent % 2 == 1) {
result *= base;
}
base *= base;
exponent /= 2;
}
return result;
}
Assignment: Cache
Objective: understand locality and its relation to program performance
Task: Write a matrix multiplication program in assembly.
- Get familiar with array and memory layout of multi-dimentional array
- Know the difference between passing by value and passing by pointer
- Write a dumb nested loop
- Record performance with spike
- Swap the inner loop
- Record performance with spike
- Ask students to do further optimization given the cache settings
- (e.g. blocked multiplication)
- Results are judged by improvements
Expected learning results:
- Main memory hierarchy
- Performance and locality
L2$ Bytes Read: 1870835392
L2$ Bytes Written: 787498176
L2$ Read Accesses: 29231803
L2$ Write Accesses: 12304659
L2$ Read Misses: 11543088
L2$ Write Misses: 108558
L2$ Writebacks: 4707069
L2$ Miss Rate: 28.052%
#define ROWS 1024
#define COLS 1024
void matrix_multiply(int A[][COLS], int B[][COLS], int result[][COLS]) {
for (int i = 0; i < ROWS; i++)
for (int j = 0; j < ROWS; j++) {
result[i][j] = 0;
for (int k = 0; k < ROWS; k++)
result[i][j] += A[i][k] * B[k][j];
}
}