../csci2510-program-execution

CSCI2510 Program Execution

Table of contents

Environment

After this part, you will have the following things installed:

Install from provided binary (x86_64 Linux)

wget --no-check-certificate \
    https://www.cse.cuhk.edu.hk/~jlhu/csci2510/riscv-linux-amd64.tar.zst
sudo tar axvf riscv-linux-amd64.tar.zst -C /opt
sudo chown -R $(whoami) /opt/riscv

Install from provided binary (macOS with Apple Silicon)

brew install gun-tar zstd

wget --no-check-certificate \
    https://www.cse.cuhk.edu.hk/~jlhu/csci2510/riscv-darwin-arm64.tar.zst
sudo gtar axvf riscv-darwin-arm64.tar.zst -C /opt
sudo chown -R $(whoami) /opt/riscv

Install from source code (both macOS and Linux with any CPU):

## install dependencies
# macOS
brew install dtc gnu-tar gnu-sed gawk flock texinfo gmp isl libmpc mpfr zstd boost
# ubuntu
sudo apt install autoconf automake autotools-dev curl python3 python3-pip \
    libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex \
    texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev \
    ninja-build git cmake libglib2.0-dev libboost-all-dev \
    device-tree-compiler

## prepare directory and permission
sudo mkdir -p /opt/riscv
sudo chown -R $(whoami) /opt/riscv

## download the source code
# option 1: download and decompress the provided tarball
gtar axvf riscv-gnu-toolchain.tar.zst
# option 2: GitHub (might take a long while depending on network conditions)
git clone --recurse-submodules https://github.com/riscv-collab/riscv-gnu-toolchain

## install musl gcc toolchain
cd riscv-gnu-toolchain
./configure --prefix=/opt/riscv
# this step will take about 30min on a 10-core M1 Pro
make musl -j$(nrpoc)
# check if installed successfully
/opt/riscv/bin/riscv64-unknown-linux-musl-gcc -v -E -x c

## spike
cd spike
mkdir -p build
cd build
../configure --prefix=/opt/riscv
make install -j$(nproc)
# check if installed successfully
file /opt/riscv/bin/spike

## pk
cd ../../pk
mkdir -p build
cd build
env PATH="$PATH:/opt/riscv/bin" RISCV=/opt/riscv ../configure --prefix=/opt/riscv --host=riscv64-unknown-linux-musl --with-arch=rv64gc_zifencei
env PATH="$PATH:/opt/riscv/bin" RISCV=/opt/riscv make install -j$(nproc)
# check if installed successfully
file /opt/riscv/riscv64-unknown-linux-musl/bin/pk

This way you only get gcc with newlib libc instead of the preferred musl.

You might need to change riscv64-unknown-linux-musl-* command accordingly.

brew tap riscv-software-src/riscv
brew install riscv-tools riscv-isa-sim riscv-pk

As introduced during the lecture, transforming a picece of human readable code into an executable program takes 3 steps, i.e. compile, assemble and link. In this part, you will walk through this process by manually finish these steps and finally compile hello.c into an executable. You should read hello.c first, it is a simple program to print a hello message.

Before proceeding to the following tasks, please execute a make clean under the source code directory to ensure we are starting with a clean tree.

Compile

Under the given source code tree, please type make compile. And watch out what command is been executed as well as what file is been generated.

Command you saw:

/opt/riscv/bin/riscv64-unknown-linux-musl-gcc -S -O0 -o hello.s hello.c

File generated:

hello.s

Assemble

The file you just saw is an assembly source code, please open it and get a feel of compiler generated assembly.

Next, we will assemble this file. Please type make assemble and record what command is actually been executed as well as what new file is been created.

Command you saw:

/opt/riscv/bin/riscv64-unknown-linux-musl-gcc -c  -o hello.o hello.s

File generated:

hello.o

The file you just saw is an object file, which is an binary file. To show a summary about the content, please type the following command: make dump-<file>. You should replace <file> with the created file last step.

In hello.c we have encountered two functions, they are main and printf. Please find them in the symbol table and paste the related two lines here:

Symbol table lines containing main and printf:

0000000000000000 g     F .text  0000000000000044 main
0000000000000000         *UND*  0000000000000000 printf

You might find out that the row for main contains F and .text, they means main is a function in the program text section.

However, the row for printf will give you *UND* which means undefined. That's because printf is provided by the C standard library. And it's the linker's job to link the actuall code into the program which uses it.

Please type make link and record what command is actually been executed as well as what new file is been created.

Command you saw:

/opt/riscv/bin/riscv64-unknown-linux-musl-gcc -static -no-pie -o hello hello.o

File generated:

hello

Executable file

The file you just saw is the final artifact of the overall compiling process, which are ready to be loaded into memory and executed. It's also an binary readable by objdump, please try using make dump-<file> again to see what's inside.

You might find the results overwhelmingly long, you can use make dump-<file> | grep <keyword> to filter out the desired info. Please also find the related rows describing main and printf in the Symbol table.

Symbol table lines containing main and printf:

0000000000010490 g     F .text  000000000000002c printf
0000000000010236 g     F .text  000000000000003e main

You might find out that now both main and printf are both a function of the text section.

Loading and memory layout

Execute an executable is essentially loading that executable file into memory then jump to the entry point in the .text section.

As you can see in the figure 1, the memory layout is mainly composed of text, data, stack, heap and shared libraries. We will not discussing heap (which is used for dynamic memory management) and shared libraries (which are used for dynamic linking). You have already seen .text section in the executable file, it's loaded into the text region in memory. Data region contians several data sections in the executable, such as .data, .sdata, .rodata and etc.

Can you name one or more symbols that are both defined in hello.c and loaded into data region?

csci
year

As you have probably noticed, data region mainly contains global variables. For function-scope local variables, they are stored in the stack. Stack grows towards lower address, i.e. when reserving additional stack space, the stack pointer sp will be decremented. Can you name one or more symbols that are local variable and their offset to the stack pointer in main? You should use the format <variable> <offset> for each symbol, for multiple symbols, please write each one in a different row.

Symbols and their offset:

a 8
b 0

1

Figure extracted from here. Entry point of text section might be different.

Now we have walked though the entire process of turning C code into exectuable and running them. Please type make load-hello to see the results.

Output of the program:

Hello, CSCI257!

Oops! It seems the output is not exactally correct. Can you change exactally one line of code in the generated assembly file to make it output Hello, CSCI2510!? After change the assembly code, you can type make load-hello again to see the results. If the assembly file is deleted, you can type make assemble to generate it again.

The assembly code that needs to be changed:

add	a5,a4,a5

The modified code:

mul	a5,a4,a5

You also need to submit the modified assembly file. Remember to save it before running make clean.

Note for debugging

You can use make debug-hello to run the executable under spike's debug mode. You can skip everything before entering main by typing until pc 0 <value-of-main>. <value-of-main> can be obtained from the first column of the objdump's output, like here. In the aboved case, we can use until pc 0 0x10236

Function call

In this part, you will learn how exactally does the RISC-V architecture support function calls like f(x, y) by looking closely at the RISC-V calling convention.

You might have seen call occurred in the generated assembly code. However, standard RISC-V instruction set does not include it. RISC-V only defines two J&L instructions jal rd,imm and jalr rd,rs1,imm. These instrucitons are able to keep the return address by storing the address of the following instruction into rd. As you may have guessed, call is a pseudo instruction that is implemented using an auipc plus a jalr and the register ra will keep the return address after a call.

We also suggest you use pseudo instructions as listed here and here. The gcc generated assemblies prefer the clearer pseudo instructions as well, e.g. call and ld/sd etc..

Calling convention

When calling a function, there are several problems to consider:

To solve these problems, calling convention is standarized. The first part of a calling convention is a set of rules on how to use and save registers. A complete documentation can be found on RISC-V official website here.

Registers

Stack frame and frame pointer

Eevery function needs some space on the stack to store local variables or save registers. To clearly mark the boundary among different functions' stack content, we have a frame pointer. The following figure shows the stack frame of main and f for the given program call.c. Frame pointer s0 and stack pointer sp are corresponding to the highest and lowest address of the stack memory (frame) owned by a function, no matter which function we are in, e.g. main or f.

The alignment is needed due to RISC-V standard requires sp to be aligned to 16B. f's arguments are copied to stack due to C's pass-by-value semantic. a, b and c are now f's local variables.

Prologue

Prologue are what need to be performed on entering a function:

Epilogue

Epilogue are what need to be performed just before returning from a function:

Can you identify which instruction(s) are part of the prologue and epilogue?

Prologue of function main:

	addi	sp,sp,-32
	sd	ra,24(sp)
	sd	s0,16(sp)
	addi	s0,sp,32

Epilogue of function main:

	mv	a0,a5
	ld	ra,24(sp)
	ld	s0,16(sp)
	addi	sp,sp,32
	jr	ra

Prologue of function f:

	addi	sp,sp,-64
	sd	ra,56(sp)
	sd	s0,48(sp)
	addi	s0,sp,64

Epilogue of function f:

	mv	a0,a5
	ld	ra,56(sp)
	ld	s0,48(sp)
	addi	sp,sp,64
	jr	ra

Can you draw out the stack frame of g using the figure above as an example? (Please draw the answer using asciiflow and export the results in ASCII Basic.)

old sp +----------+ s0
       | saved s0 |
   +24 +----------+ -8
       |          |
   +16 +----------+ -16
       |  a (a0)  |
    +8 +----------+ -24
       |  b (a1)  |
    sp +----------+ -32

Have you noticed any inconsistency compared to main or f? Can you explain why this happens?

The prologue and epilogue omit the saving and resotring of ra. The reason is that no other function is called by g, so the value stored in ra will not be contaminated by other functions, and we can omit the saving and restoring.

Parameters and return value passing

Can you show the instructions in f that are responsible for passing parameters and storing the returned return value to stack?

	li	a1,2
	ld	a0,-40(s0)
	call	g
	sd	a0,-24(s0)
	li	a1,5
	ld	a0,-48(s0)
	call	g
	sd	a0,-32(s0)

Can you show the instructions in f that are responsible for passing the final return value back to main?

	mv	a0,a5

Recursion

Recursion happens when a function calls itself. The existence of recursive function makes the need of stack to store temporary states more self-evident. In this part, you need to implement a recursive fast power algorithm using assembly.

The C code for your assembly implementation looks like this:

// exponent >= 0
long powr(long base, long exponent) {
    if (exponent == 0)
        return 1;

    long half = powr(base, exponent / 2);
    bool odd = exponent % 2;
    if (odd) 
        return base * half * half;
    else
        return half * half;
}

Please finish the given template code.

recursion.s
    .section  .rodata
    .align    3 # 2^3=8
error_msg:
    .string   "pow(%ld, %ld) got %ld expected %ld\n"

    .text
    .align 1
    .globl pow
    .type  pow, @function
pow:
    # long pow(long base, long exponent)
    # prologue
    # YOUR CODE HERE
    bne   a1,zero,pow_nonzero_exp
pow_zero_exp:
    # YOUR CODE HERE
    j     pow_return
pow_nonzero_exp:
    # YOUR CODE HERE
    # calculate pow(base, exponent / 2)
    # YOUR CODE HERE
    beq   a1,zero,pow_even
pow_odd:
    # YOUR CODE HERE
    j     pow_return
pow_even:
    # YOUR CODE HERE
pow_return:
    # epilogue
    # YOUR CODE HERE
    .size pow, .-pow


    .align 1
    .globl check
    .type  check, @function
check:
    # void check(long base, long exponent, long expected);
    addi  sp,sp,-64
    sd    ra,56(sp)
    sd    s0,48(sp)
    addi  s0,sp,64
    sd    a0,-40(s0) # base
    sd    a1,-48(s0) # exponent
    sd    a2,-56(s0) # expected
    call  pow
    sd    a0,-24(s0)
    ld    a5,-56(s0)
    beq   a0,a5,check_ok
check_fail:
    lui   a5,%hi(stderr)
    ld    a0,%lo(stderr)(a5)
    ld    a5,-56(s0) # expected
    ld    a4,-24(s0) # pow result
    ld    a3,-48(s0) # exponent
    ld    a2,-40(s0) # base
    lui   a1,%hi(error_msg)
    addi  a1,a1,%lo(error_msg)
    call  fprintf
    li    a0,-1
    call  exit # will not return
check_ok:
    ld    ra,56(sp)
    ld    s0,48(sp)
    addi  sp,sp,64
    jr    ra
    .size check, .-check


    .align 1
    .globl main
    .type  main, @function
main:
    addi  sp,sp,-16
    sd    ra,8(sp)
    sd    s0,0(sp)
    addi  s0,sp,16

    # check whether pow(2, 5) gives 32
    li    a2,32
    li    a1,5
    li    a0,2
    call  check
    # more test cases ...
    li    a2,27
    li    a1,3
    li    a0,3
    call  check

    li    a2,387420489
    li    a1,18
    li    a0,3
    call  check

    li    a5,0
    mv    a0,a5
    ld    ra,8(sp)
    ld    s0,0(sp)
    addi  sp,sp,16
    jr    ra
    .size    main, .-main

You can use make load-recursion to test if your program runs correctly. If your program runs correctly, you should see something like this:

pow(2, 5) got 32 expected 32
pow(3, 3) got 27 expected 27
pow(3, 18) got 387420489 expected 38742048
make: *** [load-recursion] Error 255

Otherwise, you should see an error:

pow(2, 5) got 1 expected 32
make: *** [load-recursion] Error 255