›INDEX

Last Updated: Loading...

CS 2253

Binary Representation

The way we choose to represent states will depend on the operations we want to perform and the type of data.

Adding Unsigned Integers in Binary

4 possible operations that may have to be performed:

4 possible operations

One's and Two's Complement

For negative numbers, since machines don't have a special character internally to store negative numbers, we have to find a way to represent it.

Option 1: Just use MSB to represent a negative value (sign-magnitude). ( just using the first bit to represent either negative or positive. ) - We have a +0 and a -0 (two states for the same value)

Option 2: Using 1's complement (where we just flip all the bits) - 2 + (-3) and 4 + (-3) become hard operations to perform.

Option 3: 2's complement (where we flip all the bits and add 1) - Representation where -X + X = 0

In 2's complement,

Positive values - Normal integer representation
Negative values - Flip all the bits of the positive value and add one.

To convert from -ve (you know by looking at the first bit of the number, if that's a one then it's a negative number.), you just flip all the bits and add 1. This will give you the binary representation of the positive number with the same value. Then you can convert it to decimal (if that's the final operation.)

Eg:

2's Complement to Decimal

2's signed int: 1011
-> 0100 (flip all the bits)
-> 0101 (add 1)
-> 5 in decimal.
=> 1011 is -5 (in decimal).

Decimal to 2's Complement

Write -5 in 2 signed in 4 bits.
-> 0101 is +5 in binary in 4 bits.
-> 1010 (flip all bits)
-> 1011 (add 1)
=> 1011 is -5 in binary in 4 bits.

IMPORTANT NOTE: The range for 2's signed complement is 2^x-1-1 to -2^x-1 where x is the number of bits we have available.

This is because 0000 would mean 0 but 1000 would mean -8 and not -0.

Eg:

60 and -60 in signed ints

Adding -60 and +60 to get 0

Adding -60 and 60

Adding Signed Integers

Adding signed integers

When adding, first convert the decimal to the positive version in binary. Next, if the number is negative (leads with a 1) then convert the binary to a negative (flip the bits and add 1). Then just add the 2 numbers. To read the number in the end we convert the binary to decimal just to check our answer.

For Subtraction just add the negative of the 2nd operand. x - y is just x + (-y).

Sign Extension

Basically, if it's a positive number (The first bit is a 0) then you can prefix the number with more zeros and it doesn't change the value. If it's a negative number (The first bit is a 1) then you can prefix the number with more 1s and it doesn't change the value.

So, in signed int:

010101 = 0000010101
100110 = 1111100110

This is how you extend the numbers if you want them to take up more space for either storing a larger number or adding it to a larger bit number.

Hexadecimal and Binary

This is really simple. Each character of hexadecimal maps to 4 bits in binary. So, f is the same as 1111 and 1010 is the same as a.

IEEE754 Floating Point

In IEEE754 Floating-point representation, we use 32-bits to hold a floating-point value. Here values are stored in scientific notation. This is when you have a number in the following format:
X.XXXXXXX * x^y (where x is the base we are working in)

Example: - In Decimal: 5.13416 * 10³ - In Binary: 1.101101 * 2^-5

The first bit of the 32 represents the sign of the number (so -ve or +ve). 1 would represent -ve and 0 would represent +ve.
The next 8 bits hold the exponent of the number as an unsigned value. This value is going to be 127 + actual exponent of number. This way we don't have to use a bit to represent the sign of the exponent.
The next 23 bits hold the fractional part of the number. Note, we don't have to hold the leading 1 in 1.101101 since we can just assume the 1 since this is scientific notation.

Here's how you would convert a number to this representation:

IEEE Floating-point representation

Here are a few examples of converting to and from this form of representation:

IEEE Floating point exercises

Note about IEEE Floating-point numbers: Since we just assume the preceding 1 for any number since every number other than 0 is in the form 1.xxx when written in scientific format, this causes a small issue for the number 0 since have the exponent be 127 (aka the value 0) and have all 0s in the number representation would imply the number 2 and not the number 0. To solve this problem, the exponent 0(aka the value -127) is taken as a special exponent where the 1 (and the exponent value is taken to be -126 an it's denormalized) is not assumed and has to be explicitly mentioned. This is done so that 0000 0000 0000 0000 0000 0000 0000 0000 indicates the number 0 (That's 4 bytes).

Here's a IEE 754 calculator

The Von Neumann Model

The von Neumann model consists of five parts: Memory, a processing unit, input, output, and a control unit. The data the program needs to carry out the work of the program is either contained int he program's memory or is obtained from input devices. The order in which the instructions are carried out is handled by the control unit.

von newmann model

(Source: Introduction to computing systems - Yale Patt)

PC: Program counter, this just keeps track of which instruction has to be executed next.
IR: Instruction register, this contains the current instruction that being processed.
MAR: Memory address register, this is used to specify the address when either trying to read or write to memory.
MDR: Memory data register, this is used to either write data to memory or read data from memory.
Register: A register is simply a set of n flip-flops that collectively are used to store one n-bit value. The PC, IR, MAR and MDR are all registers that store 16 bits (in the LC3, not all machines.) of information each.

Memory

Memory is constructed of 2 parts, the address and the data. Each address usually contains one byte of data. It's just numbered from 0 to x and x is going to be some power of 2. Such as, if you have 16GB of memory then you have 2³⁴ locations of memory.

Reading data

To read the contents of a memory location, we first place the address of that location in the memory's address register (MAR) and ten interrogate the computer's memory. The data in that address is going to be stores in the memory's data register(MDR)

Writing Data

To write data to memory, we first write the address where we want to store to the MAR and then we write the data that we want to store to the MDR and then we interrogate the computer's memory with the write into enable signal asserted. The information contained in the MDR is going to be written to the location specified by the MAR.

Processing Unit

The processing unit can consist of many complex functional units, each performing one particular operation (divide, add, square root, etc). the simplest processing unit, is the ALU. ALU stands for Arithmetic and Logic Unit. It performs operations like ADD, SUBTRACT, AND, OR, NOT etc.

The ALU normally processes data elements of a fixed size referred to as the word length of the computer. The data elements are called words. Each ISA has its own word length, depending on the intended use of the computer.

Since reading from memory usually takes longer than performing something like an ADD operation, almost all provide some small amount of storage very close to the ALU to allow results to be temporarily stored if they will be needed to produce additional results in the near future. The most common form of temporary storage is a set of registers.

For example, if a computer is to calculate (A + B) * C, it could store the result of A+B in memory, and then subsequently read it in order to multiply that result by C. However, this would be slow due to more number of memory reads. So computers just store the result of A+B in a temporary storage that's really fast and is really close to the ALU.

Control Unit

This unit is in charge or all the other parts of the computer. The control unit keeps track of both where we are within a process of executing the program and where we are in the process of executing each instruction.

It keeps track of the instruction using the IR (instruction register) and it keeps track of the next instruction to be processed using the PC (program counter), which is the address in memory of the next instruction.

Instruction Processing

The Instruction

The most basic unit of computer processing is the instruction. It is made of two parts, the opcode (what the instruction does) and the operands (what it does it to).

There are fundamentally three kinds of instructions: - Operates: instructions operate on data. - Data movement: move information from processing unit to and from memory and to and from input/output devices. - Control: altering the sequential processing of instructions (such as loops or conditionals). - Some ISAs have some special instructions that are necessary for those ISAs.

The Instruction Cycle

The entire sequence of steps required to process an instruction is called the instruction cycle. The instruction cycle consists of six sequential phases, each phase requiring zero or more steps (zero because not all computers have been designed such that all instructions require six phases).

The six phases:

FETCH
DECODE
EVALUATE ADDRESS
FETCH OPERANDS
EXECUTE
STORE RESULT

Fetch

Obtains the next instruction in memory and loads it into the instruction register (IR).

MAR is loaded with contents of PC and simultaneously increments the PC.
The memory is interrogated, which results in the next instruction being placed by memory into the MDR.
The IR is loaded with the contents of the MDR.

Decode

The decode phase examines the instruction in order to figure out what the microarchitecture is being asked to do. This looks at the opcode and tries to understand what has to be done.

Evaluate Address

This phase computes the address of memory location that is needed to process the instruction. So anytime an address is specified in the instruction, this loads that data into a register. Some operations doesn't require any data to be read from memory (such as ADD and AND) and in those cases the evaluate address phase is not needed.

Fetch Operands

This phase obtains the source operands needed to process the instruction. For example, in the LD ( load instruction ) this phase would take two steps; loading MAR with the address calculated in the EVALUATE ADDRESS phase and reading memory that resulted in the source operand being placed in MDR.

In most current microprocessors, this phase (for the ADD instruction) can be done at the same time the instruction is being executed (the fifth phase of the instruction cycle). How this is done is not covered here and is something I look forward to looking into.

Execute

This phase carries out the execution of the instruction.

Store Result

The result is written to its designated destination. In the case of the ADD instruction, in many computers, this action is performed during the EXECUTE phase. That is, many computers, an ADD instruction can fetch its source operands, perform the ADD in the ALU, and store the result in the destination register all in a single clock cycle. A separate STORE RESULT phase is not needed.

Coding programs for the LC3

What is the LC3

The LC3 actually just a program written in Java that simulates an actual computer at the level of the processor. This is similar to that of Intel x86-64 but in the case of such real architectures there are many instructions (100s) and more complicated than something like the LC3 which just has 16 instructions. The LC3 makes learning binary and assembly language easier.

The LC3 ISA

LC3 ISA

These are the set of instructions that are used when trying to program the LC3 computer.

Each of these will be talked about later.

Here's a key in case the chart is too cryptic - DR: Destination Register. - SR: Source Register. - imm5: a signed int value of 5 bits for immediate adding. - PCoffset9: a signed int value of 9 bits for moving the Program counter. - BaseR: base register. - n,z,p (in BR): This check is for the last condition code to check if it's negative, zero, or positive. Things marked with + modify the condition code. - trapvect8: Unsure at this point.

First program

I think its easiest to just talk about how to program the LC3 by actually introducing a program and then talking about it.

The objective of the program: The program takes one number as an input which is specified after the halt instruction. If the number is positive or 0 then it doubles it and if the number is negative it converts it to positive. Then stores the result at the memory location x3011.

Here's a flow chart of the program:

Program flow chart

And here's the program code ( ; indicates start of comment )

0011 000 000000000
; Load program into x3000.
; The first line of any program is the address where it has to be loaded.
; So this is just the binary representation of the address x3000

0010 001 000000111
; Load value at memory location into R1 [ LD ]

0000 100 000000010
; If last register number is negative move PC to 2 steps ahead [ BR ]

0001 001 001 0 00 001
; Add R1 to itself and store in R1 [ ADD ]

0000 111 000000010
; Move to end of if else (2 steps ahead) [ BR ]

1001 001 001 111111
; Not value in R1 and store in R1 [ NOT ]

0001 001 001 1 00001
; Add 1 to R1 [ ADD (immediate) ]

0011 001 000001010
; Store value in x03011 [ ST ]

1111 0000 00100101
; Halt [TRAP]

0000 0000 0000 0100
; Input value

The first line of code specifies to the LC3 that the following instructions have to be loaded to x3000 in memory.
The second instruction loads the input value from the end of the program (after the halt statement) into a register for processing. This is done by calculating the PC offset that would be required to load the memory location. The PC at this point would point the next statement that has to be executed and based on that and the length of our program we can do some math to figure out how much we need to add to our PC to get to the input memory location. In our case, that is the 7, i.e. the input value is 7 memory locations higher than current PC.
If we look at the LD instruction in the chart that was provided above, we see that it has a + on the top right of it. This indicates that this instruction is going to update the condition codes. So, if the number we load was 0 it would set the z condition code to 1 and the others to 0. Similarly for n (for negative) and p (for positive). Based on this we need to either double the number or have it positive. Using the BR instruction we can direct the computer to specific instructions based on those condition codes. This is what implements our if statement. Here's a diagram of how the memory is going to look.
The rest is rather self-explainatory in the diagram below.

Program in memory

OR in LC3

Since we don't have a specific instruction for OR in the LC3 machine we use De Morgan's laws ( wiki ) to calculate OR from AND and NOT operators.

A ∨ B = ¬(¬(A) ∧ ¬(B))

Instructions

LD input A into R1
LD input B into R2

NOT R1 into R1
NOT R2 into R2
AND R1 and R2 and store in R1
NOT R1 and store in R1

ST R1 to output

Conditionals in LC3

Since instructions are just executed one after another and there isn't real branching that can be done with just the processor what we do is just have the true suite be in one place and the false be lower or higher and based on the BR we can jump to the location of the branch we wish to execute.

If condition in machine code

Loops in LC3

We do something very similar to if statements but in this case we just move the PC to the top of our loop in particular conditions are met.

Do while loop: do while loop in machine code

While loop (This is a regular while loop like the one in C or Java): Regular while loop in machine code

LC-3 ISA Detailed

Overview

Memory Organization

The LC-3 memory has an address space of 2!!^16!! (i.e. 65,536) locations, and an addressability of 16 bits. Since the normal unit of data that is processed in the LC-3 is 16 bits, we refer to 16 bits as one word, and we say the LC-3 is word-addressable (as in you can reference all memory locations using just one word).

Registers

Since it usually takes far more than one clock cycle to obtain data from memory, the LC-3 provides (like almost all computers) additional temporary storage locations that can be accessed in a single clock cycle. The most common type of temporary storage is the register. Each register can store 16 bits (in the LC-3).

Registers must be uniquely identifiable. The LC-3 specifies 8 GPRs (General purpose registers), each identified by a three-bit register number.

Opcodes

Opcodes are the way to tell a machine what instruction has to be executed. The Intel x86 ISA has more than 200 opcodes. Other ISAs have fewer opcodes, it depends on the purpose of the machine. The LC-3 ISA has 15 opcodes and is specified using 4 bits (We could have had 16 opcodes but the code 1101 has been left unspecified, reserved for some future need that we are not able to anticipate today).

There are three different types of opcodes: operates, data movement, and control. Operate opcodes operate on some data, data movement opcodes load or store data in some form, and control opcodes change the sequence of instructions that will be executed.

Addressing modes

An addressing mode is a mechanism for specifying where the operand is located, it can be in one of three places (generally): memory, register, or part of instruction.
If the operand is a part of the instruction, we refer to it as literal or immediate operand.

The LC-3 supports 5 addressing modes: immediate, register, and three memory addressing modes: PC-relative, indirect, and Base+offset.

Immediate: Instruction contain literal operand.
Register: A register contains the operand.
PC-relative: The operand is stored at memory location PC + some specified offset (usual 9 bit signed int).
Indirect: You specify a memory location (Using PC + some offset) at which there will be the address of the operand's memory location.
Base + offset: You specify a register and some offset to that value of the register to be used as a memory location.

Condition codes

The LC-3 has three single-bit registers that are individually set or cleared each time one of the eight general purpose registers is written into as a result of execution of one of the operate instruction or one of the load instructions.

The three single-bit registers are called N, Z, and P, corresponding to their meaning: negative, zero, and positive. Each time a GPR is written to, one of these 3 are set to 1, corresponding to if the value being written is negative, zero, or positive respectively.

Operate Instructions

NOTES: - Registers are specified by a 3 bit number for the LC-3 (0-7). - DR stands for destination register. - SR stands for source register.

ADD

There are 2 types of ADD instructions that you could specify. - Register mode (Add values of 2 registers) - Immediate mode (Add specified value with another register's value)

Here's what the skeleton of the register mode looks like:

Here, the red bit is specifying that the mode is register mode. This instruction is going to take the values in the SR1 and SR2 registers and add them and store them into DR which is the destination register.

Here's what the skeleton of the immediate mode looks like:

Immediate mode ADD

Here, the red bit is specifying that the mode in immediate mode. This instruction is going to take the value in SR1 and add the value specified by the green bits. You have 5 bits so your value is going to be in the range -16 to 15. It will store the result into DR which is the destination register.

NOT

Not operation

The value in the source register is going to be inverted (all 1s to 0s and vise versa) and the new value is going to be stored in the destination register.

AND

This specifies a bit-wise and operations between 2 particular values.

There are 2 types of AND instructions that you could specify. - Register mode (AND values of 2 registers) - Immediate mode (AND specified value with another register's value)

Here's the skeleton of the Register mode AND operation

Here, the red bit is specifying that the mode is register mode. This instruction is going to take the values in the SR1 and SR2 registers and perform bit-wise AND on them and store them into DR which is the destination register.

Here's what the skeleton of the immediate mode looks like:

Immediate mode AND

Here, the red bit is specifying that the mode in immediate mode. This instruction is going to take the value in SR1 and perform bit-wise AND on the value specified by the green bits. You have 5 bits so your value is going to be in the range -16 to 15. It will store the result into DR which is the destination register.

LD

Load instruction

This loads (reads a value from a memory location) the memory location at the address specified by the PC (program counter) + the offset specified in the instruction. The offset is a 2's complement signed integer.

Since we have 9-bits, we can access memory locations from PC - 256 to PC + 255.

Example:

x3001: 0010 001 000000110

The above instruction is going to load whatever is at memory location x3008 into register 1. We got x3008 since we know that the PC is going to contain the address of the next instruction (which would be x3002) and we've specified an offset of 6 (decimal).

LDI

Load Indirect Instruction

Load indirect

It's calculates the address the same way as LD does and the uses the value at the address as an address which then contains the value we want to load.

PC + offset -> Address -> Address -> value

Example:

x3001: 1010 001 000000001
x3002: ...
x3003: 0011 0000 0000 0101 ; x3005 in binary
x3004: ...
x3005: 0000 0000 0000 0001 ; 1 in binary

So, first the LDI instruction is going to calculate that the address is PC + offset, which in our case is x3002 + 1 which results in x3003. At that location, is the value x3005 and LDI is going to treat that as an address and go to x3005 and load the value located there (1) into register 1.
So in the end, register 1 would contain the value 1

LDR

Load from register instruction

Load from register

LDR does what LD does but rather than using the PC as the register, it uses the specified register. So, it's going to add the offset specified to the value in the register which is specified and load the value located at that memory location into the destination register.

LEA

Load Effective Address

Load effective address

Although this has a name with "load" in it, it doesn't access memory at all. It just computes the PC + offset and stores the result of this operation in the destination register.

ST

Store instruction

The store instruction calculates the address based on PC + the offset specified and stores the value in the specified register into memory at the calculated address.

STI

Store indirect instruction

Refer to LDI and the ST instruction. This calculates an address using the PC and the offset specified and uses the calculated address to retrieve the address of the actual value and stores the value in the register mentioned at the address retrieved. Again, refer to the LDI instruction for a more detailed explanation.

STR

Store using register

Retrieves the address in the mentioned base register and uses that address to store the value in the source register into memory.

BR

Break instruction

The n, z, p are condition codes that are stored in their specific register and are written to whenever a register (GDR) is written to. The BR instruction updates the PC with the offset specified according to the condition codes specified.

Example:

x3000: 0000 110 000000010 ; 2 PC offset if negative or zero
...
x3003: <some instruction>

When line x3000 is executed, first the PC is updated to x3001 during the FETCH phase and during the execution phase, if either the n or the z registers are set to 1 then the PC is incremented by 2. So, assuming one of those 2 conditions are true, the PC would be updated to x3003 so that both x3001 and x3002 would be skipped.

JMP

Jump instruction

This instruction just writes the contents of the specified register to the PC (program counter), this way the next instruction that's executed is at the location specified by the register.

TRAP

The TRAP instruction

The TRAP instruction is a special which, using the trapvect8 which is just a way of specifying an instruction, you can call a subroutine that provided by the operation system such as taking in user input or writing to the screen. The TRAP instruction achieves this by changing the value of the PC to some value based on the instruction specified by the trapvect8 code

In operating system jargon, we say that the TRAP instruction invokes an operation system service call.

Here are some common trap vector codes:

x23: Input a character from the keyboard (Reads the character to register 0).
x21: Output a character to the monitor.
x25: Halt the program.

Assembly Language

As discussed in binary, an instruction consist of an opcode (the operation that has to be performed) and some operands.

In assembly we write the name of the opcode and then comma separated operands to indicate an instruction.

So something like:

0001 011 001 000 010

would be written as

ADD R3, R1, R2

And comments are the same as binary, the rest of the line starting from the semicolon ADD R1, R2, R3 ; comment.

There is an additional thing in assembly, which are pseudo-opcodes. These are codes that are prefixed with a .. These are thing you wish to tell the assembler and is not a real instruction.

You can use numbers in assembly in 2 ways. You can either use hex representation by prefixing a number with x, so x3000, or you can use decimal by prefixing something with the # symbol, so #10 would work.

There's also something called symbols for assembly which you can use to mark statements so that you don't have to do things like calculate offsets for statements like BR. This will be explained under the 'symbols' heading

Example Assembly program

.ORIG x3000
AND R1, R1, #0
AND R2, R2, R1
.END

So let's look at the above program.
The first line .ORIG specifies where the program has to be loaded to, in our case, the program will be loaded to the memory location x3000.

The next 2 statements perform the ADD operation with different operands.

The last statement, .END tells the assembler that this is the end of all the instructions.

Parts of Assembly

Symbols or Labels

    .ORIG
LOOP    AND R1, R1, #0
        ADD R1, R0, R0
        ADD R0, R0, #-1
        BRp LOOP
    .END

Here LOOP is a label that has been provided to the first statement. A label can label any location in memory that you want to reference later. In our case, we use this label in a BR statement to be able to go back a few instructions. This allows us to never have to calculate PC offsets for any of our statements.

Pseudo Opcodes (Assembler Directives)

.ORIG <address>: specify the address that the program has be loaded into in memory.
.FILL <value>: Covert 'value' to a bitmask and store it in memory. This is useful to store literals like #5 or x400 if it's required for some calculation.
.BLKW <number of words>: (Block of words) This tells the assembler to set aside a number of sequential memory locations in the program which we might want to access later.
.STRINGZ "<String>": This sets aside the length of the string +1 memory locations for the string and initializes it with the string provided and NULL terminates it (last bit is just the NULL char in ASCII).
.END: Tells the assembler that it has reached the end of the program and need not look at anything after it.

The 2-pass process

Step One

First, the assembler parses the file and builds a symbol table. It identifies the actual binary addresses corresponding to symbol names. This set of correspondences is known as the symbol table. This table is stored in a separate file with the .sym extension which can be separately loaded into the LC3.

The assembler keeps track of the location assigned to each instruction by means of a location counter (LC). The LC is initialized to the address specified in the .ORIG.

The assembler examines each instruction in sequence and increments the LC once for each assembly language instruction. If the instruction contains a label, a symbol table entry is made for that label, specifying the current contents of LC as its address.

Step Two

In phase two it translates the assembly code into its corresponding machine-level code (binary). It uses the symbol table from the previous step to replace all symbols with either their addresses or the required offset depending on the operation.

An important thing that has to be noted is that if a label is too far away such that the label cannot be reached using the PC offset then the assembler throws an error.

Example (I have a label called "LABEL" that's too far for an LD offset):

STARTING PASS 1
0 errors found in first pass.
STARTING PASS 2
  2: label "LABEL" at distance 347 (allowed range is -256 to 255)
1 errors found in second pass.

Beyond One Program

The Executable Image

When a computer begins the execution of a program, the entity being executed is called an executable image. Each module is translated separately into an object file.

Each object file consists of instructions in the ISA, along with its associated data. Some object files are provided by the operating system as libraries, etc. The final step is to combine (i.e. link) all the object modules together into one executable image.

More than one object file

In the real world, most programs have more than one object file. May be different libraries that are used or different modules are written by multiple programmers.

If you have a label that does not exist in the current file but rather exits in a different file you might want to declare it .EXTERNAL to tell the assembler that this symbol is not going to be present in this file but rather in a different one.

However, the LC-3 DOES NOT have an .EXTERNAL pseudo-opcode.

If it did, then at link time, when all the modules are combined, the linked would use the symbol table entry of the undeclared label in another module to complete the translation of the current module.

Subroutines

Code fragments that can be reused and be packaged into a block and be used by other programs are called subroutines, procedures, or functions (They are not all the same but... they are very similar).

By convention, register R7 is used to keep track of the PC from where the subroutine was called. So, to design a subroutine you would remember that R7 contains the PC from where the subroutine was called and the opcode RET is similar to JMP R7 and resets the PC to the original program counter.

The Call/Return Mechanism

The call/return mechanism allows us to execute a block of code multiple times and not have to code it multiple times. A program can "call" a subroutine and then that block of code can be executed and then control can be "returned" to the main program.

We refer to the program that contains the call as the caller, and the subroutine that contains the return as the callee.

The call/return mechanism consists of two instructions. The first instruction JSR(R) is in the caller program and does two things: It loads the PC with the starting address of the subroutine and it loads R7 with the address immediately after the address of the JSR(R) instruction (This is the address to come back to).

The address we come back to is called the return linkage. The second instruction JMP R7 is the last instruction in the subroutine. It loads the PC with the contents of R7 to head back to the origin program.

JSR(R) Instruction

These are the 2 instructions that are usually used to call a subroutine. The instruction uses one of two addressing modes for the computing the starting address of the subroutine, PC-relative addressing or Base Register addressing.

The LC-3 assembly language provides two different mnemonic names for the opcode, JSR and JSRR, depending on which addressing mode is used.

JSR(R)

15-12 contain 0100 which is the opcode. Bit 11 specifies the addressing mode, the value 1 if PC-relative, and 0 if base register. For PC-relative address bits 10-0 are used. For base register addressing, the register number is stored in bits 8-6

Example of JSRR:

JSRR example

Saving and Restoring Registers

Simple concept, when a subroutine is called, it doesn't know which registers are being used by the calling program, so it just stores the contents of any registers it plans to use and then restore them before returning.

We follow a convention so that no data that is needed gets over-written or lost.

Caller Save

A subroutine is going to specify the registers it expects input in (parameters) (usually R0), the registers it's going to leave values as a return value in, and we know that the JSR instructions writes data to R7.

So, it's up to the program calling this function to make sure it doesn't have anything important in those registers, and if it does, it need to save those to memory or move it to a different register so as to keep it safe.

Example: If a subroutine specifies the following:

Input in R0 and R1
Output in R2

Now, as the program that is about to call this subroutine, we need to have the inputs in R0 and R1 and not have anything important in R2 or R7. If we do have something important in those registers, we need to save those to memory or an unused register.

Callee Save

If in your subroutine (something you are writing) you need to use registers other than the specified registered (registers that YOU specify as input/output registers) then you should save the contents of those registers to memory and then reload them before return control to the caller program.

This is because the caller program doesn't expect the contents of it's registers changed, so better not do so.

I/O

We usually just use TRAP instructions to perform I/O operations in the LC-3. These instructions are just system calls. We leave the handling of devices such as the keyboard and the monitor to the operating system, that way the programmer doesn't have to worry about the particulars.

The "trap vector" is just a reference to memory locations that contain pointers to the respective subroutines. So, each of the vectors that we can specify is just subroutines that the operating system has provided us with.

Privilege and Priority

Privilege

In multi-program computers, we might not want just any program to be able to call the HALT instruction to stop the computer OR we won't want any program to overwrite memory that's being used by the operating system.

We solve this program by having each computer program designated as either privileged or unprivileged. We often say supervisor privilege to indicate privileged.

Priority

Priority is all about the urgency of a program to execute. Every program is assigned a priority, specifying its urgency as compared to all other programs. This allows programs with higher priority to interrupt programs of lesser urgency.

The Processor Status Register

Each program executing on the computer has associated with it two important registers.

The Program Counter (PC)
The Processor Status Register (PSR)

The PSR contains information about a programs privilege and priority.

PSR Register

Bit 15 specifies privilege, 0 is privileged and 1 is not.
Bits 10 to 8 specify the priority. Higher value = higher priority.
The PSR also contains conditions codes used for opcodes like BR.

Organization of Memory

The LC-3 has a 16-bit address space. Locations x0000 to x2FFF are privileged memory locations. The contains the various data structures and code of the operating system. They require supervisor privilege to access. They are referred to as system space.

The locations x3000 to xFDFF are unprivileged memory locations. This region is referred to as user space.

Addresses xFE00 to xFFFF do not correspond to memory locations at all. These address are used to identify registers that take part in input and output functions and some special registers associated with the processor. For example, the PSR is assigned address xFFFC, and the processor's Master Control Register (MCR) is assigned address xFFFE.

The set of addresses from xFE00 to xFFFF is usually referred to as the I/O page since most of the addresses are used for identifying registers that take part in input or output functions. These also require supervisor privilege to access.

Note: The "page" in "I/O page" is called that since memory is usually broken up into "pages" to make it easier to locate and allocate.

Supervisor and User Stacks

Memory organization

There are 2 stacks maintained in memory, a supervisor stack in system space and a user stack in user space. The supervisor stack is controlled by the operating system and the user stack is controlled by the user program.

Each has a stack pointer, Supervisor Stack Pointer (SSP) and the User Stack Pointer (USP) to indicate top of stack. Only one of the two stacks is active at any one time (depending on the privilege of the currently running program). Register 6 is generally used as the stack pointer (SP) for the active stack. Two registers, Saved_SSP and Saved_USP are provided to save the SP not in use.

I/O

The "trap vector" is just reference to memory locations which contain pointers to the respective subroutines. So, each of the vectors that we can specify are just subroutines that the operating system has provided us with.

Every I/O device usually needs at least two registers: one to hold the data being transferred and one to indicate status information about the device.

Memory-mapped I/O vs Special I/O Instructions

You can either have specific instructions in the ISA to handle I/O (new opcodes for each) or you can just use the existing LD and ST with all the different types to refer to locations in memory which would indicate an I/O instruction. The LC-3 (and most computers) use the latter option.

Each device register is assigned an address from memory address space of the ISA. That is, the I/O device registers are mapped to a set of addresses allocated to I/O device registers rather than to memory locations.

Handing The Asynchronous Process

Most I/O is carried out at speeds which are much slower than speed of the processor. This is why I/O operations are asynchronous. To control processing in an asynchronous world requires some protocol or handshaking mechanism.

In the case of the keyboard, we will need a one-bit status register, called a flag, to indicate if someone has or has not typed a character. In the case of the monitor, we will need a one-bit status register to indicate whether or not the most recent character sent to the monitor has been displayed so that the monitor can be given a new character that has to be displayed.

These flags are the simplest form of synchronization. A single flag, called the ready bit, is enough to synchronize the output. Each time the typist types a character, the ready bit is set to 1. Each time the computer reads a character, it clears the ready bit.

If the input were to be at a consistent speed then this handshaking processes would not be needed. If input was being received at the rate of 1 character every 200 mills then the computer would just know that every 200 mills a new character can be read.

Interrupt-Driven vs. Polling

The processor, which is computing, and the typist, who is typing, are two separate entities. The issue of interrupt-driven vs polling is the issue of who controls the interaction. Either the input device interrupts the computer or the computer keeps reading a register (usually the ready bit) until a character is typed.

Keyboard Input

Two registers:

KBDR: Keyboard data register (xFE02).
KBSR: Keyboard status register (xFE00).

Keyboard Registers

Even though a character needs only 8 bits and the synchronize mechanism needs only 1 bit, it's easier to assign 16 bits to each.

When a key is struck, the ASCII code for that key is loaded into KBDR[7:0], and the electronic circuits associated with the keyboard automatically set KSBR[15] to 1. When the LC-3 reads KBDR, the electronic circuits associated with the keyboard automatically clear KBSR[15], allowing another key to be struck. While KBSR[15] is 1, the keyboard is disabled and another key cannot be struck until the previous key is read.

The subroutine is going to look something like this:

START   LDI   R1, KBSR    ; Test for
        BRzp  START       ; character input
        LDI   R0, KBDR
        BRnzp NEXT_TASK   ; Go to next task
KBSR    .FILL xFE00       ; Address of KBSR
KBDR    .FILL xFE02       ; Address of KBDR

Monitor Output

This is very similar to the input only difference is that the program writes to the data register when the flag bit is 1.

DDR: Display Data Register (xFE06).
DSR: Display Status Register (xFE04).

Monitor Registers

This subroutine is going to be very similar to the previous one so I'm not going to bother putting it in here.

TRAP Routines (Service Routines)

TRAP instructions abstract away the I/O operations and let the operating system handle it.

The trap mechanism involves:

A set of service routines executed on behalf of user programs by the operating system.
A table of starting addresses (256 routines). This table is stored in memory locations x0000 to x00FF.
The TRAP instruction; when a user program wishes to have the operating system execute a specific service routine it uses the TRAP instruction.
A linkage back to the user program. The service routine must have a mechanism for returning control to the user program.

NOTE:
TRAP load the data to the supervisor stack since the program about to run is in supervisor mode not because the program running is in supervisor mode.

The TRAP Instruction

Here's a reminder of the format of a TRAP instruction.

TRAP instruction

The EXECUTE phase of the instruction does 3 things:

The PSR and PC are both pushed onto the system stack (supervisor stack). Since the PC was incremented during the FETCH phase of the TRAP instruction's instruction cycle, the return linkage is automatically saved in the PC.
PSR[15] is set to 0, since the service routine is going to require supervisor privilege to execute. PSR[10:8] are left unchanged since the priority of the TRAP routine is the same as the priority of the program that requested it.
The 8 bit trap vector is zero-extended to 16 bits to form an address that corresponds to a location in the Trap Vector Table. The PC is loaded with the address at that location.

The RTI Instruction

The RTI instruction: To Return Control to the calling program is a mechanism for returning control to the calling program once the trap service has finished execution.

RTI Instruction

The RTI instruction pops the top two values on the system stack into the PC and PSR. Since the PC contains the address following the address of the TRAP instruction, control returns to the user program at the correct address.

Finally, once the PSR has been popped off the system stack, PSR[15] must be examined to see whether the processor was running in User mode or Supervisor mode.

The HALT Trap Routine

The RUN latch is ANDed with the crystal oscillator to produce the clock that controls the operation of the computer. If that one-bit latch was cleared, the output of the AND gate would be 0, stopping the clock.

In the LC-3, the RUN latch is bit 15 of the Master Control Register (MSR), which is memory-mapped to location xFFFE.

Interrupts and Interrupt Driven I/O

There are two parts to interrupt-driven I/O:

The mechanism that enables an I/O device to interrupt the processor.
The mechanism that handles the interrupt request.

Causing the Interrupt

The I/O device must want service.
The device must have the right to request service.
The device request must be more urgent than what the processor is currently doing.

Right to request service: This is the interrupt enable bit, which can be set or cleared by the processor (usually by the operating system), depending on whether or not the processor wants to give the I/O device the right to request that service. In most I/O devices, this interrupt enable (IE) bit is part of the device status register. In the KBSR and DSR, the IE bit is bit[14]. The interrupt request signal from the I/O device is the logical AND of the IE bit and the ready bit.

The INT Signal

To stop the processor from continuing execution of its currently running program and service an interrupt request, the INT signal must be asserted.

An interrupt is only caused between 2 instructions since it would be a lot of work if we wanted to interrupt the processor during the execution of an instruction.

Handing the Interrupt Request

Initiate the interrupt.
Service the interrupt.
Return from the interrupt.

The things that need to be recorded to capture the state of the current state of the running process:

The Process Status Register (PSR)
Program Counter (PC)

Both of these are pushed onto the supervisor stack (a stack that the operating system maintains) which are then popped off once the interrupt has been handled.

Supervisor Stack pushed

Since the INT signal was asserted, the processor does not return to the first start of the FETCH phase of the next instruction cycle, but rather beings a sequence of actions to initiate the interrupt. The processor must do two things.

Save the state of the interrupted program so it can pick up where it left off after the requirements of the interrupt have been completed.
Load the state of the higher priority interrupting program so it can start satisfying its request.

Save the state of the Interrupted Program: The state of a program is a snapshot of the contents of all the contents of all the program's resources. It includes all the contents of the memory locations that are a part of the program and the contents of all the general-purpose registers. It also includes the PC and the PCR.

Although many computers save the contents of the general purpose registers, we will not since we will assume that the service routine will always save the contents of any general purpose register that it needs before using it, and then restore it before returning to the interrupted program. The only state information the LC=3 saves are the PC and the PCR.

The LC-3 saves this state information on the supervisor stack in the same way the PC and PSR are saved when a TRAP instruction is executed.

Load the state of the interrupt service routine: Most processors use the mechanism of vectored interrupts. The eight-bit vector is provided by the device that is requesting the processor be interrupted. It is designated INTV.

If the interrupt is taken, the processor expands the 8-bit interrupt vector (INTV) to form a 16-bit address, which is an entry into the Interrupt Vector Table (consists of memory locations x0100 to x01FF), each containing the starting address of an interrupt service routine. The processor loads the PC with the contents of the location in the Interrupt Vector Table corresponding to the address formed by expanding the interrupt vector INTV.

Similar to a TRAP called service routine, an interrupt service routine also calls RTI to return to the main program.

Example Program:

Example interrupt

Contents of supervisor stack at each step:

Example interrupt stack

NOTE:
TRAP and Interrupt load the data to the supervisor stack since the program about to run is in supervisor mode not because the program running is in supervisor mode.

Problem of Interrupt and Polling TOGETHER

This is best explained with an example: Let's consider the case where we are writing a character to the monitor, the code is going to look somewhat like the following:

POLL  LDI  R3, DSR
      BRzp POLL
      STI  R0, DDR

The above code fragment writes a character to the monitor by polling the DSR register to check if the monitor is ready for a write.

Now, let's consider the hypothetical case where an interrupt occurs just before the processing of the STI instruction. The interrupt service routine is going to stop execution of the code and perform some actions. Once the actions are done it's going to return to this state and the next instruction is going to be the STI. However, what if the monitor changed state during the execution of the service routine? This would mean that the monitor the program is going to write to the register even though the monitor isn't ready, inevitably overwriting something.

We can't just disable interrupts when we perform polling since that could potentially disable interrupts for a really long time.

Solution: The solution to this problem is to turn off interrupts for a particular sequence of instructions and turn it back on before every iteration. This way, an interrupt is only going to happen between 2 polling iteration and never in between.

        LDI   R1, PSR
        LD    R2, INTMASK
        AND   R2, R1, R2

POLL    STI   R1, PSR    ; enable interrupts
        STI   R2, PSR    ; disable interrupts
        LDI   R3, DSR
        BRzp  POLL       ; Poll DSR
        STI   R0, DDR    ; Store the character into DDR
        STI   R1, PSR    ; Restore original PSR

INTMASK .FILL  xBFFF
PSR     .FILL  xFFFC
DSR     .FILL  xFE04
DSR     .FILL  xFE06

This way an interrupt would have to wait for the three-instruction sequence to execute, rather than for the entire polling process to complete.

Subroutines Using Stacks

When trying to create subroutines, a lot of the registers are used up for parameters and meta parameters (parameters required to keep track of things (unofficial term)). Since the LC3 has only 8 registers, this causes the programmer to juggle around a lot and this also doesn't allow for recursive calls to the subroutine; recursive calls become a problem since you won't be able to save R7 at some fixed memory location inside the subroutine which would get overwritten on each call.

This issue is solved by using a stack to keep track of each call to the subroutine.

Stack

In this way, each call has a copy of its local variables and the data it's supposed to remember so that there isn't any need to keep track of registers. The only thing you would need to keep track of is the top of stack.

Assembly to C

We can easily abstract a lot of what is done in assembly into a at least a little higher level of programming. Things like multiplication and loops could be make easier with more syntax since we have such a common use for them.

Things that higher level languages introduce (relative to assembly):

Naming things, variables, freed from specifying memory locations.
Hardware independence (some extent) and more portable.
Expressiveness; control patter expressions (if/else).
Readability.
Fundamental checking such as syntax and type checking.

Compilation process

Source code if first compiled into an object file. This object file is just a binary file containing instructions that are a part of the ISA.
When a program consists of multiple source programs, the linker links these programs later while creating the final executable code.

Process Anatomy

Memory layout of each program (coded in C and other common languages).

Process Anatomy

The executable part contains the compiled ISA specific instructions loaded onto memory.
The data part contains thing like global variables such as functions and structs or anything else that has been declared with global scope.
The heap contains memory that the program has complete control over and can ask for memory and has to free it. This is where you usually put stuff like large arrays or structs.
The stack contains stack frames which keep track of the scope and state of each function call. This contains local variables, return locations, return value etc.

Variables in C

Variables are a way to abstract memory locations to the programmer. When you declare something like int flag; the compiler reserves enough memory to store flag (4 bytes). Whenever we use the variable then the required load or store commands are substituted on compiling it.

Exploring Memory

The Activation Record (The Call Stack)

Each time a function is called a stack frame is pushed onto the call stack. This is used to keep track of variables and other call information.

The activation record solves the problem of having to keep track of execution control by pushing each call onto a stack, the call stack.

This also facilitates the following:

Provide a local storage for every instance of a function (allowing recursion).
Transfer execution control to a function and return execution control.
Manage the construction and destruction of the storage.
Send and return data values from the running program.

Parts of Stack Frame

Local storage
Program function variables.
Activation Record
Transferred Data values
Parameters
Return value
Execution Control (Return Location)
Stack frame control data

Heap Memory

Heap memory exists within a process, just below (after) the text (executable code) and global variable storage areas. Memory from the heap can be requested and the starting address is stored in stack variables. Memory from the heap should be returned for future use.

The heap is managed by an allocator, it manages all the free blocks using a linked list (free list). next hold the beginning of the next free block and size holds the number of bytes in data.

Heap memory

The Free List

The free list

Memory requests (malloc()) "break off" pieces of memory and hand a pointer to their data field. There is likely some base size (e.g. 16 bytes) and the actual allocated size will be a multiple.
On free() the memory is linked back to the ordered position within the free list. There may well be some re-organization to regroup adjacent free blocks.

CS 2253
Assembly Language
Assembly to C

Enjoy the notes on this website? Consider supporting me in this adventure in you preferred way: Support me.