CS 2253
Binary Representation
The way we choose to represent states will depend on the operations
we want to perform and the type of data.
Adding Unsigned Integers in Binary
4 possible operations that may have to be performed:

One's and Two's Complement
For negative numbers, since machines don't have a special character
internally to store negative numbers, we have to find a way to
represent it.
Option 1: Just use MSB to represent a negative value (sign-magnitude).
( just using the first bit to represent either negative or positive. )
- We have a +0 and a -0 (two states for the same value)
Option 2: Using 1's complement (where we just flip all the bits)
- 2 + (-3) and 4 + (-3) become hard operations to perform.
Option 3: 2's complement (where we flip all the bits and add 1)
- Representation where -X + X = 0
In 2's complement,
- Positive values - Normal integer representation
- Negative values - Flip all the bits of the positive value and add one.
To convert from -ve (you know by looking at the first bit of the
number, if that's a one then it's a negative number.), you just flip
all the bits and add 1. This will give you the binary representation
of the positive number with the same value. Then you can convert it to
decimal (if that's the final operation.)
Eg:
2's Complement to Decimal
2's signed int: 1011
-> 0100 (flip all the bits)
-> 0101 (add 1)
-> 5 in decimal.
=> 1011 is -5 (in decimal).
Decimal to 2's Complement
Write -5 in 2 signed in 4 bits.
-> 0101 is +5 in binary in 4 bits.
-> 1010 (flip all bits)
-> 1011 (add 1)
=> 1011 is -5 in binary in 4 bits.
IMPORTANT NOTE: The range for 2's signed complement is
2x-1-1 to -2x-1 where x is the number of bits we
have available.
This is because 0000
would mean 0 but 1000
would mean -8
and not
-0
.
Eg:

Adding -60 and +60 to get 0

Adding Signed Integers

When adding, first convert the decimal to the positive version in
binary. Next, if the number is negative (leads with a 1) then convert
the binary to a negative (flip the bits and add 1). Then just add the
2 numbers. To read the number in the end we convert the binary to
decimal just to check our answer.
For Subtraction just add the negative of the 2nd operand. x - y
is just x + (-y)
.
Sign Extension
Basically, if it's a positive number (The first bit is a 0) then you
can prefix the number with more zeros and it doesn't change the value.
If it's a negative number (The first bit is a 1) then you can prefix
the number with more 1s and it doesn't change the value.
So, in signed int:
010101
= 0000010101
100110
= 1111100110
This is how you extend the numbers if you want them to take up more
space for either storing a larger number or adding it to a larger bit
number.
Hexadecimal and Binary
This is really simple. Each character of hexadecimal maps to 4 bits in
binary. So, f
is the same as 1111
and 1010
is the same as a
.
IEEE754 Floating Point
In IEEE754 Floating-point representation, we use 32-bits to hold a
floating-point value. Here values are stored in scientific notation.
This is when you have a number in the following format:
X.XXXXXXX * xy (where x is the base we are working in)
Example:
- In Decimal: 5.13416 * 103
- In Binary: 1.101101 * 2-5
- The first bit of the 32 represents the sign of the number (so -ve or
+ve). 1 would represent -ve and 0 would represent +ve.
- The next 8 bits hold the exponent of the number as an unsigned
value. This value is going to be 127 + actual exponent of number.
This way we don't have to use a bit to represent the sign of the
exponent.
- The next 23 bits hold the fractional part of the number. Note, we
don't have to hold the leading 1 in 1.101101 since we can just
assume the 1 since this is scientific notation.
Here's how you would convert a number to this representation:

Here are a few examples of converting to and from this form of
representation:

Note about IEEE Floating-point numbers: Since we just assume the preceding 1 for any number since every number other than 0
is in the form 1.xxx
when written in scientific format, this causes a small issue for the number 0
since have the exponent be 127
(aka the value 0) and have all 0
s in the number representation would imply the number 2
and not the number 0
. To solve this problem, the exponent 0
(aka the value -127) is taken as a special exponent where the 1
(and the exponent value is taken to be -126 an it's denormalized) is not assumed and has to be explicitly mentioned. This is done so that 0000 0000 0000 0000 0000 0000 0000 0000
indicates the number 0
(That's 4 bytes).
Here's a IEE 754 calculator
The Von Neumann Model
The von Neumann model consists of five parts: Memory, a processing
unit, input, output, and a control unit. The data the program needs to
carry out the work of the program is either contained int he program's
memory or is obtained from input devices. The order in which the
instructions are carried out is handled by the control unit.

(Source: Introduction to computing systems - Yale Patt)
-
PC: Program counter, this just keeps track of which instruction has
to be executed next.
-
IR: Instruction register, this contains the current instruction that
being processed.
-
MAR: Memory address register, this is used to specify the address
when either trying to read or write to memory.
-
MDR: Memory data register, this is used to either write data to
memory or read data from memory.
-
Register: A register is simply a set of n flip-flops that
collectively are used to store one n-bit value. The PC, IR, MAR and
MDR are all registers that store 16 bits (in the LC3, not all
machines.) of information each.
Memory
Memory is constructed of 2 parts, the address and the data. Each
address usually contains one byte of data. It's just numbered from 0
to x and x is going to be some power of 2. Such as, if you have 16GB
of memory then you have 234 locations of memory.
Reading data
To read the contents of a memory location, we first place the address
of that location in the memory's address register (MAR) and ten
interrogate the computer's memory. The data in that address is going
to be stores in the memory's data register(MDR)
Writing Data
To write data to memory, we first write the address where we want to
store to the MAR and then we write the data that we want to store to
the MDR and then we interrogate the computer's memory with the write
into enable signal asserted. The information contained in the MDR is
going to be written to the location specified by the MAR.
Processing Unit
The processing unit can consist of many complex functional units, each
performing one particular operation (divide, add, square root, etc).
the simplest processing unit, is the ALU. ALU stands for Arithmetic
and Logic Unit. It performs operations like ADD, SUBTRACT, AND, OR,
NOT etc.
The ALU normally processes data elements of a fixed size referred to
as the word length of the computer. The data elements are called
words. Each ISA has its own word length, depending on the intended
use of the computer.
Since reading from memory usually takes longer than performing
something like an ADD operation, almost all provide some small amount
of storage very close to the ALU to allow results to be temporarily
stored if they will be needed to produce additional results in the
near future. The most common form of temporary storage is a set of
registers.
For example, if a computer is to calculate (A + B) * C, it could store
the result of A+B in memory, and then subsequently read it in order to
multiply that result by C. However, this would be slow due to more
number of memory reads. So computers just store the result of A+B in a
temporary storage that's really fast and is really close to the ALU.
Control Unit
This unit is in charge or all the other parts of the computer. The
control unit keeps track of both where we are within a process of
executing the program and where we are in the process of executing
each instruction.
It keeps track of the instruction using the IR (instruction
register) and it keeps track of the next instruction to be processed
using the PC (program counter), which is the address in memory of the next
instruction.
Instruction Processing
The Instruction
The most basic unit of computer processing is the instruction. It is
made of two parts, the opcode (what the instruction does) and the
operands (what it does it to).
There are fundamentally three kinds of instructions:
- Operates: instructions operate on data.
- Data movement: move information from processing unit to and from
memory and to and from input/output devices.
- Control: altering the sequential processing of instructions
(such as loops or conditionals).
- Some ISAs have some special instructions that are necessary for
those ISAs.
The Instruction Cycle
The entire sequence of steps required to process an instruction is
called the instruction cycle. The instruction cycle consists of six
sequential phases, each phase requiring zero or more steps (zero
because not all computers have been designed such that all
instructions require six phases).
The six phases:
FETCH
DECODE
EVALUATE ADDRESS
FETCH OPERANDS
EXECUTE
STORE RESULT
Fetch
Obtains the next instruction in memory and loads it into the
instruction register (IR).
MAR
is loaded with contents of PC
and simultaneously increments
the PC
.
- The memory is interrogated, which results in the next
instruction being placed by memory into the
MDR
.
- The
IR
is loaded with the contents of the MDR
.
Decode
The decode phase examines the instruction in order to figure out what
the microarchitecture is being asked to do. This looks at the
opcode and tries to understand what has to be done.
Evaluate Address
This phase computes the address of memory location that is needed to
process the instruction. So anytime an address is specified in the
instruction, this loads that data into a register. Some operations
doesn't require any data to be read from memory (such as ADD and AND)
and in those cases the evaluate address phase is not needed.
Fetch Operands
This phase obtains the source operands needed to process the
instruction. For example, in the LD
( load instruction ) this phase
would take two steps; loading MAR
with the address calculated in the
EVALUATE ADDRESS phase and reading memory that resulted in the
source operand being placed in MDR
.
In most current microprocessors, this phase (for the ADD instruction)
can be done at the same time the instruction is being executed (the
fifth phase of the instruction cycle). How this is done is not covered
here and is something I look forward to looking into.
Execute
This phase carries out the execution of the instruction.
Store Result
The result is written to its designated destination. In the case of
the ADD instruction, in many computers, this action is performed during
the EXECUTE phase. That is, many computers, an ADD instruction can
fetch its source operands, perform the ADD in the ALU, and store the
result in the destination register all in a single clock cycle. A
separate STORE RESULT phase is not needed.
Coding programs for the LC3
What is the LC3
The LC3 actually just a program written in Java that simulates an
actual computer at the level of the processor. This is similar to that
of Intel x86-64 but in the case of such real architectures there are
many instructions (100s) and more complicated than something like the
LC3 which just has 16 instructions. The LC3 makes learning binary and
assembly language easier.
The LC3 ISA

These are the set of instructions that are used when trying to program
the LC3 computer.
Each of these will be talked about later.
Here's a key in case the chart is too cryptic
- DR: Destination Register.
- SR: Source Register.
- imm5: a signed int value of 5 bits for immediate adding.
- PCoffset9: a signed int value of 9 bits for moving the Program counter.
- BaseR: base register.
- n,z,p (in BR): This check is for the last condition code to check if
it's negative, zero, or positive. Things marked with + modify the
condition code.
- trapvect8: Unsure at this point.
First program
I think its easiest to just talk about how to program the LC3 by
actually introducing a program and then talking about it.
The objective of the program: The program takes one number as an input
which is specified after the halt instruction. If the number is
positive or 0 then it doubles it and if the number is negative it converts
it to positive. Then stores the result at the memory location x3011
.
Here's a flow chart of the program:

And here's the program code ( ;
indicates start of comment )
0011 000 000000000
; Load program into x3000.
; The first line of any program is the address where it has to be loaded.
; So this is just the binary representation of the address x3000
0010 001 000000111
; Load value at memory location into R1 [ LD ]
0000 100 000000010
; If last register number is negative move PC to 2 steps ahead [ BR ]
0001 001 001 0 00 001
; Add R1 to itself and store in R1 [ ADD ]
0000 111 000000010
; Move to end of if else (2 steps ahead) [ BR ]
1001 001 001 111111
; Not value in R1 and store in R1 [ NOT ]
0001 001 001 1 00001
; Add 1 to R1 [ ADD (immediate) ]
0011 001 000001010
; Store value in x03011 [ ST ]
1111 0000 00100101
; Halt [TRAP]
0000 0000 0000 0100
; Input value
- The first line of code specifies to the LC3 that the following
instructions have to be loaded to x3000 in memory.
- The second instruction loads the input value from the end of the
program (after the halt statement) into a register for processing.
This is done by calculating the PC offset that would be required to
load the memory location. The PC at this point would point the
next statement that has to be executed and based on that and the
length of our program we can do some math to figure out how much we
need to add to our PC to get to the input memory location. In our
case, that is the 7, i.e. the input value is 7 memory locations
higher than current PC.
- If we look at the
LD
instruction in the chart that was provided
above, we see that it has a + on the top right of it. This indicates
that this instruction is going to update the condition codes. So, if
the number we load was 0 it would set the z
condition code to 1
and the others to 0. Similarly for n
(for negative) and p
(for
positive). Based on this we need to either double the number or have
it positive. Using the BR
instruction we can direct the computer
to specific instructions based on those condition codes. This is
what implements our if statement. Here's a diagram of how the memory
is going to look.
- The rest is rather self-explainatory in the diagram below.

OR in LC3
Since we don't have a specific instruction for OR
in the LC3 machine
we use De Morgan's laws ( wiki )
to calculate OR from AND and NOT operators.
Instructions
LD input A into R1
LD input B into R2
NOT R1 into R1
NOT R2 into R2
AND R1 and R2 and store in R1
NOT R1 and store in R1
ST R1 to output
Conditionals in LC3
Since instructions are just executed one after another and there isn't
real branching that can be done with just the processor what we do is
just have the true suite be in one place and the false be lower or
higher and based on the BR we can jump to the location of the branch
we wish to execute.

Loops in LC3
We do something very similar to if statements but in this case we just
move the PC to the top of our loop in particular conditions are met.
Do while loop:

While loop (This is a regular while loop like the one in C or Java):

LC-3 ISA Detailed
Overview
Memory Organization
The LC-3 memory has an address space of 2!!^16!! (i.e. 65,536)
locations, and an addressability of 16 bits. Since the normal unit
of data that is processed in the LC-3 is 16 bits, we refer to 16 bits
as one word, and we say the LC-3 is word-addressable (as in you
can reference all memory locations using just one word).
Registers
Since it usually takes far more than one clock cycle to obtain data
from memory, the LC-3 provides (like almost all computers) additional
temporary storage locations that can be accessed in a single clock
cycle. The most common type of temporary storage is the register. Each
register can store 16 bits (in the LC-3).
Registers must be uniquely identifiable. The LC-3 specifies 8 GPRs
(General purpose registers), each identified by a three-bit register
number.
Opcodes
Opcodes are the way to tell a machine what instruction has to be
executed. The Intel x86 ISA has more than 200 opcodes. Other ISAs have
fewer opcodes, it depends on the purpose of the machine. The LC-3 ISA
has 15 opcodes and is specified using 4 bits (We could have had 16
opcodes but the code 1101
has been left unspecified, reserved for
some future need that we are not able to anticipate today).
There are three different types of opcodes: operates, data movement,
and control. Operate opcodes operate on some data, data movement
opcodes load or store data in some form, and control opcodes change
the sequence of instructions that will be executed.
Addressing modes
An addressing mode is a mechanism for specifying where the operand is
located, it can be in one of three places (generally): memory,
register, or part of instruction.
If the operand is a part of the instruction, we refer to it as literal
or immediate operand.
The LC-3 supports 5 addressing modes: immediate, register, and three
memory addressing modes: PC-relative, indirect, and Base+offset.
- Immediate: Instruction contain literal operand.
- Register: A register contains the operand.
- PC-relative: The operand is stored at memory location PC + some
specified offset (usual 9 bit signed int).
- Indirect: You specify a memory location (Using PC + some offset) at
which there will be the address of the operand's memory location.
- Base + offset: You specify a register and some offset to that value
of the register to be used as a memory location.
Condition codes
The LC-3 has three single-bit registers that are individually set or
cleared each time one of the eight general purpose registers is
written into as a result of execution of one of the operate
instruction or one of the load instructions.
The three single-bit registers are called N, Z, and P, corresponding
to their meaning: negative, zero, and positive. Each time a GPR is
written to, one of these 3 are set to 1, corresponding to if the value
being written is negative, zero, or positive respectively.
Operate Instructions
NOTES:
- Registers are specified by a 3 bit number for the LC-3 (0-7).
- DR stands for destination register.
- SR stands for source register.
ADD
There are 2 types of ADD instructions that you could specify.
- Register mode (Add values of 2 registers)
- Immediate mode (Add specified value with another register's value)
Here's what the skeleton of the register mode looks like:

Here, the red bit is specifying that the mode is register mode. This
instruction is going to take the values in the SR1
and SR2
registers and add them and store them into DR
which is the
destination register.
Here's what the skeleton of the immediate mode looks like:

Here, the red bit is specifying that the mode in immediate mode.
This instruction is going to take the value in SR1
and add the value
specified by the green bits. You have 5 bits so your value is going
to be in the range -16 to 15. It will store the result into DR
which
is the destination register.
NOT

The value in the source register is going to be inverted (all 1s to 0s
and vise versa) and the new value is going to be stored in the
destination register.
AND
This specifies a bit-wise and operations between 2 particular values.
There are 2 types of AND instructions that you could specify.
- Register mode (AND values of 2 registers)
- Immediate mode (AND specified value with another register's value)
Here's the skeleton of the Register mode AND operation

Here, the red bit is specifying that the mode is register mode. This
instruction is going to take the values in the SR1
and SR2
registers and perform bit-wise AND on them and store them into DR
which is the destination register.
Here's what the skeleton of the immediate mode looks like:

Here, the red bit is specifying that the mode in immediate mode.
This instruction is going to take the value in SR1
and perform
bit-wise AND on the value specified by the green bits. You have 5
bits so your value is going to be in the range -16 to 15. It will
store the result into DR
which is the destination register.
LD

This loads (reads a value from a memory location) the memory
location at the address specified by the PC (program counter) + the
offset specified in the instruction. The offset is a 2's complement
signed integer.
Since we have 9-bits, we can access memory locations from PC - 256 to
PC + 255.
Example:
x3001: 0010 001 000000110
The above instruction is going to load whatever is at memory location
x3008
into register 1. We got x3008
since we know that the PC is
going to contain the address of the next instruction (which would be
x3002
) and we've specified an offset of 6 (decimal).
LDI

Load indirect
It's calculates the address the same way as LD
does and the uses the
value at the address as an address which then contains the value we
want to load.
PC + offset -> Address -> Address -> value
Example:
x3001: 1010 001 000000001
x3002: ...
x3003: 0011 0000 0000 0101 ; x3005 in binary
x3004: ...
x3005: 0000 0000 0000 0001 ; 1 in binary
So, first the LDI
instruction is going to calculate that the address
is PC + offset
, which in our case is x3002 + 1
which results in
x3003
. At that location, is the value x3005
and LDI
is going to
treat that as an address and go to x3005
and load the value located
there (1
) into register 1.
So in the end, register 1 would contain the value 1
LDR

Load from register
LDR
does what LD
does but rather than using the PC as the
register, it uses the specified register. So, it's going to add the
offset specified to the value in the register which is specified and
load the value located at that memory location into the destination
register.
LEA

Load effective address
Although this has a name with "load" in it, it doesn't access memory
at all. It just computes the PC + offset
and stores the result of
this operation in the destination register.
ST

The store instruction calculates the address based on PC + the offset
specified and stores the value in the specified register into memory
at the calculated address.
STI

Refer to LDI and the ST instruction. This calculates an address using
the PC and the offset specified and uses the calculated address to
retrieve the address of the actual value and stores the value in the
register mentioned at the address retrieved. Again, refer to the LDI
instruction for a more detailed explanation.
STR

Retrieves the address in the mentioned base register and uses that address
to store the value in the source register into memory.
BR

The n
, z
, p
are condition codes that are stored in their
specific register and are written to whenever a register (GDR) is
written to. The BR instruction updates the PC with the offset
specified according to the condition codes specified.
Example:
x3000: 0000 110 000000010 ; 2 PC offset if negative or zero
...
x3003: <some instruction>
When line x3000
is executed, first the PC is updated to x3001
during the FETCH phase and during the execution phase, if either
the n
or the z
registers are set to 1
then the PC is incremented
by 2
. So, assuming one of those 2 conditions are true, the PC would
be updated to x3003
so that both x3001
and x3002
would be
skipped.
JMP

This instruction just writes the contents of the specified register to
the PC (program counter), this way the next instruction that's
executed is at the location specified by the register.
TRAP

The TRAP instruction is a special which, using the trapvect8
which
is just a way of specifying an instruction, you can call a subroutine
that provided by the operation system such as taking in user input or
writing to the screen. The TRAP instruction achieves this by changing
the value of the PC to some value based on the instruction specified
by the trapvect8
code
In operating system jargon, we say that the TRAP instruction invokes
an operation system service call.
Here are some common trap vector codes:
- x23: Input a character from the keyboard (Reads the character to
register 0).
- x21: Output a character to the monitor.
- x25: Halt the program.
Assembly Language
As discussed in binary, an instruction consist of an opcode (the
operation that has to be performed) and some operands.
In assembly we write the name of the opcode and then comma separated
operands to indicate an instruction.
So something like:
would be written as
And comments are the same as binary, the rest of the line starting
from the semicolon ADD R1, R2, R3 ; comment
.
There is an additional thing in assembly, which are pseudo-opcodes.
These are codes that are prefixed with a .
. These are thing you wish
to tell the assembler and is not a real instruction.
You can use numbers in assembly in 2 ways. You can either use hex
representation by prefixing a number with x
, so x3000
, or you can
use decimal by prefixing something with the #
symbol, so #10
would
work.
There's also something called symbols for assembly which you can use
to mark statements so that you don't have to do things like calculate
offsets for statements like BR
. This will be explained under the
'symbols' heading
Example Assembly program
.ORIG x3000
AND R1, R1, #0
AND R2, R2, R1
.END
So let's look at the above program.
The first line .ORIG
specifies where the program has to be loaded to, in our
case, the program will be loaded to the memory location x3000
.
The next 2 statements perform the ADD operation with different
operands.
The last statement, .END
tells the assembler that this is the end of
all the instructions.
Parts of Assembly
Symbols or Labels
.ORIG
LOOP AND R1, R1, #0
ADD R1, R0, R0
ADD R0, R0, #-1
BRp LOOP
.END
Here LOOP
is a label that has been provided to the first statement.
A label can label any location in memory that you want to reference later.
In our case, we use this label in a BR
statement to be able to go back
a few instructions. This allows us to never have to calculate PC
offsets for any of our statements.
Pseudo Opcodes (Assembler Directives)
.ORIG <address>
: specify the address that the program has be loaded
into in memory.
.FILL <value>
: Covert 'value' to a bitmask and store it in memory.
This is useful to store literals like #5
or x400
if it's
required for some calculation.
.BLKW <number of words>
: (Block of words) This tells the
assembler to set aside a number of sequential memory locations in
the program which we might want to access later.
.STRINGZ "<String>"
: This sets aside the length of the string +1
memory locations for the string and initializes it with the string
provided and NULL
terminates it (last bit is just the NULL char in
ASCII).
.END
: Tells the assembler that it has reached the end of the
program and need not look at anything after it.
The 2-pass process
Step One
First, the assembler parses the file and builds a symbol table. It
identifies the actual binary addresses corresponding to symbol names.
This set of correspondences is known as the symbol table
. This
table is stored in a separate file with the .sym
extension which can
be separately loaded into the LC3.
The assembler keeps track of the location assigned to each instruction
by means of a location counter (LC). The LC is initialized to the
address specified in the .ORIG.
The assembler examines each instruction in sequence and increments the
LC once for each assembly language instruction. If the instruction
contains a label, a symbol table entry is made for that label,
specifying the current contents of LC as its address.
Step Two
In phase two it translates the assembly code into its corresponding
machine-level code (binary). It uses the symbol table from the
previous step to replace all symbols with either their addresses or
the required offset depending on the operation.
An important thing that has to be noted is that if a label is too far
away such that the label cannot be reached using the PC offset then
the assembler throws an error.
Example (I have a label called "LABEL" that's too far for an LD offset):
STARTING PASS 1
0 errors found in first pass.
STARTING PASS 2
2: label "LABEL" at distance 347 (allowed range is -256 to 255)
1 errors found in second pass.
Beyond One Program
The Executable Image
When a computer begins the execution of a program, the entity being
executed is called an executable image. Each module is translated
separately into an object file.
Each object file consists of instructions in the ISA, along with its
associated data. Some object files are provided by the operating
system as libraries, etc. The final step is to combine (i.e. link
)
all the object modules together into one executable image.
More than one object file
In the real world, most programs have more than one object file. May be
different libraries that are used or different modules are written by
multiple programmers.
If you have a label that does not exist in the current file but rather
exits in a different file you might want to declare it .EXTERNAL
to
tell the assembler that this symbol is not going to be present in this
file but rather in a different one.
However, the LC-3 DOES NOT have an .EXTERNAL
pseudo-opcode.
If it did, then at link time, when all the modules are combined, the
linked would use the symbol table entry of the undeclared label in
another module to complete the translation of the current module.
Subroutines
Code fragments that can be reused and be packaged into a block and be
used by other programs are called subroutines, procedures, or
functions (They are not all the same but... they are very similar).
By convention, register R7 is used to keep track of the PC from where
the subroutine was called. So, to design a subroutine you would
remember that R7 contains the PC from where the subroutine was called
and the opcode RET
is similar to JMP R7
and resets the PC to the
original program counter.
The Call/Return Mechanism
The call/return mechanism allows us to execute a block of code multiple
times and not have to code it multiple times. A program can "call" a
subroutine and then that block of code can be executed and then
control can be "returned" to the main program.
We refer to the program that contains the call as the caller, and
the subroutine that contains the return as the callee.
The call/return mechanism consists of two instructions. The first
instruction JSR(R)
is in the caller program and does two things: It
loads the PC with the starting address of the subroutine and it loads
R7
with the address immediately after the address of the JSR(R)
instruction (This is the address to come back to).
The address we come back to is called the return linkage. The second
instruction JMP R7
is the last instruction in the subroutine. It
loads the PC with the contents of R7 to head back to the origin
program.
JSR(R) Instruction
These are the 2 instructions that are usually used to call a
subroutine. The instruction uses one of two addressing modes for the
computing the starting address of the subroutine, PC-relative
addressing or Base Register addressing.
The LC-3 assembly language provides two different mnemonic names for
the opcode, JSR and JSRR, depending on which addressing mode is used.

15-12 contain 0100
which is the opcode. Bit 11 specifies the
addressing mode, the value 1 if PC-relative, and 0 if base register.
For PC-relative address bits 10-0 are used. For base register
addressing, the register number is stored in bits 8-6
Example of JSRR:

Saving and Restoring Registers
Simple concept, when a subroutine is called, it doesn't know which
registers are being used by the calling program, so it just stores the
contents of any registers it plans to use and then restore them
before returning.
We follow a convention so that no data that is needed gets
over-written or lost.
Caller Save
A subroutine is going to specify the registers it expects input in
(parameters) (usually R0
), the registers it's going to leave values
as a return value in, and we know that the JSR
instructions
writes data to R7
.
So, it's up to the program calling this function to make sure it
doesn't have anything important in those registers, and if it does, it
need to save those to memory or move it to a different register so as
to keep it safe.
Example: If a subroutine specifies the following:
- Input in R0 and R1
- Output in R2
Now, as the program that is about to call this subroutine, we need to
have the inputs in R0 and R1 and not have anything important in R2 or
R7. If we do have something important in those registers, we need to
save those to memory or an unused register.
Callee Save
If in your subroutine (something you are writing) you need to use
registers other than the specified registered (registers that YOU
specify as input/output registers) then you should save the contents
of those registers to memory and then reload them before return
control to the caller program.
This is because the caller program doesn't expect the contents of it's
registers changed, so better not do so.
I/O
We usually just use TRAP
instructions to perform I/O operations in
the LC-3. These instructions are just system calls. We leave the
handling of devices such as the keyboard and the monitor to the
operating system, that way the programmer doesn't have to worry about
the particulars.
The "trap vector" is just a reference to memory locations that contain
pointers to the respective subroutines. So, each of the vectors that
we can specify is just subroutines that the operating system has
provided us with.
Privilege and Priority
Privilege
In multi-program computers, we might not want just any program to be
able to call the HALT
instruction to stop the computer OR we won't
want any program to overwrite memory that's being used by the
operating system.
We solve this program by having each computer program designated as
either privileged or unprivileged.
We often say supervisor privilege to indicate privileged.
Priority
Priority is all about the urgency of a program to execute. Every
program is assigned a priority, specifying its urgency as compared to
all other programs. This allows programs with higher priority to
interrupt programs of lesser urgency.
The Processor Status Register
Each program executing on the computer has associated with it two
important registers.
- The Program Counter (PC)
- The Processor Status Register (PSR)
The PSR
contains information about a programs privilege and
priority.

- Bit 15 specifies privilege, 0 is privileged and 1 is not.
- Bits 10 to 8 specify the priority. Higher value = higher priority.
- The PSR also contains conditions codes used for opcodes like
BR
.
Organization of Memory
The LC-3 has a 16-bit address space. Locations x0000
to x2FFF
are
privileged memory locations. The contains the various data structures
and code of the operating system. They require supervisor privilege to
access. They are referred to as system space.
The locations x3000
to xFDFF
are unprivileged memory locations.
This region is referred to as user space.
Addresses xFE00
to xFFFF
do not correspond to memory locations at
all. These address are used to identify registers that take part in
input and output functions and some special registers associated with
the processor. For example, the PSR
is assigned address xFFFC
, and
the processor's Master Control Register (MCR) is assigned address
xFFFE
.
The set of addresses from xFE00
to xFFFF
is usually referred to as
the I/O page since most of the addresses are used for identifying
registers that take part in input or output functions. These also
require supervisor privilege to access.
Note: The "page" in "I/O page" is called that since memory is usually
broken up into "pages" to make it easier to locate and allocate.
Supervisor and User Stacks

There are 2 stacks maintained in memory, a supervisor stack in system
space and a user stack in user space. The supervisor stack is
controlled by the operating system and the user stack is controlled by
the user program.
Each has a stack pointer, Supervisor Stack Pointer (SSP) and the User
Stack Pointer (USP) to indicate top of stack. Only one of the two
stacks is active at any one time (depending on the privilege of the
currently running program). Register 6 is generally used as the stack
pointer (SP) for the active stack. Two registers, Saved_SSP and
Saved_USP are provided to save the SP not in use.
I/O
We usually just use TRAP
instructions to perform I/O operations in
the LC-3. These instructions are just system calls. We leave the
handling of devices such as the keyboard and the monitor to the
operating system, that way the programmer doesn't have to worry about
the particulars.
The "trap vector" is just reference to memory locations which contain
pointers to the respective subroutines. So, each of the vectors that
we can specify are just subroutines that the operating system has
provided us with.
Every I/O device usually needs at least two registers: one to hold the
data being transferred and one to indicate status information about
the device.
Memory-mapped I/O vs Special I/O Instructions
You can either have specific instructions in the ISA to handle I/O
(new opcodes for each) or you can just use the existing LD
and ST
with all the different types to refer to locations in memory which
would indicate an I/O instruction. The LC-3 (and most computers) use
the latter option.
Each device register is assigned an address from memory address space
of the ISA. That is, the I/O device registers are mapped to a set of
addresses allocated to I/O device registers rather than to memory
locations.
Handing The Asynchronous Process
Most I/O is carried out at speeds which are much slower than speed of
the processor. This is why I/O operations are asynchronous. To control
processing in an asynchronous world requires some protocol or
handshaking mechanism.
In the case of the keyboard, we will need a one-bit status register,
called a flag, to indicate if someone has or has not typed a
character. In the case of the monitor, we will need a one-bit status
register to indicate whether or not the most recent character sent to
the monitor has been displayed so that the monitor can be given a new
character that has to be displayed.
These flags are the simplest form of synchronization. A single flag,
called the ready bit, is enough to synchronize the output. Each time
the typist types a character, the ready bit is set to 1. Each time the
computer reads a character, it clears the ready bit.
If the input were to be at a consistent speed then this handshaking
processes would not be needed. If input was being received at the rate
of 1 character every 200 mills then the computer would just know that
every 200 mills a new character can be read.
Interrupt-Driven vs. Polling
The processor, which is computing, and the typist, who is typing, are
two separate entities. The issue of interrupt-driven vs polling is the
issue of who controls the interaction. Either the input device
interrupts the computer or the computer keeps reading a register
(usually the ready bit) until a character is typed.
Two registers:
KBDR
: Keyboard data register (xFE02
).
KBSR
: Keyboard status register (xFE00
).

Even though a character needs only 8 bits and the synchronize
mechanism needs only 1 bit, it's easier to assign 16 bits to each.
When a key is struck, the ASCII code for that key is loaded into
KBDR[7:0]
, and the electronic circuits associated with the keyboard
automatically set KSBR[15]
to 1. When the LC-3 reads KBDR
, the
electronic circuits associated with the keyboard automatically clear
KBSR[15]
, allowing another key to be struck. While KBSR[15]
is 1,
the keyboard is disabled and another key cannot be struck until the
previous key is read.
The subroutine is going to look something like this:
START LDI R1, KBSR ; Test for
BRzp START ; character input
LDI R0, KBDR
BRnzp NEXT_TASK ; Go to next task
KBSR .FILL xFE00 ; Address of KBSR
KBDR .FILL xFE02 ; Address of KBDR
Monitor Output
This is very similar to the input only difference is that the program
writes to the data register when the flag bit is 1.
DDR
: Display Data Register (xFE06
).
DSR
: Display Status Register (xFE04
).

This subroutine is going to be very similar to the previous one so I'm
not going to bother putting it in here.
TRAP Routines (Service Routines)
TRAP instructions abstract away the I/O operations and let the
operating system handle it.
The trap mechanism involves:
- A set of service routines executed on behalf of user programs by the
operating system.
- A table of starting addresses (256 routines). This table is stored
in memory locations
x0000
to x00FF
.
- The
TRAP
instruction; when a user program wishes to have the
operating system execute a specific service routine it uses the TRAP
instruction.
- A linkage back to the user program. The service routine must have a
mechanism for returning control to the user program.
NOTE:
TRAP load the data to the supervisor stack since the
program about to run is in supervisor mode not because the program
running is in supervisor mode.
The TRAP Instruction
Here's a reminder of the format of a TRAP
instruction.

The EXECUTE phase of the instruction does 3 things:
- The
PSR
and PC
are both pushed onto the system stack (supervisor
stack). Since the
PC was incremented during the FETCH phase of the TRAP
instruction's instruction cycle, the return linkage is automatically
saved in the PC.
PSR[15]
is set to 0, since the service routine is going to require
supervisor privilege to execute. PSR[10:8]
are left unchanged
since the priority of the TRAP
routine is the same as the priority
of the program that requested it.
- The 8 bit trap vector is zero-extended to 16 bits to form an address
that corresponds to a location in the Trap Vector Table. The PC is
loaded with the address at that location.
The RTI Instruction
The RTI
instruction: To Return Control to the calling program is a
mechanism for returning control to the calling program once the trap
service has finished execution.

The RTI
instruction pops the top two values on the system stack into
the PC
and PSR
. Since the PC
contains the address following the
address of the TRAP instruction, control returns to the user program
at the correct address.
Finally, once the PSR
has been popped off the system stack,
PSR[15]
must be examined to see whether the processor was running in
User mode or Supervisor mode.
The HALT Trap Routine
The RUN
latch is AND
ed with the crystal oscillator to produce the
clock that controls the operation of the computer. If that one-bit
latch was cleared, the output of the AND gate would be 0, stopping the
clock.
In the LC-3, the RUN
latch is bit 15
of the Master Control
Register (MSR
), which is memory-mapped to location xFFFE
.
Interrupts and Interrupt Driven I/O
There are two parts to interrupt-driven I/O:
- The mechanism that enables an I/O device to interrupt the processor.
- The mechanism that handles the interrupt request.
Causing the Interrupt
- The I/O device must want service.
- The device must have the right to request service.
- The device request must be more urgent than what the processor is
currently doing.
Right to request service: This is the interrupt enable bit, which can
be set or cleared by the processor (usually by the operating system),
depending on whether or not the processor wants to give the I/O device
the right to request that service. In most I/O devices, this interrupt
enable (IE) bit is part of the device status register. In the KBSR
and DSR
, the IE bit is bit[14]. The interrupt request signal from
the I/O device is the logical AND of the IE bit and the ready bit.
The INT Signal
To stop the processor from continuing execution of its currently
running program and service an interrupt request, the INT signal must
be asserted.
An interrupt is only caused between 2 instructions since it would be a
lot of work if we wanted to interrupt the processor during the
execution of an instruction.
Handing the Interrupt Request
- Initiate the interrupt.
- Service the interrupt.
- Return from the interrupt.
The things that need to be recorded to capture the state of the
current state of the running process:
- The Process Status Register (PSR)
- Program Counter (PC)
Both of these are pushed onto the supervisor stack (a stack that the
operating system maintains) which are then popped off once the
interrupt has been handled.

Since the INT signal was asserted, the processor does not return to
the first start of the FETCH phase of the next instruction cycle, but
rather beings a sequence of actions to initiate the interrupt. The
processor must do two things.
- Save the state of the interrupted program so it can pick up where it
left off after the requirements of the interrupt have been
completed.
- Load the state of the higher priority interrupting program so it can
start satisfying its request.
Save the state of the Interrupted Program: The state of a program
is a snapshot of the contents of all the contents of all the program's
resources. It includes all the contents of the memory locations that
are a part of the program and the contents of all the general-purpose
registers. It also includes the PC
and the PCR
.
Although many computers save the contents of the general purpose
registers, we will not since we will assume that the service routine
will always save the contents of any general purpose register that it
needs before using it, and then restore it before returning to the
interrupted program. The only state information the LC=3 saves are the
PC
and the PCR
.
The LC-3 saves this state information on the supervisor stack in the
same way the PC
and PSR
are saved when a TRAP
instruction is
executed.
Load the state of the interrupt service routine: Most processors
use the mechanism of vectored interrupts. The eight-bit vector is
provided by the device that is requesting the processor be
interrupted. It is designated INTV
.
If the interrupt is taken, the processor expands the 8-bit interrupt
vector (INTV
) to form a 16-bit address, which is an entry into the
Interrupt Vector Table (consists of memory locations x0100
to
x01FF
), each containing the starting address of an interrupt service
routine. The processor loads the PC with the contents of the location
in the Interrupt Vector Table corresponding to the address formed by
expanding the interrupt vector INTV
.
Similar to a TRAP
called service routine, an interrupt service
routine also calls RTI
to return to the main program.
Example Program:

Contents of supervisor stack at each step:

NOTE:
TRAP and Interrupt load the data to the supervisor stack since the
program about to run is in supervisor mode not because the program
running is in supervisor mode.
Problem of Interrupt and Polling TOGETHER
This is best explained with an example: Let's consider the case where
we are writing a character to the monitor, the code is going to look
somewhat like the following:
POLL LDI R3, DSR
BRzp POLL
STI R0, DDR
The above code fragment writes a character to the monitor by polling
the DSR
register to check if the monitor is ready for a write.
Now, let's consider the hypothetical case where an interrupt occurs
just before the processing of the STI
instruction. The interrupt
service routine is going to stop execution of the code and perform
some actions. Once the actions are done it's going to return to this
state and the next instruction is going to be the STI
. However, what
if the monitor changed state during the execution of the service
routine? This would mean that the monitor the program is going to
write to the register even though the monitor isn't ready, inevitably
overwriting something.
We can't just disable interrupts when we perform polling since that
could potentially disable interrupts for a really long time.
Solution: The solution to this problem is to turn off interrupts
for a particular sequence of instructions and turn it back on before
every iteration. This way, an interrupt is only going to happen
between 2 polling iteration and never in between.
LDI R1, PSR
LD R2, INTMASK
AND R2, R1, R2
POLL STI R1, PSR ; enable interrupts
STI R2, PSR ; disable interrupts
LDI R3, DSR
BRzp POLL ; Poll DSR
STI R0, DDR ; Store the character into DDR
STI R1, PSR ; Restore original PSR
INTMASK .FILL xBFFF
PSR .FILL xFFFC
DSR .FILL xFE04
DSR .FILL xFE06
This way an interrupt would have to wait for the three-instruction
sequence to execute, rather than for the entire polling process to
complete.
Subroutines Using Stacks
When trying to create subroutines, a lot of the registers are used up
for parameters and meta parameters (parameters required to keep track
of things (unofficial term)). Since the LC3 has only 8 registers, this
causes the programmer to juggle around a lot and this also doesn't
allow for recursive calls to the subroutine; recursive calls become a
problem since you won't be able to save R7 at some fixed memory
location inside the subroutine which would get overwritten on each
call.
This issue is solved by using a stack to keep track of each call to
the subroutine.

In this way, each call has a copy of its local variables and the data
it's supposed to remember so that there isn't any need to keep track
of registers. The only thing you would need to keep track of is the
top of stack.
Assembly to C
We can easily abstract a lot of what is done in assembly into a at
least a little higher level of programming. Things like multiplication
and loops could be make easier with more syntax since we have such a
common use for them.
Things that higher level languages introduce (relative to assembly):
- Naming things, variables, freed from specifying memory locations.
- Hardware independence (some extent) and more portable.
- Expressiveness; control patter expressions (if/else).
- Readability.
- Fundamental checking such as syntax and type checking.
Compilation process
-
Source code if first compiled into an object file. This object file
is just a binary file containing instructions that are a part of the
ISA.
-
When a program consists of multiple source programs, the linker
links these programs later while creating the final executable code.
Process Anatomy
Memory layout of each program (coded in C and other common languages).

- The executable part contains the compiled ISA specific instructions
loaded onto memory.
- The data part contains thing like global variables such as functions
and structs or anything else that has been declared with global
scope.
- The heap contains memory that the program has complete control over
and can ask for memory and has to free it. This is where you usually
put stuff like large arrays or structs.
- The stack contains stack frames which keep track of the scope and
state of each function call. This contains local variables, return
locations, return value etc.
Variables in C
Variables are a way to abstract memory locations to the programmer.
When you declare something like int flag;
the compiler reserves
enough memory to store flag
(4 bytes). Whenever we use the variable
then the required load or store commands are substituted on compiling
it.
Exploring Memory
The Activation Record (The Call Stack)
Each time a function is called a stack frame is pushed onto the call
stack. This is used to keep track of variables and other call
information.
The activation record solves the problem of having to keep track of
execution control by pushing each call onto a stack, the call stack.
This also facilitates the following:
- Provide a local storage for every instance of a function (allowing
recursion).
- Transfer execution control to a function and return execution
control.
- Manage the construction and destruction of the storage.
- Send and return data values from the running program.
Parts of Stack Frame
Heap Memory
Heap memory exists within a process, just below (after) the text
(executable code) and global variable storage areas. Memory from the
heap can be requested and the starting address is stored in stack
variables. Memory from the heap should be returned for future use.
The heap is managed by an allocator, it manages all the free blocks
using a linked list (free list). next
hold the beginning of the next
free block and size
holds the number of bytes in data.

The Free List

-
Memory requests (malloc()
) "break off" pieces of memory and hand a pointer
to their data field. There is likely some base size (e.g. 16 bytes)
and the actual allocated size will be a multiple.
-
On free()
the memory is linked back to the ordered position within
the free list. There may well be some re-organization to regroup
adjacent free blocks.