System Software and Compiler Design

Explain the SIC machine architecture in detail.

Memory: ⦿ Memory consists of 8-bit bytes; any 3 consecutive bytes form a word (24 bits). All addresses on SIC are byte addresses. ⦿ Words are addressed by the location of their lowest-numbered byte. ⦿ There are a total of 32,768 (2^15) bytes in the computer memory.

Registers: There are five registers, all of which have special uses. Each register is 24 bits in length. Their mnemonics, numbers and uses are given in the following table.

Mnemonic	Number	Special Use
A	0	Accumulator; used for arithmetic operations
X	1	Index register; used for addressing
L	2	Linkage register; JSUB instruction stores the return address in this register
PC	8	Program counter; contains the address of the next instruction to be fetched for execution.
SW	9	Status word; contains a variety of information, including a Condition Code (CC)

Data Formats: ⦿ Integers are stored as 24-bit binary numbers. ⦿ 2’s complement representation is used for negative values. ⦿ Characters are stored using their 8-bit ASCII codes. ⦿ There is no floating-point hardware on the standard version of SIC.

Instruction Formats: All machine instructions on the standard version of SIC have the 24-bit format:

8 bits        1 bit      15 bits
opcode        x          address
Addressing Modes: There are two addressing modes available, indicated by the setting of the x bit in the instruction.

Mode	Indication	Target Address Calculation
Direct	x=0	TA= address
Indexed	x=1	TA= address + (X)

Parentheses are used to indicate the contents of a register or a memory location. For example, (X) represents the contents of register X. Direct Addressing Mode:

 Example: LDA TEN (the opcode for LDA is 00; TEN is at address 1000h)

 opcode               x      address (TEN)
 0 0 0 0  0 0 0 0     0      0 0 1  0 0 0 0  0 0 0 0  0 0 0 0

 Effective Address (EA) = 1000h
 The content of address 1000h is loaded into the accumulator (A).
 
 
Indexed Addressing Mode: Example: STCH BUFFER, X (the opcode for STCH is 54h; BUFFER is at address 1000h)

 opcode               x      address (BUFFER)
 0 1 0 1  0 1 0 0     1      0 0 1  0 0 0 0  0 0 0 0  0 0 0 0

Effective Address (EA) = 1000 + (X), i.e. 1000 plus the contents of index register X. The character in the rightmost byte of the accumulator is stored at this effective address.

Instruction Set: SIC provides load and store instructions (LDA, LDX, STA, STX, etc.) and integer arithmetic operations (ADD, SUB, MUL, DIV, etc.). All arithmetic operations involve register A and a word in memory, with the result being left in the register. COMP compares the value in register A with a word in memory and sets the condition code CC to indicate the result. The conditional jump instructions (JLT, JEQ, JGT) test the setting of CC and jump accordingly. Two instructions are provided for subroutine linkage: JSUB jumps to the subroutine, placing the return address in register L; RSUB returns by jumping to the address contained in register L.

Input and Output: Input and output are performed by transferring 1 byte at a time to or from the rightmost 8 bits of register A (the accumulator). The Test Device (TD) instruction tests whether the addressed device is ready to send or receive a byte of data; Read Data (RD) and Write Data (WD) are used for reading and writing the data.
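The two encodings above can be checked with a small sketch (Python, illustrative only, not part of SIC itself) that packs the 8-bit opcode, 1-bit x flag and 15-bit address of the standard SIC format; the opcode values 00h (LDA) and 54h (STCH) are from the standard SIC opcode table.

```python
def encode_sic(opcode, address, indexed=False):
    """Pack an instruction into the 24-bit standard SIC format:
    8-bit opcode | 1-bit x flag | 15-bit address."""
    assert 0 <= opcode <= 0xFF and 0 <= address <= 0x7FFF
    word = (opcode << 16) | (int(indexed) << 15) | address
    return f"{word:06X}"          # 24 bits -> 6 hex digits

# LDA TEN (opcode 00, TEN at 1000h, direct addressing):
print(encode_sic(0x00, 0x1000))          # 001000
# STCH BUFFER,X (opcode 54, BUFFER at 1000h, indexed addressing):
print(encode_sic(0x54, 0x1000, True))    # 549000
```

Note how setting the x bit turns the third hex digit from 1 into 9, exactly the bit pattern shown in the indexed example above.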

What are the different types of assembler features? Explain the machine-dependent and machine-independent features used in assemblers.

Machine-Dependent Assembler Features: ➜ Instruction formats and addressing modes ➜ Program relocation

Instruction Formats and Addressing Modes: The instruction formats depend on the memory organization and the size of the memory. In the SIC machine the memory is byte addressable and the word size is 3 bytes. The size of the memory is 2^15 bytes, and accordingly SIC supports only one instruction format. Only the accumulator (A) and the index register (X) take part in addressing, so the addressing modes supported by this architecture are direct and indexed. The memory of a SIC/XE machine, in contrast, is 2^20 bytes (1 MB). SIC/XE supports four different instruction formats:

– 1-byte instructions
– 2-byte instructions
– 3-byte instructions
– 4-byte instructions

Instructions can be:
– instructions involving register-to-register operations
– instructions with one operand in memory and the other in the accumulator (single-operand instructions)
– extended-format instructions

The addressing modes are:
– Indexed addressing (SIC): opcode m, x
– Indirect addressing: opcode @m
– PC-relative addressing: opcode m
– Base-relative addressing: opcode m
– Immediate addressing: opcode #c

Translation for instructions involving the register-to-register addressing mode: During Pass 1 the registers can be entered as part of the symbol table itself; the value for each register is its equivalent numeric code. During Pass 2, these values are assembled along with the mnemonic's object code. If required, a separate table can be created with the register names and their equivalent numeric values.

Translation involving register-memory instructions: In the SIC/XE machine there are four instruction formats and five addressing modes. Among the instruction formats, format-3 and format-4 instructions are the register-memory type of instruction.
Program relocation ➜ The actual starting address of the program is not known until load time ➜ An object program that contains the information necessary to perform this kind of modification is called a relocatable program ➜ No modification is needed when an operand uses program-counter relative or base relative addressing ➜ The only parts of the program that require modification at load time are those that specify direct (as opposed to relative) addresses ➜ Modification record:
⦿ Col. 2-7: Starting location of the address field to be modified, relative to the beginning of the program (hex)
⦿ Col. 8-9: Length of the address field to be modified, in half-bytes (hex)

Machine-Independent Assembler Features: literals, symbol-defining statements, expressions, program blocks, control sections and program linking.

Literals: A literal lets the programmer write the value of a constant operand as a part of the instruction that uses it. This avoids having to define the constant elsewhere in the program and make up a label for it. A literal is identified with the prefix =, which is followed by a specification of the literal value. Examples of literals in the statements:

45	001A	ENDFIL	LDA	=C’EOF’	032010
215	1062	WLOOP	TD	=X’05’	E32011
 With a literal, the assembler generates the specified value as a constant at some other memory location, and the address of this generated constant is used as the target address for the machine instruction.
 All of the literal operands used in a program are gathered together into one or more literal pools. Normally literals are placed into a pool at the end of the program.
 A LTORG statement creates a literal pool that contains all of the literal operands used since the previous LTORG.
 Most assemblers recognize duplicate literals (the same literal used in more than one place) and store only one copy of the specified data value.
 LITTAB (literal table): contains the literal name, the operand value and length, and the address assigned to the operand when it is placed in a literal pool.

Expressions:
 Assemblers allow arithmetic expressions formed according to the normal rules using the operators +, -, *, and /.
 Individual terms in an expression may be constants, user-defined symbols, or special terms. The most common such special term is the current value of the location counter (designated by *).
 Expressions are classified as either absolute expressions or relative expressions.

Program Blocks:
 Program blocks: segments of code that are rearranged within a single object program unit.
 Control sections: segments that are translated into independent object program units.
 The USE directive indicates which portions of the source program belong to the various blocks.

Control Sections:
 References between control sections are called external references. The assembler generates information for each external reference that will allow the loader to perform the required linking.
 The EXTDEF (external definition) statement in a control section names symbols, called external symbols, that are defined in this section and may be used by other sections.
 The EXTREF (external reference) statement names symbols that are used in this control section and are defined elsewhere.

Define record (D):
⦿ Col. 2-7: Name of external symbol defined in this control section
⦿ Col. 8-13: Relative address of symbol within this control section (hex)
⦿ Col. 14-73: Repeat of information in Col. 2-13 for other external symbols

Refer record (R):
⦿ Col. 2-7: Name of external symbol referred to in this control section
⦿ Col. 8-73: Names of other external reference symbols

Modification record (revised: M):
⦿ Col. 2-7: Starting address of the field to be modified, relative to the beginning of the control section (hex)
⦿ Col. 8-9: Length of the field to be modified, in half-bytes (hex)
⦿ Col. 10: Modification flag (+ or -)
⦿ Col. 11-16: External symbol whose value is to be added to or subtracted from the indicated field
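As a sketch of how a loader might pick these fields apart, the following Python decodes a revised Modification (M) record using the column layout above (cols 2-7 address, cols 8-9 length, col 10 flag, cols 11-16 symbol). The record string itself is a made-up example, not taken from a real object program.

```python
def parse_m_record(rec):
    """Decode a revised Modification record into its fields."""
    assert rec[0] == 'M'
    return {
        "address": int(rec[1:7], 16),   # cols 2-7: start of field (hex)
        "length":  int(rec[7:9], 16),   # cols 8-9: length in half-bytes
        "flag":    rec[9],              # col 10: + or -
        "symbol":  rec[10:16].strip(),  # cols 11-16: external symbol
    }

print(parse_m_record("M00000405+LISTB "))
# {'address': 4, 'length': 5, 'flag': '+', 'symbol': 'LISTB'}
```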

What is Program Relocation? Explain the problems associated with it and their solution.

Program relocation


Ø The actual starting address of the program is not known until load time


Ø An object program that contains the information necessary to perform this kind of modification is called a relocatable program


Ø No modification is needed: operand is using program-counter relative or base relative addressing


Ø The only parts of the program that require modification at load time are those that specified direct (as opposed to relative) addresses


Ø Modification record

        o Col. 2-7 Starting location of the address field to be modified, relative to the beginning of the program (Hex)

        o Col. 8-9 Length of the address field to be modified, in half-bytes (Hex)


Sometimes it is required to load and run several programs at the same time. The system must be able to load these programs wherever there is room in the memory. Therefore the exact starting address is not known until load time.





Absolute Program


In an absolute program the actual addresses are fixed during assembly itself. This is called absolute assembly.


 


Consider the instruction:


55   101B   LDA   THREE   00102D



· This statement says that the register A is loaded with the value stored at location 102D. Suppose it is decided to load and execute the program at location 2000 instead of location 1000.


· Then at address 102D the required value which needs to be loaded into register A is no longer available. The address also gets changed relative to the displacement of the program. Hence we need to make some changes in the address portion of the instruction so that we can load and execute the program at location 2000.


 

· Apart from the instructions which undergo a change in their operand address value as the program load address changes, there exist some parts of the program which remain the same regardless of where the program is loaded.

 

· Since the assembler does not know the actual location where the program will get loaded, it cannot make the necessary changes in the addresses used in the program. However, the assembler identifies for the loader those parts of the program which need modification.

 

· An object program that has the information necessary to perform this kind of modification is called the relocatable program.
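The modification the loader performs on each flagged field is simple arithmetic: add the difference between the actual and assembled starting addresses. A minimal sketch (Python, illustrative only) applied to the LDA THREE example above, assuming the program was assembled to start at 1000h:

```python
def relocate(operand, assembled_start, actual_start):
    """Adjust a direct address for the actual load address."""
    return operand + (actual_start - assembled_start)

# Operand 102Dh was assembled relative to start 1000h;
# loading at 2000h instead shifts it by the same 1000h displacement.
new_addr = relocate(0x102D, 0x1000, 0x2000)
print(f"{new_addr:06X}")   # 00202D
```

Every field named in a Modification record gets this same adjustment; fields using relative addressing are left alone.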

Give the algorithms for Pass 1 and Pass 2 of a two-pass assembler

The Algorithm for Pass 1:

begin {Pass 1}
    read first input line
    if OPCODE = ‘START’ then
        begin
            save #[OPERAND] as starting address
            initialize LOCCTR to starting address
            write line to intermediate file
            read next input line
        end {if START}
    else
        initialize LOCCTR to 0
    while OPCODE != ‘END’ do
        begin
            if this is not a comment line then
                begin
                    if there is a symbol in the LABEL field then
                        begin
                            search SYMTAB for LABEL
                            if found then
                                set error flag (duplicate symbol)
                            else
                                insert (LABEL, LOCCTR) into SYMTAB
                        end {if symbol}
                    search OPTAB for OPCODE
                    if found then
                        add 3 {instruction length} to LOCCTR
                    else if OPCODE = ‘WORD’ then
                        add 3 to LOCCTR
                    else if OPCODE = ‘RESW’ then
                        add 3 * #[OPERAND] to LOCCTR
                    else if OPCODE = ‘RESB’ then
                        add #[OPERAND] to LOCCTR
                    else if OPCODE = ‘BYTE’ then
                        begin
                            find length of constant in bytes
                            add length to LOCCTR
                        end {if BYTE}
                    else
                        set error flag (invalid operation code)
                end {if not a comment}
            write line to intermediate file
            read next input line
        end {while not END}
    write last line to intermediate file
    save (LOCCTR - starting address) as program length
end {Pass 1}
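The LOCCTR bookkeeping at the heart of Pass 1 can be sketched as follows (Python, standard SIC only, with a tiny stand-in OPTAB; a real assembler also handles comments, errors and the intermediate file):

```python
# Stand-in opcode table: just the mnemonics the sample program uses.
OPTAB = {"LDA", "STA", "COMP", "JEQ", "RSUB"}

def pass1(lines, start=0x1000):
    """Build SYMTAB and compute program length for standard SIC,
    where every machine instruction and WORD occupies 3 bytes."""
    locctr, symtab = start, {}
    for label, opcode, operand in lines:
        if label:
            assert label not in symtab, "duplicate symbol"
            symtab[label] = locctr
        if opcode in OPTAB or opcode == "WORD":
            locctr += 3
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
    return symtab, locctr - start    # symbol table, program length

symtab, length = pass1([
    ("FIRST", "LDA",  "FIVE"),
    ("",      "STA",  "ALPHA"),
    ("FIVE",  "WORD", "5"),
    ("ALPHA", "RESW", "1"),
])
print(symtab, hex(length))
# {'FIRST': 4096, 'FIVE': 4102, 'ALPHA': 4105} 0xc
```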

 


The Algorithm for Pass 2:

begin {Pass 2}
    read first input line {from intermediate file}
    if OPCODE = ‘START’ then
        begin
            write listing line
            read next input line
        end {if START}
    write Header record to object program
    initialize first Text record
    while OPCODE != ‘END’ do
        begin
            if this is not a comment line then
                begin
                    search OPTAB for OPCODE
                    if found then
                        begin
                            if there is a symbol in OPERAND field then
                                begin
                                    search SYMTAB for OPERAND
                                    if found then
                                        store symbol value as operand address
                                    else
                                        begin
                                            store 0 as operand address
                                            set error flag (undefined symbol)
                                        end
                                end {if symbol}
                            else
                                store 0 as operand address
                            assemble the object code instruction
                        end {if opcode found}
                    else if OPCODE = ‘BYTE’ or ‘WORD’ then
                        convert constant to object code
                    if object code will not fit into the current Text record then
                        begin
                            write Text record to object program
                            initialize new Text record
                        end
                    add object code to Text record
                end {if not a comment}
            write listing line
            read next input line
        end {while not END}
    write last Text record to object program
    write End record to object program
    write last listing line
end {Pass 2}
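The core of Pass 2 (opcode lookup, symbol resolution, and assembling the object code) can be sketched as follows (Python; the opcode values are from the standard SIC opcode table, and error handling is reduced to storing 0 for an undefined symbol, as in the algorithm above):

```python
# Stand-in opcode table mapping mnemonics to SIC opcode values.
OPTAB = {"LDA": 0x00, "STA": 0x0C, "RSUB": 0x4C}

def assemble(opcode, operand, symtab):
    """Assemble one standard-SIC instruction into 6 hex digits."""
    code = OPTAB[opcode] << 16
    if operand:
        addr = symtab.get(operand)
        if addr is None:
            addr = 0          # undefined symbol: store 0, flag error
        code |= addr
    return f"{code:06X}"

symtab = {"FIVE": 0x1006, "ALPHA": 0x1009}
print(assemble("LDA", "FIVE", symtab))    # 001006
print(assemble("STA", "ALPHA", symtab))   # 0C1009
print(assemble("RSUB", "", symtab))       # 4C0000
```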

Explain with a neat diagram phases of a compiler by taking an example A=B+C*60.

 1) Lexical Analyzer

— The first phase of the compiler is the lexical analyzer. It reads the stream of characters in the source program

— Groups the characters into meaningful sequences – lexemes

— For each lexeme, a token is produced as output

—  <token-name , attribute-value>

Token-name : symbol used during syntax analysis

Attribute-value : an entry in the symbol table for this token

— Information from symbol table is needed for syntax analysis and code generation

— Consider the assignment statement A = B + C * 60 from the question
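A rough sketch of this first phase applied to the statement A = B + C * 60 from the question (Python, illustrative only; the id1/id2/id3 naming mimics symbol-table references, and the regex is a hypothetical simplification of real lexer rules):

```python
import re

def tokenize(src):
    """Group characters into lexemes and emit <token-name, attribute> pairs."""
    tokens, symtab = [], {}
    for lexeme in re.findall(r"[A-Za-z_]\w*|\d+|[=+\-*/]", src):
        if lexeme[0].isalpha():
            # identifiers share a symbol-table entry per distinct name
            ref = symtab.setdefault(lexeme, f"id{len(symtab) + 1}")
            tokens.append(("id", ref))
        elif lexeme.isdigit():
            tokens.append(("num", int(lexeme)))
        else:
            tokens.append((lexeme, None))   # operator: no attribute
    return tokens

print(tokenize("A = B + C * 60"))
# [('id', 'id1'), ('=', None), ('id', 'id2'), ('+', None),
#  ('id', 'id3'), ('*', None), ('num', 60)]
```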

2) Syntax Analysis

The second phase of the compiler is syntax analysis, also called parsing

— Parser uses the tokens to create a tree-like intermediate representation

— Depicts the grammatical structure of the token stream

— Syntax tree is one such representation

                                 Interior node – operation

                                 Children  - arguments of the operation

Other phases use this syntax tree to help analyze source program and generate target program


3) Semantic Analysis

The third phase of compiler is Semantic Analyzer

— Checks semantic consistency with language using:

                               Syntax tree  and Information in symbol table

— Gathers type information and saves it in the syntax tree or symbol table

— Type Checks each operator for matching operands

                               Ex: Report error if floating point number is used as index of an array

— Coercions or type conversions

                         A binary arithmetic operator may be applied to a pair of integers or to a pair of floating-point numbers. If it is applied to a floating-point number and an integer, the compiler may convert the integer to a floating-point number.


4) Intermediate Code Generation

     After syntax and semantic analysis, intermediate code generation is the fourth phase of the compiler

— Compilers generate machine-like intermediate representation

— This intermediate representation should have the two properties:

                         Should be easy to produce

                         Should be easy to translate into target machine

             Three-address code

— Sequence of assembly-like instructions with three operands per instruction

— Each operand acts like a register

                        Points to be noted about three-address instructions are:

— Each assignment instruction has at most one operator on the right side

— Compiler must generate a temporary name to hold the value computed by a three-address instruction

— Some instructions have fewer than three operands

 

5) Code Optimization

            Attempt to improve the target code

— Faster code, shorter code or target code that consumes less power

                        The optimizer can deduce that:

— Conversion of 60 from int to float can be done once at compile time, so the inttofloat operation can be eliminated by replacing 60 with 60.0

— t3 is used only once, to transmit its value to id1, so the copy into id1 can be eliminated
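Applying both observations to the three-address code for A = B + C * 60 leaves:

```
t1 = id3 * 60.0
id1 = id2 + t1
```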


6) Code Generation

— Takes intermediate representation as input

— Maps it into target language

— If target language is machine code

                        Registers or memory locations are selected for each of the variables used

                        Intermediate instructions are translated into sequences of machine instructions

                        performing the same task

— Assignment of registers to hold variables is a crucial aspect
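For A = B + C * 60, one possible translation into target machine code, using two registers R1 and R2 and floating-point instructions (an illustrative instruction set, not any specific machine's):

```
LDF  R2, id3          load C into register R2
MULF R2, R2, #60.0    multiply R2 by the constant 60.0
LDF  R1, id2          load B into register R1
ADDF R1, R1, R2       add the product to B
STF  id1, R1          store the result into A
```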

Discuss the various applications of compiler technology

 Applications of Compiler Technology

1. Implementation of high-level programming languages

Ø High-level programming language defines a programming abstraction
Ø Low-level languages give more control over computation and produce efficient code
Ø But they are hard to write, less portable, more prone to errors and harder to maintain
Ø Example : register  keyword
Ø Common programming languages (C, Fortran, Cobol) support
Ø User-defined aggregate data types (arrays, structures, control flow )
Ø Data-flow optimizations
Ø Analyze flow of data through the program and remove redundancies
Ø Key ideas behind object oriented languages are Data Abstraction
      and Inheritance of properties
Ø Java has features that make programming easier
Ø Type-safe – an object cannot be used as an object of an unrelated type
Ø Array accesses are checked to ensure that they lie within the bounds
Ø Built in garbage-collection facility
Ø Optimizations developed to overcome the overhead by eliminating unnecessary range checks
2. Optimizations for Computer Architectures

Ø Parallelism
Ø Instruction level : multiple operations are executed simultaneously
Ø Processor level : different threads of the same application run on different processors
Ø Memory hierarchies
Ø Consists of several levels of storage with different speeds and sizes
Ø Average memory access time is reduced
Ø Using registers effectively is the most important problem in optimizing a program
Ø Caches and physical memories are managed by the hardware
Ø Improve effectiveness by changing the layout of data or order of instructions accessing the data
3. Design of new Computer Architectures

Ø RISC (Reduced Instruction-Set Computer)
Ø CISC (Complex Instruction-Set Computer)
Ø Make assembly programming easier
Ø Include complex memory addressing modes
Ø Optimizations reduce these instructions to a small number of simpler operations
Ø PowerPC, SPARC, MIPS, Alpha and PA-RISC
Ø Specialized Architectures
Ø Data flow machines, vector machines, VLIW, SIMD, systolic arrays
Ø Made way into the designs of embedded machines
Ø Entire systems can fit on a single chip
Ø Compiler technology helps to evaluate architectural designs
4. Program Translations
Ø Binary Translation
Ø Translate binary code of one machine to that of another
Ø Allow machine to run programs compiled for another instruction set
Ø Used to increase the availability of software for their machines
Ø Can provide backward compatibility
Ø Hardware synthesis
Ø Hardware designs are described in high-level hardware description languages like Verilog and VHDL
Ø Described at register transfer level (RTL)
Ø Variables represent registers
Ø Expressions represent combinational logic
Ø Tools translate RTL descriptions into gates, which are then mapped to transistors and eventually to physical layout
Ø Database Query Interpreters
Ø Languages are useful in other applications
Ø Query languages like SQL are used to search databases
Ø Queries consist of predicates containing relational and boolean operators
Ø Can be interpreted or compiled into commands to search a database
Ø Compiled Simulation
Ø Simulation is a technique used in scientific and engineering disciplines to understand a phenomenon or validate a design
Ø Inputs include description of the design and specific input parameters for that run
5. Software Productivity Tools
Ø Testing is a primary technique for locating errors in a program
Ø Use data flow analysis to locate errors statically
Ø Problem of finding all program errors is undecidable
Ø Ways in which program analysis has improved software productivity
Ø Type Checking
Ø Catch inconsistencies in the program
Ø Operation applied to wrong type of object
Ø Parameters to a procedure do not match the signature
Ø Go beyond finding type errors by analyzing flow of data
Ø If pointer is assigned null and then dereferenced, the program is clearly in error
Ø Bounds Checking
Ø Security breaches are caused by buffer overflows in programs written in C
Ø Data-flow analysis can be used to locate buffer overflows
Ø Failing to identify a buffer overflow may compromise the security of the system
Ø Memory-management tools
Ø Automatic memory management removes all memory-management errors like memory leaks
Ø Tools have been developed to help programmers find memory-management errors

Structure of a Compiler Design - Phases of Compiler

The compilation process is a sequence of various phases. Each phase takes input from its previous stage, has its own representation of source program, and feeds its output to the next phase of the compiler. Let us understand the phases of a compiler.
Lexical Analysis: The first phase of the compiler works as a text scanner. This phase scans the source code as a stream of characters and converts it into meaningful lexemes. The lexical analyzer represents these lexemes in the form of tokens as:

<token-name, attribute-value> 

Syntax Analysis: The next phase is called syntax analysis or parsing. It takes the tokens produced by lexical analysis as input and generates a parse tree (or syntax tree). In this phase, token arrangements are checked against the source code grammar, i.e. the parser checks whether the expression made by the tokens is syntactically correct.
Semantic Analysis: Semantic analysis checks whether the parse tree constructed follows the rules of the language: for example, that assignment of values is between compatible data types, flagging errors such as adding a string to an integer. The semantic analyzer also keeps track of identifiers, their types and expressions, and whether identifiers are declared before use. It produces an annotated syntax tree as output.
Intermediate Code Generation: After semantic analysis the compiler generates an intermediate code of the source code for the target machine. It represents a program for some abstract machine and lies between the high-level language and the machine language. This intermediate code should be generated in such a way that it is easy to translate into the target machine code.
Code Optimization: The next phase performs code optimization on the intermediate code. Optimization can be seen as something that removes unnecessary code lines and arranges the sequence of statements in order to speed up program execution without wasting resources (CPU, memory).
Code Generation: In this phase, the code generator takes the optimized representation of the intermediate code and maps it to the target machine language. The code generator translates the intermediate code into a sequence of (generally) relocatable machine code. This sequence of machine instructions performs the same task as the intermediate code would.
Symbol Table: This is a data structure maintained throughout all the phases of a compiler. All the identifiers' names along with their types are stored here. The symbol table makes it easier for the compiler to quickly search for an identifier's record and retrieve it. It is also used for scope management.

Input Buffering in Compiler Design

The lexical analyzer scans the input from left to right, one character at a time. It uses two pointers, the begin pointer (bp) and the forward pointer (fp), to keep track of the portion of the input scanned.
Initially both pointers point to the first character of the input string. The forward pointer moves ahead to search for the end of the lexeme; as soon as a blank space is encountered, it indicates the end of the lexeme. For example, as soon as fp encounters a blank space after "int", the lexeme "int" is identified. When fp encounters white space, it ignores it and moves ahead; then both bp and fp are set to the start of the next token. The input is thus read from secondary storage, but reading one character at a time from secondary storage is costly; hence a buffering technique is used. A block of data is first read into a buffer, and then scanned by the lexical analyzer. There are two methods used in this context: the one-buffer scheme and the two-buffer scheme. These are explained below.
  • One-Buffer Scheme: In this scheme, only one buffer is used to store the input string. The problem with this scheme is that if a lexeme is very long, it crosses the buffer boundary; to scan the rest of the lexeme the buffer has to be refilled, which overwrites the first part of the lexeme.
  • Two-Buffer Scheme: To overcome the problem of the one-buffer scheme, two buffers are used to store the input string, and they are scanned alternately; when the end of the current buffer is reached, the other buffer is filled. The only remaining problem is that if the length of a lexeme is longer than the length of a buffer, the input cannot be scanned completely. Initially both bp and fp point to the first character of the first buffer. Then fp moves towards the right in search of the end of the lexeme; as soon as a blank character is recognized, the string between bp and fp is identified as the corresponding token. To identify the boundary of the first buffer, an end-of-buffer character is placed at its end; similarly, the end of the second buffer is recognized by the end-of-buffer mark at its end. When fp encounters the first eof, it recognizes the end of the first buffer, and filling of the second buffer begins; in the same way, the second eof indicates the end of the second buffer. The two buffers are filled alternately until the end of the input program, and the stream of tokens is identified. This eof character introduced at the end is called a sentinel and is used to identify the end of a buffer.
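The two-buffer scheme with sentinels can be sketched as follows (Python, illustrative only; a real scanner refills two fixed buffers in place rather than pre-splitting the whole input):

```python
EOF = "\0"          # sentinel character appended to each buffer
BUFSIZE = 8         # small buffer size, for demonstration

def load_buffers(text):
    """Split the input into sentinel-terminated buffers."""
    chunks = [text[i:i + BUFSIZE] for i in range(0, len(text), BUFSIZE)]
    return [c + EOF for c in chunks]

def scan(text):
    """Yield input characters; the sentinel test detects end-of-buffer
    with a single comparison, at which point the other buffer is used."""
    for buf in load_buffers(text):
        for ch in buf:
            if ch == EOF:
                break          # end of this buffer: switch buffers
            yield ch

print("".join(scan("E = M * C ** 2")))   # E = M * C ** 2
```

Because every buffer ends in a sentinel, the scanner needs only one test per character; a lexeme such as "**" that straddles the buffer boundary is continued seamlessly in the next buffer.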

Write note on MASM assembler

➜ It supports a wide variety of macro facilities and structured programming idioms, including high-level constructions for looping, procedure calls and alternation (therefore, MASM is an example of a high-level assembler). ➜ MASM is one of the few Microsoft development tools for which there was no separate 16-bit and 32-bit version. ➜ The assembler affords the programmer looking for additional performance a three-pronged approach to performance-based solutions. ➜ MASM can build very small, high-performance executable files that are well suited where size and speed matter. ➜ When additional performance is required for other languages, MASM can enhance the performance of these languages with small, fast and powerful dynamic link libraries. ➜ For programmers who work in Microsoft Visual C/C++, MASM builds modules and libraries in the same format, so the C/C++ programmer can build modules or libraries in MASM and directly link them into their own C/C++ programs. This allows the C/C++ programmer to target critical areas of their code in a very efficient and convenient manner: graphics manipulation, games, very high-speed data manipulation and processing, parsing at speeds that most programmers have never seen, encryption, compression and any other form of information processing that is processor intensive. ➜ MASM32 has been designed to be familiar to programmers who have already written API-based code in Windows. The INVOKE syntax of MASM allows functions to be called in much the same way as they are called in a high-level compiler.

Explain machine dependent features of loader

Machine-Dependent Loader Features

The absolute loader is simple and efficient, but the scheme has potential disadvantages. One of the main disadvantages is that the programmer has to specify the actual starting address from which the program is to be loaded. This does not create difficulty if only one program is to run, but it does for several programs. Further, it is difficult to use subroutine libraries efficiently.

This needs the design and implementation of a more complex loader. The loader must provide program relocation and linking, as well as simple loading functions.

 

Relocation

The concept of program relocation is, the execution of the object program using any part of the available and sufficient memory. The object program is loaded into memory wherever there is room for it. The actual starting address of the object program is not known until load time. Relocation provides the efficient sharing of the machine with larger memory and when several independent programs are to be run together. It also supports the use of subroutine libraries efficiently. Loaders that allow for program relocation are called relocating loaders or relative loaders.

Methods for specifying relocation

Use of a modification record and use of relocation bits are the two methods available for specifying relocation. In the first case, a Modification record (M) is used in the object program to specify any relocation. In the second case, each instruction is associated with one relocation bit, and these relocation bits in a Text record are gathered into a bit mask.

Modification records are used for complex machines; this scheme is also called the Relocation and Linkage Directory (RLD) specification. The format of the Modification record (M) is as follows, and the object program with relocation by Modification records is also shown here.

Modification record

col 1: M
col 2-7: relocation address
col 8-9: length (halfbyte)
col 10: flag (+/-)
col 11-17: segment name

 

HΛCOPY Λ000000 001077

TΛ000000 Λ1DΛ17202DΛ69202DΛ48101036ΛΛ4B105DΛ3F2FECΛ032010

TΛ00001DΛ13Λ0F2016Λ010003Λ0F200DΛ4B10105DΛ3E2003Λ454F46

TΛ001035 Λ1DΛB410ΛB400ΛB440Λ75101000ΛΛ332008Λ57C003ΛB850

TΛ001053Λ1DΛ3B2FEAΛ134000Λ4F0000ΛF1Λ..Λ53C003ΛDF2008ΛB850

TΛ001070Λ07Λ3B2FEFΛ4F0000Λ05

MΛ000007Λ05+COPY

MΛ000014Λ05+COPY

MΛ000027Λ05+COPY

EΛ000000

The relocation bit method is used for simple machines. A relocation bit of 0 means no modification is necessary; a relocation bit of 1 means modification is needed. The bits are specified in columns 10-12 of the Text record (T); the format of the Text record, along with relocation bits, is as follows.

Text record

col 1: T

col 2-7: starting address

col 8-9: length (byte)

col 10-12: relocation bits

col 13-72: object code

A twelve-bit mask is used in each Text record (columns 10-12: relocation bits). Since each Text record contains fewer than 12 words, unused bits are set to 0, and any value that is to be modified during relocation must coincide with one of these 3-byte segments. (For the absolute loader there are no relocation bits, and columns 10-69 contain object code.) The object program with relocation by bit mask is as shown below. Observe that FFC means all ten words in that record are to be modified, and E00 means the first three words are to be modified.

HΛCOPY Λ000000 00107A

TΛ000000Λ1EΛFFCΛ140033Λ481039Λ000036Λ280030Λ300015ΛΛ3C0003 Λ

TΛ00001EΛ15ΛE00Λ0C0036Λ481061Λ080033Λ4C0000ΛΛ000003Λ000000

TΛ001039Λ1EΛFFCΛ040030Λ000030ΛΛ30103FΛD8105DΛ280030Λ...

TΛ001057Λ0AΛ800Λ100036Λ4C0000ΛF1Λ001000

TΛ001061Λ19ΛFE0Λ040030ΛE01079ΛΛ508039ΛDC1079Λ2C0036Λ...

EΛ000000
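Applying a bit mask can be sketched as follows (Python, illustrative only; the words and load address are made-up sample values, not taken from the COPY program above):

```python
def relocate_record(words, mask_hex, load_addr):
    """Add load_addr to every 3-byte word whose mask bit is 1.
    The leftmost bit of the 12-bit mask governs the first word."""
    mask = int(mask_hex, 16)
    out = []
    for i, w in enumerate(words):
        if mask & (1 << (11 - i)):
            w = (w + load_addr) & 0xFFFFFF   # keep result at 24 bits
        out.append(w)
    return [f"{w:06X}" for w in out]

# E00 = 1110 0000 0000: only the first three words are modified.
print(relocate_record([0x0C0036, 0x481061, 0x080033, 0x4C0000],
                      "E00", 0x5000))
# ['0C5036', '486061', '085033', '4C0000']
```

The last word (the RSUB instruction, whose address field is not an address) is left untouched because its mask bit is 0.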

 Program Linking

The Goal of program linking is to resolve the problems with external references (EXTREF) and external definitions (EXTDEF) from different control sections.

EXTDEF (external definition) - The EXTDEF statement in a control section names symbols, called external symbols, that are defined in this (present) control section and may be used by other sections.

ex: EXTDEF BUFFER, BUFFEND, LENGTH

EXTDEF LISTA, ENDA

EXTREF (external reference) - The EXTREF statement names symbols used in this (present) control section and are defined elsewhere.

ex: EXTREF RDREC, WRREC

EXTREF LISTB, ENDB, LISTC, ENDC

 

 

Explain the various phases of the compiler with a simple example

The compilation process is a sequence of various phases. Each phase takes input from the previous, and passes the output on to the next phase.

LEXICAL ANALYSIS

The first phase of the compiler works as a text scanner. The lexical analyzer scans the source code as a stream of characters and converts it into meaningful lexemes of the form <token-name, attribute-value>.

SYNTAX ANALYSIS

It takes the token produced by lexical analysis as input and generates a parse tree (or syntax tree). Token arrangements are checked against the source code grammar.

SEMANTIC ANALYSIS

Checks whether the parse tree constructed follows the rules of language. E.g. assignment of values is between compatible data types, and adding string to an integer. It keeps track of identifiers, their types and expressions. It produces an annotated syntax tree as an output.

INTERMEDIATE CODE GENERATION

After semantic analysis the compiler generates an intermediate code of the source code for the target machine. It represents a program for some abstract machine. It is generated in such a way that it makes it easier to be translated into the target machine code.

CODE OPTIMIZATION

Optimization can be seen as something that removes unnecessary code lines and arranges the sequence of statements in order to speed up program execution without wasting resources.

CODE GENERATION

This phase takes the optimized representation of the intermediate code and maps it to the target machine language (a sequence of relocatable machine code).

 

Explain Input Buffering. What is the drawback of using one buffer scheme and explain how it is overcome?

We often have to look one or more characters beyond the next lexeme, before we can be sure that we have the right lexeme.

For example, the symbols '<' or '=' could be the beginning of '<<', '<=', ==', etc.

Hence we need a two-buffer scheme to handle the large lookahead safely. The buffer pair is scanned using a lexeme begin pointer (lbp) and a forward pointer (fp).

The lexeme begin pointer marks the beginning of the current lexeme, and the forward pointer scans ahead until a pattern match is found.

This concept is called Input Buffering.

When using a single buffer scheme, if the end of the current file/block is reached before the expression ends, the lexical analyzer might return an unexpected token.

For example, for input string E = M * C ** 2

... ... E = M * C * EOF ... ...
              lbp   ↑ fp  ↑    

 

 

When EOF is reached in the middle of the expression, the lexical analyzer emits '*' as a token, reloads the input buffer, and emits the next '*' as another token.

Two '*'s are returned instead of '**', hence invalidating the expression.

This drawback is overcome by using two buffer schemes.

... E = M * C * EOF ...
            lbp ↑     
 
... * 2 EOF ...
    fp  ↑    

When we use two buffers and the current buffer ends in the middle of a lexeme, the other buffer is loaded with the next block of the input. The forward pointer then moves into the second buffer to process the continuing input, and the lexical analyzer correctly emits '**' as the token.