



## V SEMESTER B.TECH. (COMPUTER SCIENCE AND ENGINEERING)

## **END SEMESTER EXAMINATIONS, NOV/DEC 2016**

SUBJECT: COMPUTER ARCHITECTURE [CSE 3101]

## REVISED CREDIT SYSTEM (24/11/2016)

Time: 3 Hours MAX. MARKS: 50

## Instructions to Candidates:

- ❖ Answer **ALL** the questions.
- Missing data may be suitably assumed.

**1A.** With a neat diagram, explain how the data will be fetched and executed in the system for the instruction in line number 4.

| Line No. | Memory location address | Contents of memory | Remarks                                           |
|----------|-------------------------|--------------------|---------------------------------------------------|
| 1        | aa                      | Load Acc, ae       | Loads the contents of the location to Accumulator |
| 2        | ab                      | Add Acc, #4        | Adds 4 to the value in Accumulator                |
| 3        | ac                      | Store 7, Acc       |                                                   |
| 4        | ad                      | Load Acc, [af]     |                                                   |
|          | ae                      | 10                 |                                                   |
|          | af                      | ag                 |                                                   |
|          | ag                      | 10                 |                                                   |

**4M** 

**1B.** Explain the different programmatic levels of parallel processing and the architectural configuration of a parallel computer which exploits temporal parallelism.

**4M**

**1C.** Derive a formula for efficiency using the speedup of a pipelined processor which can execute n tasks in k stages.

**2M**
**2A.** Consider the pipelined processor with four stages shown in Figure 2A, which has a total evaluation time of seven clock cycles. The pipeline is designed for a function in such a way that the output from each even-numbered stage is sent to the successor even-numbered stage, and the output from each odd-numbered stage is sent to its successor stage, in successive clock cycles. Write all the formulae wherever required.



Figure 2A

- i. Draw the reservation table and identify the forbidden latencies for each stage.
- ii. Draw the state transition diagram.
- iii. Identify all the greedy cycles.
- iv. Calculate the efficiency of the pipelined processor for the cycle(s) having the minimum average latency (MAL).

**5M** 

CSE 3101 Page 1 of 2

**2B.** Two code snippets a) and b) are given below:

- a)
  - i1) DADDU R2, R3, R4
  - i2) BEQZ R2, L1
  - i3) LW R1, 0(R2)
  - i4) L1:
- b)
  - i1) DADDU R1, R2, R3
  - i2) BEQZ R4, L
  - i3) **DSUBU R1, R5, R6**
  - i4) L: ....
  - i5) OR R7, R1, R8

- i) Is it possible to interchange i2 and i3 in code snippet a)? Justify your answer.
- ii) What dependency or dependencies must be maintained for the proper execution of instruction i5 in code snippet b)?

**2M** 

**2C.** Show the Cartesian product representation of the space of interconnection networks. Give a detailed explanation of each component in the representation.

**3M** 

**3A.** Design an algorithm and masking scheme to calculate the product of the following numbers using N array processors. Show all the steps clearly.

2, 10, 9, 1

**3M** 

**3B.** Design a single-stage recirculating network in which each processing element is directly connected to (2n-1) processing elements. Select a valid number of processing elements such that 6 < N < 10, where N is the number of processing elements. Show the connections separately for each routing function.

**3M** 

**3C.** Design a multistage network that uses the inverse shuffle function between the stages. Show the switch settings to send the data from PE1 to PE7.

**4M** 

**4A.** Design an algorithm for SIMD matrix multiplication that has a time complexity of O(n log n). Design an improved version of the algorithm that is more space efficient than the previous algorithm in SIMD.

**4M** 

**4B.** Explain the process of data transfer and exclusive access in the directory protocol.

**3M**

**4C.** There are 3 processors in a multiprocessor system as shown in Figure 4C. Explain how the situation is handled in the requesting processor and the snooping processor when the MESI protocol is used, for each scenario given below. Draw the state transition diagram for both cases.

- i) Processor A reads the value of X from its cache.
- ii) Processor B wants to write to Z in its cache.



**3M** 

Figure 4C

**5A.** With a neat diagram for each, explain the variations of processors that have hardware for issuing multiple instructions per cycle but in which only instructions from a single thread are issued in a single cycle.

**3M** 

**5B.** Explain how load balancing and parallel computation problems are solved in clusters.

**2M** 

**5C.** What are the different variables in multicore organization? Explain the general organizations for multicore systems with diagrams. Also, list the advantages of having a shared L2 cache on the chip over dedicated caches.

**5M** 
