Reg. No.



## II SEMESTER M.TECH. (SOFTWARE ENGINEERING/COMPUTER NETWORKING AND ENGINEERING)

## END SEMESTER EXAMINATIONS, APRIL/MAY 2017

SUBJECT: PROGRAM ELECTIVE III - PARALLEL COMPUTATION AND APPLICATIONS [ICT 5241]

## REVISED CREDIT SYSTEM (29/04/2017)

Time: 3 Hours

MAX. MARKS: 50

Instructions to Candidates:

- Answer ALL the questions.
- Missing data if any, may be suitably assumed.

| Question | Marks |
|----------|-------|
| With suitable diagrams, explain the key features of Kepler GPU architecture. | 5 |
| With suitable code snippets, explain how interoperability can be achieved between CUDA and Thrust. | 3 |
| With suitable diagrams, explain the parallel computer classifications that are adopted in most current supercomputers and modern GPUs. | 2 |
| Write the CUDA kernel to reduce a 1D input array of size N using shared memory with least divergence. Given the input array [2, 1, 8, 1, 0, 4, 4, 2, 0, 3, 1, 2, 5, 3, 1, 2] and configuration parameters `<<<4,4>>>`, write the execution phases for the above kernel. | 5 |
| With a neat diagram, explain the OOEE of Nehalem micro-architecture. | 3 |
| Can memory be the only limiting factor for parallelism on GPU? Justify your answer with an example. | 2 |
| With a suitable diagram, explain the cache coherency protocol that is adopted in Nehalem micro-architecture. | 5 |
| With a suitable example CUDA program, explain how to allocate memory for any three CUDA variables. | 3 |
| What is thread divergence? Explain how it can be minimized with an example. | |
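As an illustration of the reduction question above, here is a minimal sketch of one possible answer. It assumes a block size that is a power of two and one input element per thread; the names `reduceKernel`, `blockSumsOnGpu` and `CUDA_CHECK` are hypothetical and not taken from the paper. It uses sequential addressing (active threads are always `tid < stride`), a common way to keep divergence low.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "CUDA error: %s\n",                       \
                         cudaGetErrorString(err_));                        \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// Each block reduces blockDim.x elements in shared memory.
// Assumption: blockDim.x is a power of two.
__global__ void reduceKernel(const int *in, int *blockSums, int n) {
    extern __shared__ int sdata[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    sdata[tid] = (i < n) ? in[i] : 0;  // pad out-of-range threads with 0
    __syncthreads();

    // Sequential addressing: the active threads are the contiguous range
    // [0, stride), so whole warps retire together instead of diverging.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) sdata[tid] += sdata[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockSums[blockIdx.x] = sdata[0];
}

// Host wrapper: returns one partial sum per block.
std::vector<int> blockSumsOnGpu(const std::vector<int> &h, int threads) {
    int n = static_cast<int>(h.size());
    int blocks = (n + threads - 1) / threads;
    int *dIn = nullptr, *dOut = nullptr;
    CUDA_CHECK(cudaMalloc(&dIn, n * sizeof(int)));
    CUDA_CHECK(cudaMalloc(&dOut, blocks * sizeof(int)));
    CUDA_CHECK(cudaMemcpy(dIn, h.data(), n * sizeof(int),
                          cudaMemcpyHostToDevice));
    reduceKernel<<<blocks, threads, threads * sizeof(int)>>>(dIn, dOut, n);
    CUDA_CHECK(cudaGetLastError());
    std::vector<int> out(blocks);
    CUDA_CHECK(cudaMemcpy(out.data(), dOut, blocks * sizeof(int),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dIn));
    CUDA_CHECK(cudaFree(dOut));
    return out;
}

int main() {
    std::vector<int> data = {2, 1, 8, 1, 0, 4, 4, 2, 0, 3, 1, 2, 5, 3, 1, 2};
    // 16 elements with 4 threads per block gives the <<<4,4>>> configuration.
    std::vector<int> sums = blockSumsOnGpu(data, 4);
    int total = 0;
    for (int s : sums) total += s;  // final pass on the host
    std::printf("total = %d\n", total);
    return 0;
}
```

Under these assumptions, block 0 holds [2, 1, 8, 1]: the phase with stride 2 gives [10, 2, 8, 1] and the phase with stride 1 gives 12 in element 0. The other blocks reduce the same way to 10, 6 and 11, so the four partial sums add up to 39.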

| Q. No. | Question | Marks |
|--------|----------|-------|
| 4B. | With an example, explain the difference between task and data parallelism. | 3 |
| 4C. | With suitable code snippets, explain how error handling can be done in CUDA C programs. | 2 |
| 5A. | Write the complete CUDA program to perform convolution on 1D input X of dimension N and mask K of dimension M (such that M < N) using the shared and constant memory. | 5 |
| 5B. | Write a complete CUDA program using the Thrust library to find the division of two input arrays A and B of length N and store the result in C on a GPU (i.e. $\forall a \in A$ and $\forall b \in B$, $c = a/b$, only if $b$ is non-zero). | 3 |
| 5C. | Is it possible to achieve synchronization within the CUDA blocks? Justify your answer with an example. | 2 |
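As an illustration of question 5B, the following is a minimal sketch of one possible approach using `thrust::transform` with a binary functor. The names `SafeDivide` and `divideOnGpu` are hypothetical, and writing 0 to C where b is zero is an assumption: the question does not say what C should hold in that case.

```cuda
#include <cstdio>
#include <vector>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>

// Binary functor applied element-wise on the device.
// Assumption: the result is 0 when b is zero (not specified in the question).
struct SafeDivide {
    __host__ __device__ float operator()(float a, float b) const {
        return (b != 0.0f) ? a / b : 0.0f;
    }
};

// Copies A and B to the GPU, computes C = A / B element-wise, copies C back.
// Assumption: a and b have the same length N.
std::vector<float> divideOnGpu(const std::vector<float> &a,
                               const std::vector<float> &b) {
    thrust::device_vector<float> dA(a.begin(), a.end());
    thrust::device_vector<float> dB(b.begin(), b.end());
    thrust::device_vector<float> dC(a.size());

    thrust::transform(dA.begin(), dA.end(), dB.begin(), dC.begin(),
                      SafeDivide());

    std::vector<float> c(a.size());
    thrust::copy(dC.begin(), dC.end(), c.begin());  // device to host
    return c;
}

int main() {
    std::vector<float> a = {6.0f, 1.0f, 9.0f};
    std::vector<float> b = {3.0f, 0.0f, 2.0f};
    std::vector<float> c = divideOnGpu(a, b);
    for (float v : c) std::printf("%g ", v);
    std::printf("\n");
    return 0;
}
```

If C should be left untouched where b is zero, `thrust::transform_if` with B as the stencil would be an alternative to the zero fallback used here.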