Friday, November 23, 2007

Assignment Parallel Programming


Long questions

Q1. Explain parallel reduction with an example.

Given a set of n values a0, a1,…….an-1 and an associative operator ○, reduction is the process of computing a0 ○ a1 ○ an-1. Addition, multiplication, and finding the maximum/minimum of a set are examples of associative operators.
The process of reduction is viewed as a tree-structured operation. The elements of input to the reduction problem are placed at the leaf nodes of a binary tree. The reduction operation is applied to children of each parent node, and the result is propagated towards the root of the tree.
Parallel summation is an example of a reduction operation.







PRAM algorithm to sum n elements using ën/2û processors

SUM :
/*Initial condition: List of n ³ 1 elements stored in A[0…(n-1)]
Final condition: Sum of elements stored in A[0] */
Global variables: n,A[0…(n-1)],j
Begin
spawn(P0, P1, P2,…,P└n/2┘- 1)
for all Pi ,where 0 £ i £ ën/2û -1 do
for j ¬ 0 to log én –1ù do
if i modulo 2j and 2i + 2j < n then
A[2i] ¬A[2i] + A[2i + 2j]
endif
endfor
endfor
end

Analysis of Parallel Reduction

•There is a dependency across levels of reduction. Unless reduction is ready at a lower level, a higher level cannot proceed with the operation.
• Overall time complexity of the algorithm is Θ (log n), given ën/2û processes
• The spawn routine requires élog ën/2û ù process doubling steps.
• The sequential for loop executes élog nù times.

O2. Explain Odd-Even Transposition Sort

It is designed for the processor array model in which the processing elements are organized into one-dimensional mesh.
Assume that A=(a0, a1,…,an-1) is the set of n elements to be sorted. Each of the n processing elements contains two local variables: a, unique element of array A, and t, a variable containing a value retrieved from a neighboring processing element. The algorithm performs n/2 iterations, and each iteration has two phase. In the first phase, called odd-even exchange, the value of a in every odd-numbered processor(except processor n-1) is compared with the value of a stored in the successor processor. The values are exchanged, if necessary, so that the lowered-numbered processor contain the smaller value. In the second phase, called even-odd exchange, the value of a in every even numbered processor is compared with the value of a in the successor processor. The values are exchanged, if necessary, so that lower numbered processor contain the smaller value. After n/2 iterations the value must be sorted.
Odd-Even Transposition Sort Algorithm for the one-dimensional mesh processor array model

Parameter n
Global i {Element to be sorted}
Local t {Element taken from adjacent processor}
Begin
for i ¬1 to n/2 do
for all pj, where 0 £ j £ n-1 do

if j < n-1 and odd(j) then
{Odd-even exchange}
t Ü successor(a) {Get value from successor}
successor(a)Ümax(a,t) {Give away larger value}
a¬ min(a,t) {Keep smaller value}
endif

if even(j) then
{Even-odd exchange}
t Ü successor(a) {Get value from successor}
successor(a) Ü max(a,t) {Give away larger value}
a ¬ min(a,t) {Keep smaller value}
endif
endfor
endfor
end

Example: Odd-Even Transposition Sort for eight values

Indices: 0 1 2 3 4 5 6 7
Initial Values: G H F D E C B A
After odd-even exchange: G F < H D < E B < C A
After even-odd exchange: F < G D < H B < E A < C
After odd-even exchange: F D < G B < H A < E C
After even-odd exchange: D < F B < G A < H C < E
After odd-even exchange: D B < F A < G C < H E
After even-odd exchange: B < D A < F C < G E < H
After odd-even exchange: B A < D C < F E < G H
After even-odd exchange: A < B C < D E < F G < H


Analysis of Odd-Even Transposition Sort

• The complexity of sorting n elements on a one- dimensional mesh processor array with n processors using odd-even transposition sort is Θ(n).

Q3. Explain Enumeration Sort
Assume that we are given table of n elements, denoted a0, a1,…., an-1, on which a linear order has been defined. Thus for any two elements ai and aj, exactly one of the following cases must be true: ai < ai =" aj,"> aj. The goal of sorting is to find a permutation (∏0, ∏1,…. ∏n-1) such that a∏0 £ a∏1 £ ……a ∏n-1.
An enumeration sort computes the final position of each element in the sorted list by comparing it with the other elements and counting the number of elements having smaller value. If j elements have smaller value than ai, then ∏j = i ; i.e., element ai
is the (j +1) element on the sorted list following a∏0,……, a∏j-1.

Algorithm for Enumeration Sort

ENUMERATION SORT (CRCW PRAM)
Parameter n {Number of elements}
Global a[0…(n-1)] {Elements to be sorted}
position[0…(n-1)] {Sorted positions}
sorted[0…(n-1)] {Contains sorted elements}
Begin
spawn(Pi,j, for all 0 £ i, j < n)
for all Pi,j, where 0 £ i, j< n do
position[i] ¬0
if a[i] < a[j] or (a[i] = a[j] and i < j) then
position[i] ¬ 1
endif
endfor
for all Pi,0, where 0 £ i < n do
sorted[position[i]] ¬ a[i]
endfor
end

Analysis of Enumeration Sort
•A set of n elements can be sorted in Q(log n) time with n2 processors, given a CRCW PRAM model in where simultaneous writes to the same memory location cuse the sum of the values to be assigned.
• If the time needed to spawn the processors is not counted, the algorithm executes in constant time.

Q4. Explain Parallel Quick Sort.
Quick sort is a divide and conquer algorithm that easily yields itself to parallelisation.
The array is partitioned into two parts using a pivot element, such that all elements in the left partition are smaller than the pivot element and the elements to the right are larger. The pivot element is inserted in between the two partitions and hence in the sorted position after partitioning. Apply the same algorithm recursively to each left and right parts of the array, if there is just one element in the array to stop.
To parallelise this algorithm, a set of processes(pool of workers) is created. The extents of the array to be sorted are placed on a stack to indicate the presence of work to be done. The first process to fetch the work partitions the array and places the extents of the resulting two partitions on the stack. The workers fetch the extents of a partition to be sorted from the stack and create new partitions. These are added to the stack.

Algorithm for Parallel Quick Sort

Global n {Size of array of unsorted elements}
a[0…(n-1)] {array of elements to be sorted}
sorted {Number of elements in sorted position}
min.partition {Smallest subarray that is partitioned rather than sorted directly}
Local bounds {Indices of unsorted subarray}
median {Final position in subarray of partitioning key}
Begin
sorted ¬ 0
INITIALIZE.STACK()

for all Pi, where 0 £ i < p do
while( sorted < n) do
bounds ¬ STACK.DELETE()
while(bounds.low < bounds.high) do
if(bounds.high - bounds.low < min.partition) then
INSERTION.SORT(a, bounds.low, bounds.high)
ADD.TO.SORTED(bounds.high-bounds.low +1 )
exit while
else
median¬ PARTITION(bounds.low, bounds.high)
STACK.INSERT(median + 1, bounds.high)
bounds.high ¬ median – 1

if bounds.low = bounds.high then
ADD.TO.SORTED(2)
else
ADD.TO.SORTED(1)
endif
endif
endwhile
endwhile
endfor
end

In the above algorithm, function INITIALIZE.STACK initializes the shared stack containing the indices of unsorted subarrays. When a process calls function STACK.DELETE, it receives the indices of an unsorted sub array if the stack contains indices; otherwise, there is no useful work to do at this point. Function STACK.INSERT adds the indices of an unsorted subarray to the stack. Since all these functions access the same shared data structure, their execution must be mutually exclusive. Function ADD.TO.SORTED increases the count of elements that are in their correct positions and execution of this function, too, must be mutually exclusive.

0 comments;Click here for request info on this topic:

Post a Comment