Broadcasting and Paralel Prefix operations
Broadcasting Operations A variety of functions have
been implemented on top of BSPlib (v1.4) and LAM-MPI (on top of MPI-2
primitives MPI_Put and MPI_Get). Named functions have been tested under both
libraries.
- Binary and d-ary tree replication of scalars and
arrays of arbitrary size.
- 2-phase algorithm arrays of arbitrary size. This algorithm is
preferable on large arrays since it only requires two supersteps (BSP
speaking) or two rounds of communication. In each round an n-relation
is realized (each processor sends/receives n elements where n is the
array size).
- Wrapper broadcasting functions that chooses the best ones
(2-phase vs d-ary tree and in the latter case choosed the value of d).
However at the moment the choices are too generic and unreliable as they
depend on the bsp parameters of an individual cluster and the default values
are obsolete.
Parallel Prefix Operations Functions for binary and d-ary
parallel prefix, including an implementation of a generalization
of the Ladner-Fisher algorithm (with binary and d-ary combining).
An implementation of a generalization of the butterfly algorithm
is also included.
- Binary(Ladner-Fisher oriented) and d-ary tree
(Ladner Fisher variant)
parallel prefix (scan) on arrays of arbitrary size (one/scalar or more).
The scan operator needs to be associative. The implementation is ugly
and requires for rounds of phases instead of the Ladner-Fisher 2-round
algorithm (up and down the tree). However the algorithm is quite memory
efficient. In the d-ary case each processor sends 1 word and receives at
most d words per superstep.
- 2-phase algorithm on arrays of arbitrary size. This algorithm is
preferable on large arrays since it only requires two supersteps (BSP
speaking) or two rounds of communication. In each round an n-relation
is realized (each processor sends/receives n elements where n is the
array size).
- Binary (Butterfly algorithm oriented) and d-ary tree (butterfly
algorithm oriented)
parallel prefix (scan) on arrays of arbitrary size (one/scalar or more).
The scan operator MUST BE COMMUTATIVE. The algorithm is based on the butterfly
algorithm (embedding of a binary tree on the butterfly) and is generalized
(in the d-tree case). The difference between this and the first algorithm
is a reduction in the phases and rounds of communication (half as many).
However each processor sends and receives d words per superstep.
- Faithful Ladner-Fisher binary and d-ary tree extension
parallel prefix (scan) on arrays of arbitrary size (one/scalar or more).
The scan operator needs to be associative.
Two phases up and down the tree, but to achieve this a penalty is paid in
memory efficiency terms. The algorithms seems to suffer because of this
from significant performance degradation. We plan to clean it up in the
future.
- Wrapper scan/prefix functions that chooses the best ones
(2-phase vs d-ary tree and in the latter case choosed the value of d).
However at the moment the choices are too generic and unreliable as they
depend on the bsp parameters of an individual cluster and the default values
are obsolete.
tars/bp03v2.tar
An earlier version is
tars/bp03v1.tar
Last Update: April 24, 2003