| o . sinnen [at] auckland . ac . nz |
Oliver's pages
List of publications without abstracts / with abstracts.
- N. Giacaman and O. Sinnen.
Parallel Iterator for parallelizing object-oriented applications.
International Journal of Parallel Programming, 39(2):232–269,
2011.
With the advent of multi-core processors, desktop application
developers mustfinally face parallel computing and its challenges. A large
portion of the computational load in a program rests within iterative
computations. In object-oriented languages these are commonly handled using
iterators which are inadequate for parallel programming. This paper presents
a powerful Parallel Iterator concept to be used in object-oriented programs
for the parallel traversal of a collection of elements. The Parallel Iterator
may be used with any collection type (even those inherently sequential) and
it supports several scheduling schemes which may even be decided dynamically
at run-time. Some additional features are provided to allow early termination
of parallel loops, exception handling and a solution for performing
reductions. With a slight contract modification, the Parallel Iterator
interface imitates that of the Java-style sequential iterator. All these
features combine together to promote minimal, if any, code restructuring.
Along with the ease of use, the results reveal negligible overhead and the
expected inherent speedup.
- N. Giacaman and O. Sinnen.
Object-oriented parallelisation of Java desktop programs.
IEEE Software, 28(1):32–38, January/February 2011.
- O. Sinnen, A. To, and M. Kaur.
Contention-aware scheduling with task duplication.
Journal of Parallel and Distributed Computing, 71(1):77–86,
2011.
- A. Z. Semar Shahul and O. Sinnen.
Scheduling task graphs optimally with A*.
The Journal of Supercomputing, 51(3):310–332, 2010.
Scheduling tasks onto the processors of a parallel system is a
crucial part of program parallelisation. Due to the NP-hard nature of the
task scheduling problem, scheduling algorithms are based on heuristics that
try to produce good rather than optimal schedules. Nevertheless, in certain
situations it is desirable to have optimal schedules, for example for
time-critical systems or to evaluate scheduling heuristics. This paper
investigates the task scheduling problem using the A* search algorithm which
is a best-first state space search. The adaptation of the A* search algorithm
for the task scheduling problem is referred to as the A* scheduling
algorithm. The A* scheduling algorithm can produce optimal schedules in
reasonable time for small to medium sized task graphs with several tens of
nodes. In comparison to a previous approach, the here presented A* scheduling
algorithm has a significantly reduced search space due to a much improved
consistent and admissible cost function f(s) and additional pruning
techniques. Experimental results show that the cost function and the various
pruning techniques are very effective for the workload. Last but not least,
the results show that the proposed A* scheduling algorithm significantly
outperforms the previous approach.
- O. Sinnen, L. Sousa, and F. E.
Sandnes.
Toward a realistic task scheduling model.
IEEE Transactions on Parallel and Distributed Systems,
17(3):263–275, 2006.
Task scheduling is an important aspect of parallel programming.
Most of the heuristics for this NP-hard problem are based on a very simple
system model of the target parallel system. Experiments revealed the
inappropriateness of this classic model to obtain accurate and efficient
schedules for real systems. In order to overcome this shortcoming, a new
scheduling model was proposed that considers the contention for communication
resources. Even though the accuracy and efficiency improved with the
consideration of contention, the new contention model is still not good
enough. The crucial aspect is the involvement of the processor in
communication. This paper investigates the involvement of the processor in
communication and its impact on task scheduling. A new system model is
proposed based on the contention model that is aware of the processor
involvement. The challenges for the scheduling techniques are analyzed and
two scheduling algorithms are proposed. Experiments on real parallel systems
show the significantly improved accuracy and efficiency of the new model and
algorithms.
- F. E. Sandnes and O. Sinnen.
A
new strategy for multiprocessor scheduling of cyclic task graphs.
International Journal of High Performance Computing and
Networking, 3(1):62–71, 2005.
A modular strategy for scheduling iterative computations is
proposed. Iterative computations are represented using cyclic task-graphs
that are transformed into acyclic task-graph. These acyclic task-graphs are
subsequently scheduled using one of the many well-known and high-quality
static scheduling strategies from the literature. Graph unfolding is not
employed and the generated schedules therefore require less memory than
schedules generated through graph unfolding. Further, the number of
iterations does not need to be known at compile-time. The paper
experimentally quantifies how the task transformation affects the make-span
of the schedules and the effectiveness of the approach is compared to other
methods including a graph unfolding strategy. A new more intuitive graph
unfolding formulation is also presented.
- O. Sinnen and L. Sousa.
Communication contention in task scheduling.
IEEE Transactions on Parallel and Distributed Systems,
16(6):503–515, June 2005.
Task scheduling is an essential aspect of parallel programming.
Most heuristics for this NP-hard problem are based on a simple system model
that assumes fully connected processors and concurrent interprocessor
communication. Hence, contention for communication resources is not
considered in task scheduling, yet it has a strong influence on the execution
time of a parallel program. This paper investigates the incorporation of
contention awareness into task scheduling. A new system model for task
scheduling is proposed, allowing us to capture both end-point and network
contention. To achieve this, the communication network is reflected by a
topology graph for the representation of arbitrary static and dynamic
networks. The contention awareness is accomplished by scheduling the
communications, represented by the edges in the task graph, onto the links of
the topology graph. Edge scheduling is theoretically analyzed, including
aspects like heterogeneity, routing, and causality. The proposed
contention-aware scheduling preserves the theoretical basis of task
scheduling. It is shown how classic list scheduling is easily extended to
this more accurate system model. Experimental results show the significantly
improved accuracy and efficiency of the produced schedules.
- O. Sinnen and L. Sousa.
On
task scheduling accuracy: Evaluation methodology and results.
The Journal of Supercomputing, 27(2):177–194, February 2004.
Many heuristics based on the Directed Acyclic Graph (DAG) have been
proposed for the static scheduling problem. Most of these algorithms apply a
simple model of the target system that assumes fully connected processors, a
dedicated communication sub-system and no contention for the communication
resources. Only a few algorithms consider the network topology and the
contention for the communication resources. This article evaluates the
accuracy of task scheduling algorithms and thus the appropriateness of the
applied models. An evaluation methodology is proposed and applied to a
representative set of scheduling algorithms. The obtained results show a
significant inaccuracy of the produced schedules. Analysing these results is
important for the development of more appropriate models and more accurate
scheduling algorithms.
- O. Sinnen and L. Sousa.
List
scheduling: Extension for contention awareness and evaluation of node
priorities for heterogeneous cluster architectures.
Parallel Computing, 30(1):81–101, January 2004.
In the area of static scheduling, list scheduling is one of the
most common heuristics for the temporal and spatial assignment of a Directed
Acyclic Graph (DAG) to a target system. As most scheduling heuristics, list
scheduling assumes fully connected homogeneous processors and ignores
contention on the communication links. This article extends the list
scheduling heuristic for contention aware scheduling on heterogeneous
arbitrary architectures. The extention is based on the idea of scheduling
edges to links, likewise the scheduling of nodes to processors. Based on this
extension, we compare eight priority schemes for the node order determination
in the first phase of list scheduling. Random graphs are generated and
scheduled with the different schemes to homogenous and heterogeneous parallel
systems from the area of cluster computing. Apart from identifying the best
priority scheme, the results give new insights in contention aware DAG
scheduling. Moreover, we demonstrate the appropriateness of our extended list
scheduling for homogeneous and heterogeneous cluster
architectures.
- O. Sinnen and L. Sousa.
Experimental evaluation of task scheduling accuracy: Implications for the
scheduling model.
IEICE Transactions on Information and Systems,
E86-D(9):1620–1627, September 2003.
Most heuristics for the task scheduling problem employ a simple
model of the target system, assuming fully connected processors, a dedicated
communication subsystem and no contention for communication resources. A
small number of algorithms is aware of the contention, using an undirected
graph model of the communication network. Although, many scheduling
algorithms have been compared in literature, little is known about the
accuracy and appropriateness of the employed models. This article evaluates
the accuracy of task scheduling algorithms on generic parallel systems. The
performed experiments show a significant inaccuracy of the schedules
produced. In an extensive analysis, the reasons for these results are
identified and the implications for the scheduling model are
discussed.
Notes:
Communication contention in task scheduling: For the proof of NP-completeness (Theorem 1) in the strong sense see the technical report below.)
- O. Sinnen.
Accurate Task Scheduling for Parallel Systems.
PhD thesis, Instituto Superior Técnico, Technical University of Lisbon,
Portugal, April 2003.
An essential aspect of program parallelisation is the scheduling of
tasks onto the processors of the parallel system. In static task scheduling,
the program is commonly modelled by a Directed Acyclic Graph (DAG) and the
target system as fully connected processors with contention-free
communication performed by a dedicated subsystem. Generally, scheduling is an
NP-hard problem, which has been motivating the development of many heuristics
for its near optimal solution.
This thesis investigates the concepts and
techniques of task scheduling by developing a consistent theoretical
framework. It goes beyond the classic approach to task scheduling by
investigating new parallel system models. The qualitative analysis and recent
experimental evaluations show the inadequacy of the idealised system model
for accurate and efficient task scheduling. To reflect real parallel systems,
the contention for communication resources and the involvement of the
processor in communication must be considered. Both aspects are extensively
investigated in this thesis for heterogeneous parallel systems: a theoretical
foundation is developed for each of the two aspects based on the
characteristics of real parallel systems and scheduling algorithms are
proposed for the new system models. Experimental evaluations on real parallel
systems demonstrate the significantly improved accuracy and execution time of
the produced schedules.
- O. Sinnen.
Experimental evaluation of task scheduling accuracy.
Tese de Mestrado (Master's thesis), Instituto Superior Técnico,
Technical University of Lisbon, Portugal, December 2001.
Many heuristics have been proposed for the scheduling of a program,
represented as a Directed Acyclic Graph (DAG), on a parallel system. Most of
these algorithms assume a simple target system model with fully connected
processors, a dedicated communication subsystem and no contention for
communication resources. Only recently, algorithms have been proposed that
consider the topology of the communication network and are aware of
contention. As far as we known, these scheduling models have not been
analysed regarding their appropriateness and accuracy.
This thesis
performs an experimental evaluation of task scheduling accuracy and
efficiency. The proposed methodology for the evaluation is based on the
techniques used in classical comparisons of scheduling algorithms and on the
automatic code generation for parallel systems. Applying this methodology,
the experiments were conducted on a variety of parallel systems, with
different types of architectures, for a representative set from list
scheduling, duplication scheduling and contention aware scheduling
algorithms. The obtained results show a significant inaccuracy of the
produced schedules for all algorithms. We explain these results and their
implications for the scheduling models as well as we analyse and compare the
efficiency of the selected scheduling algorithms.
- O. Sinnen.
Loop
splitting and loop scheduling methods on NUMA (non uniform memory access)
multiprocessors.
Diplomarbeit (Master's thesis), Chair for Operating Systems, RWTH
Aachen University, Germany, November 1997.
(In German).
Zusammenfassung
- O. Sinnen.
The
application of the Fast Fourier Transform to the watermarking of digital
images.
Studienarbeit (scientific work), Department of Electronic and Electrical
Engineering, Trinity College, Dublin University, December 1995.
Todays computers and printers are becoming more and more available
as a result of decreasing costs. At the same time computer networks, like the
Internet, are becoming widespread and an integral part of modern
communication. This makes it possible to provide electronic distribution and
publishing to a big commercial market. One of the major economic and
technical challenges is that of assuring ownership rights on distributed
images and video sequences. The Watermarking technique embeds an invisible
mark into an image that can only be decoded by comparison with the original.
This mark can identify both the source of the image and its legal recipient.
This report presents the problem and gives a review of current techniques for
grey scale and true color images. It then discusses the applicapability of
the Fast Fourier Transform for the Watermarking Algorithm. Finally open
problems and future work are addressed.
Notes:
An extensive Portuguese summary (31 pages) of the PhD thesis is given in the technical report below.
- R. Chandra and O. Sinnen.
Towards automated optimisation of tool-generated HW/SW SOPC designs.
In Proc. of the 19th ACM/SIGDA international symposium on Field
programmable gate arrays (FPGA '11), pages 285–285. ACM, 2011.
Abstract only, presented as poster.
Currently C-to-hardware (C2H) compilation tools have the potential
to generate high-performing and efficient hardware functions from application
source code. But often this is not realised without expensive manual code
modifications to massage the input source code into a form whereby the
compiler can extract maximum meaning from it, thus improving the quality of
generated hardware. These modifications represent a significant hurdle for
software developers; to improve this we present our work towards a
semi-automated compilation framework that attempts to streamline this design
flow. The proposed framework consists of an extensible analysis phase of
input source code and automated generation of code-mutations that serve as
trial candidates. These are then automatically combined within the greater
SoPC environment and passed through a parallel compilation stage through the
C2H tool. Based on the compilation results the process can be refined and
re-executed after manually-assisted candidate pruning. In experimental
results, a significant performance speedup has been demonstrated when
applying these techniques to simple examples which were compiled
out-of-the-box with the same C2H tool. Moreover, minimal extra development
effort was necessary to achieve these results. Open challenges still remain,
but we believe the promise of such an augmented approach to tool-generated
SoPC designs is clear.
- A. Benoit, L. Marchal, Y. Robert,
and O. Sinnen.
Mapping pipelined applications with replication to increase throughput and
reliability.
In Proc. of 22nd Int. Symposium on Computer Architecture and High
Performance Computing (SBAC-PAD 2010), pages 55–62, Petrópolis,
Brazil, October 2010. IEEE Press.
- R. Chandra and O. Sinnen.
Improving application performance with hardware data structures.
In Proc. of 17th Reconfigurable Architectures Workshop (RAW2010) (in
conjunction with IPDPS2010), Atlanta, USA, April 2010. IEEE Press.
Contemporary processors are becoming wider and more parallel. Thus
developers must work hard to extract performance gains. An alternative
computing paradigm is to use FPGA technology in a reconfigurable computing
environment–where both software and hardware can be specified. This has the
potential to realise substantial performance gains in a variety of
applications, however it is a daunting task as hardware development is
required to harness the benefits. In this research the acceleration of common
data structures–with the priority queue (PQ) as a case study–has been
explored in the context of such a reconfigurable computing environment. A
Java-based hybrid hardware/software PQ has been developed that is a `drop-in'
replacement for a software implementation; achieved by strictly adhering to
the same programming interface. The accelerated PQ has demonstrated up to 3x
speedup when performing a minimum spanning tree graph computation. Taking
this further, a suite of accelerated data structures represents an attractive
way for developers to harness the potential of reconfigurable computing in
the future across a wide gamut of application domains.
- N. Giacaman and O. Sinnen.
Parallel Task for parallelizing object-oriented desktop applications.
In Proc. of 11th IEEE Int. Workshop on Parallel and Distributed
Scientific and Engineering Computing (PDSEC-10) (in conjunction with
IPDPS2010), Atlanta, USA, April 2010. IEEE Press.
As multi-cores arrive for mainstream desktop systems, developers
must invest the effort to parallelize their applications. We present Parallel
Task (short ParaTask), a solution to assist the parallelization of
object-oriented applications, with the unique feature of including support
for the parallelization of graphical user interface (GUI) applications. In
the simple, but common, cases concurrency is introduced with a single
keyword. Due to the wide variety of parallelization needs, ParaTask
integrates different task types into the same model, provides intuitive
support for dependence handling, non-blocking notification, interim progress
notification and exception handling in an asynchronous environment as well
as supporting a pluggable task scheduling runtime (currently work-sharing,
work-stealing and a combination of the two are supported). The performance is
compared to traditional Java parallelization approaches using a variety of
different workloads.
- L. Akeila, O. Sinnen, and
W. Humadi.
Object oriented parallelisation of graph algorithms using Parallel Iterator.
In Proc. of 8th Australasian Symposium on Parallel and Distributed
Computing (AusPDC 2010), Brisbane, Australia, January 2010. Australian
Computer Society.
Abstract Multi-core machines are becoming widely used which, as a
consequence, forces parallel computing to move from research labs to being
adopted everywhere. Due to the fact that developing parallel code is a
significantly complex process, the main focus of today's research is to
design tools which facilitate the process of parallelising code. The Parallel
Iterator (PI) is a tool which was developed to automate the process of
parallelising loops in OO applications. Graph algorithms can be represented
using objects and hence they are excellent use cases for the PI. This paper
discusses using the PI to parallelising graph algorithms such as
breadth-first search (BFS), depth-first search (DFS) and minimum spanning
tree (MST). Using the PI to parallelise such graph algorithms required adding
some adaptations to the current concept of the PI to handle certain graph
algorithms. The PI facilitates the process of parallelising graph algorithms
in a way which keeps the parallel code readable and maintainable while
exhibiting speedup. Java was used as the implementation language since it is
one of the most commonly used object oriented languages. The parallelised
graph algorithms were tested on different graphs and trees with different
densities, granularity and structures. The experimental results show that the
parallelised graph algorithms exhibit good speedups.
- N. Giacaman and O. Sinnen.
Supporting partial ordering with the Parallel Iterator.
In Proc. of Int. Workshop on Parallel and Distributed Algorithms and
Applications (PDAA), in conjunction with PDCAT'09, Hiroshima, Japan,
December 2009. IEEE Press.
With the advent of multi-core processors, desktop application
developers must finally face parallel computing and its challenges. A large
portion of the computational load in a program rests within iterative
computations. In object-oriented languages these are commonly handled using
iterators which are inadequate for parallel programming. Consequently, the
powerful Parallel Iterator concept was developed. This paper presents various
developments of the Parallel Iterator, such as parallel traversal of complex
collections with partial ordering (such as a tree). Other features include
reductions, parallel remove semantics and exception handling. Along with the
ease of use, the results reveal great speedup in comparison to traditional
Java parallelism approaches.
- O. Sinnen, R. Long, and Q. H.
Tran.
Aiding parallel programming with on-the-fly dependence visualisation.
In Proc. of Int. Workshop on Parallel and Distributed Algorithms and
Applications (PDAA), in conjunction with PDCAT'09, Hiroshima, Japan,
December 2009. IEEE Press.
Parallel programming is notoriously difficult. This becomes even
more critical as multicore processors bring parallel computing into the
mainstream. In order to ease the difficulty, tools have been designed that
help the programmer with some aspects of parallelisation. Unfortunately, the
programmer is mostly left along when it comes to the difficult task of
dependence analysis among the subtasks to be executed concurrently. This
paper presents a new visual tool that supports the programmer with the
dependence analysis in loops. This is very useful in combination with an
automatically parallelising compiler or when loops are parallelised with
OpenMP. The tool displays on-the-fly the dependences between the statements
of the loop nest on which the developer is currently working. To maximise the
usefulness of the tool, it is unobtrusive, customisable and flexible, and
based on dependence analysis theory. A prototype was implemented for the
Eclipse IDE as a plug-in that seamlessly integrates into the normal
development process. The evaluation of the tool, including an evaluation
against cognitive dimensions, demonstrates the usability and usefulness of
the tool.
- O. Sinnen, A. To, and M. Kaur.
Contention-aware scheduling with task duplication.
In Job Scheduling Strategies for Parallel Processing, 14th Int. Workshop,
JSSPP 2009 (in conjunction with IPDPS 2009), Rome, Italy, volume 5798
of Lecture Notes in Computer Science, pages 157–168.
Springer-Verlag, 2009.
Scheduling a task graph onto several processors is a trade-off
between maximising concurrency and minimising interprocessor communication. A
technique to reduce or avoid interprocessor communication is task
duplication. Certain tasks are duplicated on several processors to produce
the data locally and avoid the communication among processors. Most
algorithms using task duplication are for the classic model, which allows
concurrent communication and ignores contention for communication resources.
The recently proposed, more realistic contention model introduces contention
awareness into task scheduling by assigning the edges of the task graph to
the links of the communication network. It is intuitive that scheduling under
such a model benefits even more from task duplication. This paper proposes a
contention-aware task duplication scheduling algorithm, after investigating
how to use task duplication in the contention model. An extensive
experimental evaluation demonstrates the significant improvements to the
speedup of the produced schedules.
- N. Giacaman, O. Sinnen, and
L. Akeila.
Object-oriented parallelisation: Improved and extended Parallel Iterator.
In Proc. of 14th IEEE Int. Conference on Parallel and Distributed Systems
(ICPADS'08), Melbourne, Australia, December 2008. IEEE Press.
The need to parallelise desktop applications is becoming
increasingly essential with the mainstream adoption of multi-cores. In
object-oriented languages, sequential iterators handle iterative computations
of a sequential program; similarly, the parallel iterator was developed to
handle the iterative computations of a parallel program. This paper presents
the progress of the parallel iterator concept. New features, such as support
for reductions and global break semantics, allow the parallel iterator to
undertake more situations. With a slight contract modification, the parallel
iterator interface now imitates that of the sequential iterator. All these
features combine together to promote minimal, if any, code restructuring. The
reduction frequently outperforms related work and the importance of providing
simple and flexible fine-tuning capability is affirmed.
- A. Z. Semar Shahul and O. Sinnen.
Optimal scheduling of task graphs on parallel systems.
In Proc. of 9th Int. Conf. on Parallel and Distributed Computing,
Applications and Technologies (PDCAT'08), Dunedin, New Zealand,
December 2008. IEEE Press.
Scheduling tasks onto the processors of a parallel system is a
crucial part of program parallelisation. Due to the NP-hard nature of the
task scheduling problem, scheduling algorithms are based on heuristics that
try to produce good rather than optimal schedules. Nevertheless, in certain
situations it is desirable to have optimal schedules, for example for time
critical systems or to evaluate scheduling heuristics. This paper
investigates the task scheduling problem using A* search algorithm. The A*
scheduling algorithm implemented can produce optimal schedules in reasonable
time for small to medium sized task graphs. In comparison to a previous
approach, the here presented A* scheduling algorithm has a significantly
reduced search space due to a much improved cost function f (s) and
additional pruning techniques. Last but not least, the experimental results
show that the proposed A* scheduling algorithm significantly outperforms the
previous approach.
- A. Palmer and O. Sinnen.
Scheduling algorithm based on force directed clustering.
In Proc. of 9th Int. Conf. on Parallel and Distributed Computing,
Applications and Technologies (PDCAT'08), Dunedin, New Zealand,
December 2008. IEEE Press.
This paper describes a new task scheduling algorithm based on
clustering. In this new approach, clustering of the tasks is achieved by
applying a force model to the task graph. From an initial configuration of
the task graph, forces act upon the nodes to manoeuvre them into a low energy
or equilibrium state. Clusters are created from the equilibrium state and
scheduled for an unlimited number of processors. This algorithm is compared
in an extensive experimental evaluation to three other clustering algorithms
namely, linear, single edge and dominant sequence clustering. By keeping the
mapping and scheduling phases of the algorithms identical, we compare only
the difference in clustering between all algorithms. Results show that force
directed clustering is very promising, especially for a limited number of
processors.
- N. Giacaman and O. Sinnen.
Parallel Iterator for parallelising object oriented applications.
In Proc. of 7th WSEAS Int. Conference on Software Engineering, Parallel
and Distributed Systems (SEPADS'08), Cambridge, UK, February 2008.
With the advent of multi-core processors, desktop application
developers must finally face parallel computing and its challenges. A large
portion of the computational load in a program rests within iterative
computations. In object-oriented languages these are commonly handled using
iterators which are inadequate for parallel programming. This paper presents
a powerful parallel iterator concept for object-oriented programmers to use
for the parallel traversal of a collection of elements. The parallel iterator
allows the structure of the program to remain unchanged, it may be used with
any collection type (even those inherently sequential) and it supports
several scheduling schemes which may even be decided dynamically at run-time.
Along with the ease of use, the results reveal negligible overhead and the
expected inherent speedup.
- N. Giacaman and O. Sinnen.
Task parallelism for object oriented programs.
In Proc. of Int. Symposium on Parallel Architectures, Algorithms, and
Networks (I-SPAN'08), Sydney, Australia, May 2008. IEEE Press.
Parallel computing is notoriously challenging due to the difficulty
in developing correct and efficient programs. With the arrival of multi-core
processors for desktop systems, desktop applications must now be
parallelised. However achieving task parallelism for such object-oriented
programs has traditionally been, and still remains, difficult. This paper
presents a powerful task concept for parallel object-oriented programming and
presents the results from a source-to-source compiler and runtime system.
With the addition of a single keyword, the sequential code does not require
restructuring and asynchronous task management is performed on behalf of the
programmer; the parallel code required to realise task parallelism looks very
much like the sequential counterpart. An intuitive solution is provided to
handle task dependencies as well as integrating different task concepts into
one model.
- O. Sinnen, J. Pe, and A. V.
Kozlov.
Support for fine grained dependent tasks in OpenMP.
In A Practical Programming Model for the Multi-Core Era/Proc. of 3rd Int.
Workshop on OpenMP (IWOMP'07), volume 4935 of Lecture Notes in
Computer Science, pages 13–24. Springer-Verlag, 2008.
OpenMP is widely used for shared memory parallel programming and is
especially useful for the parallelisation of loops. When it comes to task
parallelism, however, OpenMP is less powerful and the sections
construct lacks support for dependences and fine grained tasks. This paper
proposes a new work-sharing construct, tasks, which is a
generalisation of sections. It goes beyond sections by
allowing unbalanced and finer grained tasks with arbitrary dependence
structure. A proof-of-concept compiler has been implemented for the new
directives, which includes a state-of-the-art scheduling algorithm for task
graphs. Experiments with a large set of programs were conducted using the new
directives. The results demonstrate that the new approach can efficiently
exploit the task parallelism inherent in the code, without introducing any
additional overhead.
- O. Sinnen, A. V. Kozlov, and
A. Z. Semar Shahul.
Optimal scheduling of task graphs on parallel systems.
In Proc. Int. Conference on Parallel and Distributed Computing and
Networks, Innsbruck, Austria, February 2007. ACTA Press.
Scheduling tasks onto the processors of a parallel system is a
crucial part of program parallelisation. Due to the NP-hardness of the task
scheduling problem, scheduling al-gorithms are based on heuristics that try
to produce good rather than optimal schedules. Nevertheless, in certain
situations it is desirable to have optimal schedules, for example for time
critical systems or to evaluate scheduling heuristics. This paper proposes a
scheduling algorithm based on A* that can produce optimal schedules in
reasonable time for small task graphs. A* is a best-first state space search
algorithm. In comparison to a previous approach, the here presented
scheduling algorithm has a significantly reduced search space due to a much
improved cost function f(s) and additional pruning techniques. Experimental
results reveal the relation between the runtime of the algorithm and the
structure of the task graphs. Further, it is shown that the proposed
algorithm significantly outperforms the previous approach.
- F. E. Sandnes, O. Sinnen, and
Y.-P. Huang.
Gracefully degrading battery-aware static multiprocessor schedules based on
symmetric task fusion.
In Proc. of 6th Int. Conf. on Parallel and Distributed Computing,
Applications and Technologies (PDCAT'05), pages 1073–1077, Dalian,
China, December 2005. IEEE Press.
A novel strategy for employing schedules obtained using standard
static scheduling algorithms in a battery powered multiprocessor environment
is investigated. The strategy is able to dynamically respond to deteriorating
batteries and operate with fewer processors according to the battery levels
of the system. The nature of the proposed approach allows poorly performing
batteries to recover through self charge. The strategy therefore maximizes
the battery life and the operation time of the device.
- F. E. Sandnes and O. Sinnen.
Stochastic DFS for multiprocessor scheduling of cyclic taskgraphs.
In Proc. of 5th Int. Conf. on Parallel and Distributed Computing,
Applications and Technologies (PDCAT'04), volume 3320 of
Lecture Notes in Computer Science, pages 342–350.
Springer-Verlag, 2004.
DFS has previously been shown to be a simple and efficient strategy
for removing cycles in graphs allowing the resulting DAGs to be scheduled
using one of the many well-established DAG multiprocessor scheduling
algorithms. In this paper, we investigate the inefficiencies of schedules
acquired using DFS cycle removal. Further, an improved randomised DFS cycle
removal algorithm is proposed that produces significantly improved results
with acceptable computational overheads.
- O. Sinnen and L. Sousa.
Task
scheduling: Considering the processor involvement in communication.
In Proc. 3rd Int. Workshop on Algorithms, Models and Tools for Parallel
Computing on Heterogeneous Networks (ISPDC'04/HetroPar'04), pages
328–335, Cork, Ireland, July 2004. IEEE Press.
Classical task scheduling employs a very simplified model of the
target parallel system. Experiments demonstrated that this leads to
inaccurate and inefficient schedules. Contention aware scheduling heuristics
take the contention for communication resources into account, which improves
the schedules significantly. Yet, one aspect remains to be investigated: the
involvement of the processors in communication. This paper proposes a new
scheduling model, called involvement-contention model, that integrates the
consideration for the processor involvement into task scheduling. A list
scheduling based heuristic is proposed for the new model, which produces
significantly more accurate and efficient schedules in experiments on real
parallel systems.
- F. E. Sandnes and O. Sinnen.
A
graph transformational approach to the multiprocessor scheduling of iterative
computations.
In Proc. of 4th Int. Conf. on Parallel and Distributed Computing,
Applications and Technologies (PDCAT'03), pages 571–576, Chengdu,
Sechuan, China, August 2003. IEEE Press.
A modular strategy for scheduling iterative computations is
proposed. An iterative computation is represented using a cyclic task-graph.
The cyclic task-graph is transformed into an acyclic task-graph. This acyclic
task-graph is subsequently scheduled using one of the many well-known and
high-quality static scheduling strategies from the literature. Graph
unfolding is not employed and the generated schedules therefore require less
memory than schedules generated through graph unfolding. Further, the number
of iterations does not need to be known at compile-time. The effectiveness of
the approach is compared to other methods including a graph unfolding
strategy. In addition, the paper experimentally quantifies how the task
transformation affects the makespan of the schedules.
- O. Sinnen and L. Sousa.
Experimental evaluation of task scheduling accuracy.
In Proc. of 3rd Int. Conf. on Parallel and Distributed Computing,
Applications and Technologies (PDCAT'02), pages 378–383, Kanazawa,
Japan, September 2002.
Most heuristics for the task scheduling problem employ a simple
model of the target system, assuming fully connected processors, a dedicated
communication subsystem and no contention for communication resources. A
small number of algorithms is aware of the contention, using an undirected
graph model of the communication network. Although, many scheduling
algorithms have been compared in literature, little is known about the
accuracy and appropriateness of the employed models. This article evaluates
the accuracy of task scheduling algorithms on generic parallel systems. The
performed experiments show a significant inaccuracy of the schedules
produced. Analysing these results is important for the development of more
appropriate models and more accurate scheduling algorithms.
- M. Ribeiro, O. Sinnen, and
L. Sousa.
MPEG-4 natural video parallel implementation on a cluster.
In Proc. of 12th Portuguese Conference on Pattern Recognition (RecPad
2002), Aveiro, Portugal, June 2002.
This paper describes a parallel encoder for the natural video
section of the MPEG-4 specification, based on the MomuSys Verification Model.
Several different scheduling approaches are considered in order to balance
the computational load. The system is built upon a cluster of computers
running Linux connected in a local network. The communication relies on the
MPI specification. The results show that it is possible to speed up
considerably the encoding process using the proposed system.
- O. Sinnen and L. Sousa.
Comparison of contention aware list scheduling heuristics for cluster
computing.
In Proc. of Workshop on Scheduling and Resource Management for Cluster
Computing (ICPP 2001), pages 382–387, Valencia, Spain, September
2001. IEEE Press.
In the area of static scheduling, list scheduling is one of the
most common heuristics for the temporal and spatial assignment of a Directed
Acyclic Graph (DAG) to a target machine. As most heuristics, list scheduling
assumes fully connected homogeneous processors and ignores contention on the
inter communication links.
This article extends the list scheduling
heuristic for contention aware scheduling on heterogenous arbitrary machines.
The extention is based on the idea of scheduling edges to links, likewise the
scheduling of nodes to processors. Based on this extension, we compare eight
priority schemes for the node order determination of the first phase of list
scheduling. Random graphs are generated and scheduled with the different
schemes to homogenous and heterogenous parallel systems from the area of
cluster computing. The experiments demonstrate the appropriateness of our
extended list scheduling for homogeneous and heterogenous cluster
architectures.
- O. Sinnen and L. Sousa.
Exploiting unused time slots in list-scheduling considering communication
contention.
In Euro-Par 2001 Parallel Processing, volume 2150 of
Lecture Notes in Computer Science, pages 166–170.
Springer-Verlag, 2001.
Static scheduling is the temporal and spatial mapping of a program
to the resources of a parallel system. Scheduling algorithms use the Directed
Acyclic Graph (DAG) to represent the sub-tasks and the precedence-constraints
of the program to be parallelised. This article proposes an extention to the
classical list scheduling heuristic that allows to schedule DAGs to arbitrary
processor architectures considering link contention. In this extension,
communication links are treated as shared resources likewise the processors
and we improve the extended algorithm by exploiting unused time slots on the
resources. The algorithm is experimentally compared to the existing
heuristics BSA and DLS.
- O. Sinnen and L. Sousa.
Scheduling task graphs on arbitrary processor architectures considering
contention.
In High Performance Computing and Networking, volume 2110 of
Lecture Notes in Computer Science, pages 373–382.
Springer-Verlag, 2001.
For the efficient utilisation of a parallel system a task must be
divided into sub-task and these must be assigned to the system's resources.
The temporal and spatial assignment of sub-task to the computing and
communication resources of a parallel system is known as the scheduling
problem.
This article proposes a new scheduling heuristic, called Simple
Contention Scheduling (SCS), for arbitrary processor architectures with the
consideration of link contention. The properties of the algorithm are
discussed and it is compared to other existing algorithms (DLS and BSA) for
arbitrary processor architectures with the consideration of contention. The
comparison is done for a large set of random graphs and different
architectures. Experimental results show that, while having the same or lower
complexity, SCS produces in most cases better results than the other
algorithms.
- L. Sousa and O. Sinnen.
Synchronous non-local image processing on orthogonal multiprocessor systems.
In Vector and Parallel Processing – VECPAR 2000, selected papers,
volume 1981 of Lecture Notes in Computer Science, pages
453–466. Springer-Verlag, 2001.
A method for mapping nonlocal image processing algorithms onto
Orthogonal Multiprocessor Systems (OMP) is presented in this paper.
Information is moved between memory modules by alternating the processors
mode of accessing the memory array. The introduction of synchronisation
barriers when the processors attempt to change the memory access mode of the
OMP architecture synchronises the processing. Two parallel and nonlocal
algorithms of the low and intermediate image processing levels are proposed
and their efficiency is analysed. The performance of the OMP system for these
type of algorithms was evaluated by simulation.
- O. Sinnen and L. Sousa.
A
platform independent parallelising tool based on graph theoretic models.
In Vector and Parallel Processing – VECPAR 2000, selected papers,
volume 1981 of Lecture Notes in Computer Science, pages
154–167. Springer-Verlag, 2001.
Parallel programming demands, in contrast to sequential
programming, sub-task identification, dependence analysis and
task-to-processor assignment. This paper presents a new parallelising tool
that supports the programmer in these early challenging phases of parallel
programming. In contrast to existing parallelising tools, the proposed
parallelising tool is based on a new graph theoretic model, called annotated
hierarchical graph, that integrates the commonly used graph theoretic models
for parallel computing. Part of the parallelising tool is an object oriented
framework for the development of scheduling and mapping algorithms, which
allows to rapidly adapt and implement new algorithms. The tool achieves
platform independence by relying on internal structures that are not bound to
any architecture and by implementing the tool in Java.
- R. Moura, N. Vargas, R. Machado,
O. Sinnen, and N. Horta.
SYMAN V1.0: symbolic analysis tool for both classroom and analog design
automation environments.
In Proc. of Int. Workshop on Symbolic Methods and Applications in Circuit
Design (SMACD 2000), pages 29–32, Lisboa, Portugal, October 2000.
In this paper, a fully modular and multi-platform tool for symbolic
analysis, SYMAN, being developed in a classroom environment is described. The
modular structure and development allows an easy and robust way to both
verify some well known symbolic analysis methods and extend the tool to new
methods or even explore new optimization procedures. Finally, the integration
within other design automation tools, e.g. circuit and system optimization
modules, is also discussed.
- O. Sinnen, J. Leal, and
J. Pereira.
SharedFantasy: a shared distributed multi-user technology using Java's
RMI.
In Proc. of 9th Portuguese Meeting on Computer Graphics, pages
61–67, Marinha Grande, Portugal, February 2000.
A Multi-User Technology (MUTech) permits multiple users to navigate
and interact in a shared virtual environment. In this paper a WWW based
MUTech, called SharedFantasy, is presented based on the open technologies
VRML, HTML and Java. For the exchange of shared and distributed information,
the Java integrated technology RMI (Remote Method Invocation) is proposed.
After the detailed description of the SharedFantasy functionality, the
principles and technology of this new MUTech are discussed in detail. Special
attention is given to the Living Worlds oriented VRML structure and to the
RMI based communication technology.
- O. Sinnen and L. Sousa.
A
comparative analysis of graph models to develop parallelising tools.
In Proc. of 8th IASTED Int. Conference on Applied Informatics (AI
2000), pages 832–838, Innsbruck, Austria, February 2000.
Parallel programming demands, in contrast to sequential
programming, sub-task decomposition, dependence analysis, and mapping and
scheduling. Tools that support the programmer with these challenging duties,
i.e. parallelising tools, have in common the need for an efficient internal
representation of the tasks to be parallelised. The widely used graph
theoretic models clearly exhibit the available concurrency of algorithms and
benefit from the rich mathematical foundation. However, the proposed graph
models vary in their appropriateness for a parallelising tool, with respect
to parallel computations that can be modelled, supported parallel
architectures and available algorithms. This paper presents a comparative
analysis of graph theoretic models based on a proposed classification scheme.
As it will be shown, to make a parallelising tool universal, it has to base
its task representation on more that one model. A modular and system
independent parallelising tool adopting a multi-model approach is also
proposed in this paper.
- J. J. K. Ó Ruanaidh, F. M.
Boland, and O. Sinnen.
Watermarking digital images for copyright protection.
In Proc. of Electronic Imaging and the Visual Arts Conference,
pages 243–246, Florence, Italy, February 1996.
A watermark is an invisible mark placed on an image that later can
be detected and used as evidence of copyright infringement. This mark is
designed to identify both the source of a document as well as its intended
recipient. This paper describes novel techniques for embedding such marks in
grey scale and colour digital images. The paper extends existing transform
domain techniques by using local phase to embed a watermark in an
image.
- R. Chandra and O. Sinnen.
Improving application performance with hardware data structures.
Technical Report 678, University of Auckland, New Zealand, March 2010.
Processors are not getting faster but wider and more parallel, thus
developers must work even harder to extract performance gains. An alternative
computing paradigm is to use FPGA technology to create a reconfigurable
computing environment–where both software and hardware are variable and can
be re-configured. This approach has the potential to realise substantial
performance gains in a wide gamut of applications, however it is also a
daunting task as applications require hardware development to harness the
benefits. In this research the acceleration of common data structures, using
the priority queue as a case study, has been explored in the context of a
reconfigurable computing environment. A Java-based hybrid hardware/software
priority queue (PQ) has been implemented that is a drop-in replacement for a
software PQ as it adheres strictly to the same programming interface as
conventional software implementations. The accelerated PQ has demonstrated up
to 3x speedup when running a minimum spanning tree graph computation. A suite
of accelerated data structures represents an attractive way for developers to
harness the potential of reconfigurable computing in the future across a
wide gamut of application domains.
- N. Giacaman and O. Sinnen.
Parallel Task for parallelising object-oriented desktop applications.
Technical Report 675, University of Auckland, New Zealand, November 2009.
With the arrival of multi-cores for mainstream desktop systems,
developers must invest the effort of parallelising their applications in
order to benefit from these systems. However, the structure of these
interactive desktop applications is noticeably different from the traditional
batch-like applications of the engineering and scientific fields. We present
Parallel Task (short ParaTask), a solution to assist the parallelisation of
object-oriented applications, with the unique feature of including support
for the parallelisation of graphical user interface (GUI) applications. In
the simple, but common, cases concurrency is introduced with a single
keyword. ParaTask sets itself apart from the many existing object-oriented
parallelisation approaches by integrating different task types into the same
model and its careful adherence to object-oriented principles. Due to the
wide variety of parallelisation needs, ParaTask provides intuitive support
for dependence handling, non-blocking notification and exception handling in
an asynchronous environment as well as supporting a flexible task scheduling
runtime (currently work-sharing, work-stealing and a combination of the two
are supported). The performance is compared to traditional Java
parallelisation approaches using a variety of different
workloads.
- A. Benoit, L. Marchal, Y. Robert,
and O. Sinnen.
Mapping pipelined applications with replication to increase throughput and
reliability.
Research Report RR-LIP-2009-28, LIP, ENS Lyon, France, September 2009.
Available at http://graal.ens-lyon.fr/ abenoit.
Mapping and scheduling an application onto the processors of a
parallel system is a difficult problem. This is true when performance is the
only objective, but becomes worse when a second optimization criterion like
reliability is involved. In this paper we investigate the problem of mapping
an application consisting of several consecutive stages, i.e., a pipeline,
onto heterogeneous processors, while considering both the performance,
measured as throughput, and the reliability. The mechanism of replication,
which refers to the mapping of an application stage onto more than one
processor, can be used to increase throughput but also to increase
reliability. Finding the right replication trade-off plays a pivotal role for
this bi-criteria optimization problem. Our formal model includes
heterogeneous processors, both in terms of execution speed as well as in
terms of reliability. We study the complexity of the various subproblems and
show how a solution can be obtained for the polynomial cases. For the general
NP-hard problem, heuristics are presented and experimentally evaluated. We
further propose the design of an exact algorithm based on A* state space
search which allows us to evaluate the performance of our heuristics for
small problem instances.
- N. Giacaman and O. Sinnen.
Parallel iterator for parallelising object-oriented applications.
Technical Report 669, University of Auckland, New Zealand, November 2008.
With the advent of multi-core processors, desktop application
developers must finally face parallel computing and its challenges. A large
portion of the computational load in a program rests within iterative
computations. In object-oriented languages these are commonly handled using
iterators which are inadequate for parallel programming. This paper presents
a powerful Parallel Iterator concept to be used in object-oriented programs
for the parallel traversal of a collection of elements. The Parallel Iterator
may be used with any collection type (even those inherently sequential) and
it supports several scheduling schemes which may even be decided dynamically
at run-time. Some additional features are provided to allow early termination
of parallel loops, exception handling and a solution for performing
reductions. With a slight contract modification, the Parallel Iterator
interface imitates that of the Java-style sequential iterator. All these
features combine together to promote minimal, if any, code restructuring.
Along with the ease of use, the results reveal negligible overhead and the
expected inherent speedup.
- N. Giacaman and O. Sinnen.
Inhibitors for desktop parallelisation.
Technical Report 655, University of Auckland, New Zealand, July 2006.
Parallel computing is notoriously challenging, making it difficult
to develop efficient and correct programs. With the arrival of multicore
processors, desktop environments must be parallelised if they are to benefit
from these new processors. However, the parallelisation of desktop
environments entails even more challenges than that in a typical parallel
program. This report outlines such challenges, suggesting possible areas and
solutions to investigate.
- P.-F. Dutot, O. Sinnen, and
L. Sousa.
A
note on the complexity of task scheduling with communication contention.
Technical report, University of Auckland, New Zealand, February 2005.
Considering the contention for communication resources in task
scheduling is discussed in [3]. Theorem 1 of [3] states the NP-completeness
of the associated decision problem and the corresponding proof shows
NP-completeness in the weak sense [2]. Here, a new proof is presented for
Theorem 1 of [3] that shows NP-completeness in the strong sense. The proof is
based on a reduction from the well-known 3-PARTITION [2] problem, which is
NP-complete in the strong sense. This document is meant to be read in
conjunction with [3].
[2] M. R. Garey and D. S. Johnson. Computers
and Intractability: A Guide to the Theory of NP-Completeness. Freeman, 1979.
[3] O. Sinnen and L. Sousa. Communication contention in task scheduling.
IEEE Transactions on Parallel and Distributed Systems, 16(6):503-515, June
2005.
- O. Sinnen.
Agendamento Preciso de Tarefas para Sistemas Paralelos (Accurate Task
Scheduling for Parallel Systems).
Technical report, INESC-ID, Instituto Superior Técnico, Technical
University of Lisbon, Portugal, December 2003.
(In Portuguese).
Um aspecto essencial da paralelização de programas é o
agendamento de tarefas. No agendamento estático de tarefas, o programa
é geralmente modelizado por um grafo acíclico dirigido e o sistema
alvo como um conjunto de processadores completamente ligados que comunicam
sem contenção através de um subsistema dedicado. Na forma geral,
o agendamento é um problema NP-difícil que tem motivado o
desenvolvimento de heurísticas para soluções quase óptimas.
Uma análise qualitativa e avaliações experimentais recentes
mostram a desadequação do modelo de sistema adoptado por estas
heurísticas, no que diz respeito à precisão e à
eficiência dos agendamentos. A contenção dos recursos de
comunicação e o envolvimento do processador na comunicação
têm de ser considerados para a modelização dos sistemas paralelos
reais. Ambos estes aspectos são investigados intensivamente neste
trabalho para sistemas paralelos heterogéneos, desenvolvendo-se a base
teórica que permite considerá-los e propondo-se algoritmos de
agendamento para os novos modelos. A avaliação experimental dos novos
modelos e algoritmos em sistemas paralelos reais mostra uma significativa
melhoria da precisão e do tempo de execução dos
agendamentos.
- O. Sinnen and L. Sousa.
A
classification of graph theoretic models for parallel computing.
Technical Report RT/005/99, INESC-ID, Instituto Superior Técnico, Technical
University of Lisbon, Portugal, May 1999.
Graph theoretic models are widely used as representations of
parallel computations. Graphs can be an initial or intermediate (e.g. in case
of parallel compilers) representation of a computation. While being visual
and intuitive this representation also benefits from the mathematical graph
theory, that offers methods for graph analysis and manipulation.
In the
last decades many graph theoretic models have been proposed to model parallel
computations. It now arises the question which model is appropriate for a
certain purpose. Several issues are involved in that question: the
applicability and efficiency of the model to a given purpose; the analyses
and methods available for that model; the flexibility and the practicability
of a model and associated methods.
This paper proposes a classification
scheme for graph theoretic models of parallel computing that addresses the
referred issues. It provides an overview of common graph theoretic models and
classifies them according to the proposed scheme. In addition, strengths and
limitations of the models as well as the connection between the models are
discussed, providing some new insights in the overall problem.
Notes:
Agendamento Preciso de Tarefas para Sistemas Paralelos é um resumo Português extensivo (31 páginas) da minha tese de doutoramento.
- O. Sinnen and N. Giacaman.
Object-oriented parallelisation of java desktop programs.
Departmental seminar, Department of Electrical and Computer Engineering,
University of Auckland, New Zealand, September 2010.
Developing parallel applications is notoriously difficult, but is
even more complex for desktop applications. The added difficulties are
primarily because of their interactive nature, where performance is largely
perceived by its users. Desktop applications are typically developed with
graphical toolkits that in turn have limitations in regards to
multi-threading. This presentation explores the structure of desktop
applications, the limitations of the threading model and walks the audience
through the parallelisation of a desktop application using object-oriented
and GUI-aware concepts called Parallel Task and Parallel Iterator. Dr Sinnen
starts the seminar with a brief overview of the research activities in the
Parallel and Reconfigurable Computing research group. The activities range
from fundamental problems to practical approaches to parallel programming,
which is the focus of the rest of the seminar given by Mr
Giacaman.
- O. Sinnen.
Object oriented parallelisation of Java desktop programs.
Invited research seminar, Chair of Operating Systems, RWTH-Aachen University,
Aachen, Germany, July 2010.
(In German).
Parallel programming is notoriously difficult and has motivated
research in better languages, libraries and tools. Even more complex is the
parallelisation of desktop applications. This talk presents novel concepts
and tools for the parallelisation of object-oriented programs with a focus on
desktop applications with a graphical user interface (GUI). It starts by
looking at the rarely discussed challenges that are unique to GUI desktop
applications, namely the user interactivity, the limitations of the graphical
frameworks and the large variety of target systems. As most desktop
applications are written in object-oriented languages, the parallelisation
must be performed in these languages. In contrast to previous work, the focus
is to have real object-oriented solutions, including aspects like inheritance
and concurrent exception handling. The presented parallelisation approach,
called Parallel Task and Parallel Iterator (www.parallellit.org), is based on
a unified task concept that integrates all common concurrency types.
This is a hands-on talk, walking you through the famous "Hello World"
parallel program using these concepts. If you bring your laptop with an
installedEclipse IDE and Internet access, you can follow the examples. You
only need to install the Parallel Task Eclipse Plugin available at
www.parallellit.org.
- O. Sinnen.
Making contention scheduling aware of identical data.
Research seminar, 3rd Scheduling in Aussois Workshop, Aussois, French Alps,
France, June 2010.
Invitees only.
It has been widely recognised that task scheduling on parallel
systems cannot be accurate and efficient without a realistic view of the
communication costs. Considering the limited amount of communication
resources and their bandwidth is crucial. This has been addressed by the
contention scheduling model where the communications are scheduled on the
network resources, for example the ports in the one-port model. Such
contention-aware scheduling is an extension of the classical task scheduling
model based on communication delays. Problematic is, however, that identical
data items are not distinguishable in the task graph. Under the classic model
that was not an issue, but it is with contention scheduling. This talk will
investigate the problem and presents some preliminary results.
- O. Sinnen.
Considering the real cost of communication in task scheduling.
Research seminar, Workshop on Algorithms and Techniques for Scheduling on
Clusters and Grids (ASTEC 2009), Les Plantiers, France, June 2009.
Invitees only.
- O. Sinnen.
Multicores, Playstation 3, reconfigurable hardware – parallel computing
research in Auckland.
Invited research seminar, INESC-ID, Instituto Superior Técnico, Technical
University of Lisbon, Portugal, December 2008.
(In Portuguese).
- O. Sinnen.
Paralleles Rechnen: Multicores, Playstation 3, Rekonfigurierbare
Hardware.
Invited research seminar, Chair of Operating Systems, RWTH-Aachen University,
Aachen, Germany, November 2008.
(In German).
- O. Sinnen.
Optimal task scheduling.
Invited research seminar, ID-Institut IMAG, Grenoble, France, October 2008.
- O. Sinnen.
Optimal task scheduling.
Invited research seminar, École Normale Supérieure (ENS) Lyon, France,
September 2008.
Scheduling a task graph (a DAG with computation and communication
costs) onto the processors of a parallel system is an NP-hard problem. As a
consequence, scheduling algorithms are based on heuristics that try to
produce good rather than optimal schedules. Nevertheless, in certain
situations it is desirable to have optimal schedules, for example for time
critical systems, for loop kernels or to evaluate scheduling heuristics. In
this talk I will present scheduling algorithm based on A* that can produce
optimal schedules in reasonable time for small task graphs. A* is a
best-first state space search algorithm used in the area of artificial
intelligence. In comparison to a previous approach, the presented scheduling
algorithm has a significantly reduced search space due to a much improved
cost function f (s) and additional pruning techniques. Experimental results
will be presented that show the significantly improved performance. The
experiments also reveal the sometimes surprising relation between the runtime
of the algorithm and the structure of the task graphs.
- O. Sinnen.
Parallel computing research: Multicores, Playstation 3, reconfigurable
hardware.
Invited research seminar, École Normale Supérieure (ENS) Lyon, France,
September 2008.
- O. Sinnen.
Parallel computing research: Multicores, Playstation 3, reconfigurable
hardware.
Invited research seminar, Computer Science, Faculty of Engineering, Oslo
University College, Oslo, Norway, August 2008.
- O. Sinnen.
Parallel systems.
Invited lecture series, Computer Science, Faculty of Engineering, Oslo
University College, Oslo, Norway, August 2008.
- O. Sinnen.
Parallelisation of object-oriented programs.
Invited research seminar, Nokia/Trolltech, Olso, Norway, August 2008.
- O. Sinnen.
Parallel computing: Multicores, Playstation 3, reconfigurable hardware.
Departmental seminar, Department of Electrical and Computer Engineering,
University of Auckland, New Zealand, June 2008.
Parallel computing has been around for decades. Until recently,
however, it was mostly confined to parallel computing centres and ignored by
the mainstream. With the introduction of multi-core processors most new
computers (and even gaming consoles, e.g. Playstation 3) are now parallel
systems, making the discipline of parallel computing more important than
ever. In contrast to the past, when new processor generations brought
automatic performance improvements, programs must now execute in parallel if
they are to benefit from new processor generations. Unfortunately,
parallelising a program, i.e. writing a program in such a way that it uses
the multiple processors concurrently, is far more challenging than writing a
sequential program that only utilises one processor. There are unsolved
fundamental and practical problems in the parallelisation of programs.
This seminar will present the exciting research in the Parallel and
Reconfigurable Computing Lab of the ECE department that addresses these
problems. The research ranges from fundamental problems to the exploitation
of new forms of parallelism. This includes algorithms and models for the
fundamental problem of task scheduling, Eclipse based visual tools for the
parallelisation process, high performance computing with our Playstation 3
cluster, parallel computing in Object Oriented languages and the utilisation
of reconfigurable hardware with high level languages like
Java.
- O. Sinnen.
Task
scheduling for parallel systems.
Research seminar, IEEE New Zealand Workshop in High Performance and Grid
Computing, Auckland University of Technology, New Zealand, November 2006.
- O. Sinnen.
Task scheduling for parallel systems.
Research seminar, Embedded Systems Group, Department of Electrical and Computer
Engineering, University of Auckland, New Zealand, May 2005.
- O. Sinnen.
Genaues Task Scheduling für Parallelsysteme (Accurate Task
Scheduling for Parallel Systems).
Invited seminar, Chair for Technical Informatics, Würzburg University,
Würzburg, Germany, May 2004.
(In German).
- O. Sinnen.
Accurate task scheduling for parallel systems.
Invited seminar, Department of Electrical and Computer Engineering, University
of Auckland, Auckland, New Zealand, April 2004.
- O. Sinnen.
Advanced computer architectures: Agendamento para sistemas paralelos.
Invited lecture, Instituto Superior Técnico, Technical University of
Lisbon, Portugal, May 2004.
(In Portuguese).
- O. Sinnen.
Static task scheduling.
Invited seminar, Institute for Computing Systems Architecture, School of
Informatics, University of Edinburgh, Edinburgh, Scotland, July 2002.
- O. Sinnen.
Sistemas de processamento paralelo: Programação e scheduling.
Research seminar, INESC-ID, Instituto Superior Técnico, Technical
University of Lisbon, Portugal, May 2001.
(In Portuguese).
Notes:
News story about the talk at the Oslo University College (automatically tranlated to English)