SC2000 TUTORIAL PROGRAM

TUTORIALS

This year's tutorials include exciting offerings on new topics, such as mesh generation, XML, parallel programming for clusters, numerical computing in Java, and management of large scientific datasets, along with the return of some of the most requested presenters from prior years with new and updated materials. In addition, we offer some full-day tutorials as pairs of half-day tutorials (denoted by A and B in the numbering), thereby increasing the number of half-day offerings. In total there are nine full-day and 15 half-day tutorials covering 20 topics. Attendees also gain an international perspective through the tutorials on large data visualization, cluster computing, performance analysis tools, and numerical linear algebra. Separate registration is required for tutorials; tutorial notes and luncheons will be provided on site (additional tutorial notes will be sold on site). A One- or Two-day Tutorial Passport gives attendees the flexibility to attend multiple tutorials.

See the Tutorials at a Glance table below for a list of this year's tutorials. Detailed information about all tutorials can be found by scrolling further down this page.

Questions: tutorials@sc2000.org


  TUTORIALS CO-CHAIRS
VALERIE TAYLOR, NORTHWESTERN UNIVERSITY
MICHELLE HRIBAR, PACIFIC UNIVERSITY


SC2000 TUTORIALS AT A GLANCE

The following tutorials have been accepted for presentation at SC2000. Author and abstract information can be obtained by clicking on the tutorial number (S1, M1, etc.) or by scrolling further down this page.

SUNDAY FULL DAY
S1 Using MPI-2: A Tutorial on Advanced Features of the Message-Passing Interface
S2 An Introduction to High Performance Data Mining
S3 Design and Analysis of High Performance Clusters
S4 High Performance Numerical Computing in Java: Compiler, Language, and Application Solutions
S5 Performance Analysis and Tuning of Parallel Programs: Resources and Tools

SUNDAY HALF DAY - AM
S6A Mesh Generation for High Performance Computing, Part I: An Overview of Unstructured and Structured Grid Generation Techniques
S7A Introduction to Effective Parallel Computing, Part I
S8 Tools and System Support for Managing and Manipulating Large Scientific Datasets

SUNDAY HALF DAY - PM
S6B Mesh Generation for High Performance Computing, Part II: Mesh Generation for Massively Parallel-Based Analysis
S7B Introduction to Effective Parallel Computing, Part II
S9 Concurrent Programming with Pthreads
S10 Commodity-based Scalable Visualization

MONDAY FULL DAY
M1 Performance Analysis and Prediction for Large-Scale Scientific Applications
M2 Parallel I/O for Application Developers
M3 Framework Technologies & Methods for Large Data Visualization
M4 Computational Biology and High Performance Computing

MONDAY HALF DAY - AM
M5A Application Building with XML: Standards, Tools, and Demos-Part I
M6A Parallel Programming with OpenMP: Part I, Introduction
M7 High-Speed Numerical Linear Algebra: Algorithms and Research Directions
M8 Parallel Programming for Cluster Computers

MONDAY HALF DAY - PM
M5B Application Building with XML: Standards, Tools, and Demos-Part II
M6B Parallel Programming with OpenMP: Part II, Advanced Programming
M9 Current and Emerging Trends in Cluster Computing
M10 Performance Tuning and Analysis for Grid Applications


SUNDAY FULL DAY

S1 Using MPI-2: A Tutorial on Advanced Features of the Message-Passing Interface

Location: D262, D264
William Gropp, Ewing "Rusty" Lusk, Rajeev S. Thakur, Argonne National Laboratory

20% Introductory | 40% Intermediate | 40% Advanced

 

This tutorial will describe how to use MPI-2, the collection of advanced features added to MPI (the Message-Passing Interface) by the second MPI Forum. These features include parallel I/O, one-sided communication, dynamic process management, language interoperability, and some miscellaneous features. Implementations of MPI-2 are beginning to appear: a few vendors have completed implementations, while other vendors and research groups have implemented subsets of MPI-2, with plans for complete implementations. This tutorial will explain how to use MPI-2 in practice, particularly how to use it in a way that results in high performance. We will present each feature of MPI-2 through a series of examples (in C, Fortran, and C++), starting with simple programs and moving on to more complex ones. We assume that attendees are familiar with the basic message-passing concepts of MPI-1.
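
For a taste of the one-sided model, the minimal sketch below (our illustration, not tutorial material; error checking omitted) sends each process's rank to its right neighbor with MPI_Put instead of matched send/receive calls.

  /* Minimal MPI-2 one-sided communication: each process exposes an
     integer in a window, and every process puts its rank into its
     right neighbor's window. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size, recvbuf = -1;
      MPI_Win win;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      /* Expose recvbuf so other processes can write into it. */
      MPI_Win_create(&recvbuf, sizeof(int), sizeof(int),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &win);

      MPI_Win_fence(0, win);              /* open an access epoch */
      MPI_Put(&rank, 1, MPI_INT,          /* origin data: my rank */
              (rank + 1) % size,          /* target: right neighbor */
              0, 1, MPI_INT, win);
      MPI_Win_fence(0, win);              /* close epoch; data now visible */

      printf("rank %d received %d\n", rank, recvbuf);
      MPI_Win_free(&win);
      MPI_Finalize();
      return 0;
  }

The fence calls delimit the access epoch; between them, no process may assume the transferred data has arrived.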

S2 An Introduction to High Performance Data Mining

Location: D267
Robert L. Grossman, Magnify, Inc. and University of Illinois at Chicago, Vipin Kumar, University of Minnesota

50% Introductory | 30% Intermediate | 20% Advanced

 

Data mining is the semi-automatic discovery of patterns, associations, changes, anomalies, and statistically significant structures and events in data. Traditional data analysis is assumption-driven, in the sense that a hypothesis is formed and validated against the data. Data mining, in contrast, is discovery-driven, in the sense that patterns are automatically extracted from data. The goal of the tutorial is to provide researchers, practitioners, and advanced students with an introduction to data mining. The focus will be on basic techniques and algorithms appropriate for mining massive data sets using approaches from high performance computing. There are now parallel versions of some of the standard data mining algorithms, including tree-based classifiers, clustering algorithms, and association rules. We will cover these algorithms in detail, as well as some general techniques for scaling data mining algorithms. In addition, we will give an introduction to some of the data mining algorithms used in the recommender systems that are becoming important in e-business. The tutorial will include several case studies involving the mining of large data sets, from 10 to 1,000 gigabytes in size. The case studies will be drawn from science, engineering, and e-business.
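
As a concrete flavor of these algorithms, the sketch below (our illustration) shows the support-counting kernel at the heart of association-rule methods such as Apriori.

  /* Support counting, the kernel of association-rule mining: how many
     transactions contain every item of a candidate itemset?
     Transactions and itemsets are sorted arrays of item ids. */
  int support(const int *const *trans, const int *tlen, int ntrans,
              const int *itemset, int k)
  {
      int count = 0;
      for (int t = 0; t < ntrans; t++) {
          int j = 0;                      /* next itemset member to find */
          for (int i = 0; i < tlen[t] && j < k; i++)
              if (trans[t][i] == itemset[j])
                  j++;
          if (j == k)
              count++;                    /* whole itemset is present */
      }
      return count;
  }

A data-parallel version partitions the transactions across processors, runs the same kernel locally, and sums the per-processor counts.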

  S3 Design and Analysis of High Performance Clusters

Location: D263, D265
Robert Pennington, NCSA, Patricia Kovatch, Barney Maccabe, David Bader, UNM

25% Introductory | 50% Intermediate | 25% Advanced

 

The National Computational Science Alliance (the Alliance) has created several production NT and Linux superclusters for scientists and researchers to run a variety of parallel applications. The goal of this tutorial is to bring together researchers in this area and to share the latest information on the state of high-end commodity clusters. We will discuss details of the design, implementation and management of these systems and demonstrate some of the current system monitoring and management tools. A wide variety of applications and community codes run on these superclusters. We will examine several of these applications and include details on porting and application development tools on both NT and Linux. We will also discuss how to use these tools to tune the system and applications for optimal performance.

  S4 High Performance Numerical Computing in Java: Compiler, Language, and Application Solutions

Location: D270
Manish Gupta, Samuel P. Midkiff, Jose E. Moreira, IBM T.J. Watson Research Center

15% Introductory | 65% Intermediate | 20% Advanced

 

There has been an increasing interest in using Java for the development of high performance numerical applications. Although Java has many attractive features-including reliability, portability, a clean object-oriented model, well defined floating point semantics and a growing programmer base-the performance of current commercial implementations of Java in numerical applications is still an impediment to its wider adoption in the performance-sensitive field. In this tutorial we will describe how standard libraries and currently proposed Java extensions will help in achieving high performance and writing more maintainable code. We will also show how Java virtual machines can be improved to provide near-Fortran performance. The proposals of the Java Grande Forum Numerics Working Group, which include a true multidimensional array package, complex arithmetic and new floating point semantics will be discussed. Compiler technologies to be addressed include array bounds and null pointer check optimizations, alias disambiguation techniques, semantic expansion of standard classes and the interplay of static and dynamic models of compilation. We will also discuss the importance of language, libraries and compiler codesign. The impact of these new technologies on compiler writers, language designers, and application developers will be described throughout the tutorial.

  S5 Performance Analysis and Tuning of Parallel Programs: Resources and Tools

Location: D272
Barton Miller, University of Wisconsin-Madison, Michael Gerndt, Technical University Munich, Germany, Bernd Mohr, Research Centre Juelich, Germany

50% Introductory | 25% Intermediate | 25% Advanced

 

This tutorial will give a comprehensive introduction to the theory and practical application of performance analysis, optimization, and tuning of parallel programs on currently used high-end computer systems such as the IBM SP, SGI Origin, and CRAY T3E, as well as clusters of workstations. We will introduce the basic terminology, methodology, and techniques of performance analysis and give practical advice on how to use them effectively. Next, we describe vendor, third-party, and research tools available for these machines, along with practical tips and hints for their use. We show how these tools can be used to diagnose and locate typical performance bottlenecks in real-world parallel programs. Finally, we will give an overview of Paradyn, an example of a state-of-the-art performance analysis tool for today's parallel programs. The presentation will include the Performance Consultant, which automatically locates the performance bottlenecks of user codes, and will conclude with a live, interactive demonstration of Paradyn.
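
As a baseline for what such tools automate, here is a hand-instrumentation sketch (our illustration, with a hypothetical compute() kernel) that times a region with MPI_Wtime and reports the worst-case cost over processes:

  /* Time a code region by hand and report the maximum over processes;
     Paradyn and the vendor tools gather this kind of data without
     source changes. */
  #include <mpi.h>
  #include <stdio.h>

  void compute(void);   /* hypothetical application kernel */

  void timed_region(void)
  {
      double t, tmax;
      int rank;

      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Barrier(MPI_COMM_WORLD);   /* align start so times are comparable */
      t = MPI_Wtime();

      compute();

      t = MPI_Wtime() - t;
      /* The slowest process bounds the parallel time of the region. */
      MPI_Reduce(&t, &tmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
      if (rank == 0)
          printf("region time: %.6f s (max over processes)\n", tmax);
  }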


SUNDAY HALF DAY - AM

  S6A Mesh Generation for High Performance Computing
Part I: An Overview of Unstructured and Structured Grid Generation Techniques

Location: D261
Steven J. Owen, Patrick Knupp, Sandia National Laboratories

100% Introductory | 0% Intermediate | 0% Advanced

 

Mesh generation plays a vital role in computational field simulation for high performance computing. The mesh can tremendously influence the accuracy and efficiency of a simulation. Part I of this tutorial will provide an overview of the principal techniques currently in use for constructing computational grids, both unstructured and structured. For unstructured techniques, Delaunay, advancing-front, and octree methods will be described with respect to triangle and tetrahedral elements. An overview of current quadrilateral and hexahedral methods will be provided, including medial axis, paving, q-morph, sub-mapping, plastering, sweeping, and whisker weaving, as well as mixed-element methods such as hex-tet and h-morph. A survey of some of the unstructured codes currently available will also be provided. For structured techniques, the idea of a mapping from a logical to a physical domain will be discussed. Transfinite interpolation and Lagrange and Hermite interpolation techniques will be described. A one-dimensional problem will be used as an example to introduce basic ideas in grid generation such as grid-generation PDEs, optimization, and variational techniques. In addition, basic approaches to two-dimensional grid generation, such as algebraic, conformal mapping, elliptic, and hyperbolic methods, will be presented. The application of structured grid generation techniques to curves and surfaces, as well as adaptive methods, will also be described.
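
To make the logical-to-physical mapping idea concrete, the sketch below implements basic two-dimensional transfinite interpolation, one of the algebraic methods mentioned above. It is an illustration of ours, not tutorial code, and assumes the caller has already filled the four boundary edges of the arrays.

  /* Two-dimensional transfinite interpolation: fills the interior of a
     structured grid from its four boundary curves. The caller must have
     stored the boundary points in the edge rows and columns of x and y. */
  void tfi2d(int ni, int nj, double x[nj][ni], double y[nj][ni])
  {
      for (int j = 1; j < nj - 1; j++) {
          double e = (double)j / (nj - 1);          /* eta in (0,1) */
          for (int i = 1; i < ni - 1; i++) {
              double s = (double)i / (ni - 1);      /* xi in (0,1) */
              /* blend bottom/top and left/right, minus the corner terms */
              x[j][i] = (1-e)*x[0][i] + e*x[nj-1][i]
                      + (1-s)*x[j][0] + s*x[j][ni-1]
                      - ((1-s)*(1-e)*x[0][0]   + s*(1-e)*x[0][ni-1]
                       + (1-s)*e*x[nj-1][0]    + s*e*x[nj-1][ni-1]);
              y[j][i] = (1-e)*y[0][i] + e*y[nj-1][i]
                      + (1-s)*y[j][0] + s*y[j][ni-1]
                      - ((1-s)*(1-e)*y[0][0]   + s*(1-e)*y[0][ni-1]
                       + (1-s)*e*y[nj-1][0]    + s*e*y[nj-1][ni-1]);
          }
      }
  }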

  S7A Introduction to Effective Parallel Computing, Part I

Location: D271, D273
Quentin F. Stout, Christiane Jablonowski, University of Michigan

75% Introductory | 25% Intermediate | 0% Advanced

 

Effective parallel computing is one of the key solutions to today's computational challenges. This two-part tutorial will provide a comprehensive and practical overview of parallel computing, emphasizing those aspects most relevant to the user. It is designed for new users, managers, students, and people needing a general overview of parallel computing. The tutorial discusses both hardware and software aspects, with an emphasis on standards, portability, and systems that are now (or soon will be) commercially or freely available. Systems examined range from low-cost clusters to highly integrated supercomputers. Part I surveys basic parallel computing concepts and terminology, such as scalability and cache coherence, and illustrates fundamental parallelization approaches using examples from engineering, scientific, and data-intensive applications. These real-world examples target distributed memory systems, using the MPI message-passing library, and shared memory systems, using the OpenMP compiler-directive standard; both parallelization approaches will be briefly outlined. The tutorial shows step-by-step parallel performance improvements and discusses some of the software engineering aspects of the parallelization process. Pointers to the literature and web-based resources will be provided. This tutorial can serve as an introduction to specialized programming tutorials.
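
The contrast between the two approaches can be seen in something as small as a dot product, sketched below in both styles (our illustration; both variants assume the data has already been allocated or distributed):

  /* Shared memory: a compiler directive splits the loop over threads. */
  #include <mpi.h>
  #include <omp.h>

  double dot_openmp(const double *x, const double *y, int n)
  {
      double sum = 0.0;
      #pragma omp parallel for reduction(+:sum)
      for (int i = 0; i < n; i++)
          sum += x[i] * y[i];
      return sum;
  }

  /* Distributed memory: each process holds n_local elements, and the
     partial sums are combined with a collective operation. */
  double dot_mpi(const double *x_local, const double *y_local, int n_local)
  {
      double part = 0.0, sum;
      for (int i = 0; i < n_local; i++)
          part += x_local[i] * y_local[i];
      MPI_Allreduce(&part, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
      return sum;
  }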

  S8 Tools and System Support for Managing and Manipulating Large Scientific Datasets

Location: D274
Joel Saltz, University of Maryland, Johns Hopkins School of Medicine, Alan Sussman, Tahsin Kurc, University of Maryland

30% Introductory | 60% Intermediate | 10% Advanced

 

This tutorial will address the design, implementation, and use of systems for managing and manipulating very large datasets, both on disk and in archival storage. The datasets we target are generated by large-scale simulations or gathered by advanced sensors, such as those attached to satellites or microscopes. These datasets are large (hundreds of gigabytes to many terabytes) and typically represent physical quantities, measurements, or composited images in a physical or attribute space. Two systems will be described in detail: the Active Data Repository (ADR) and DataCutter. ADR is designed to optimize the storage and processing of large disk-based datasets on a parallel machine or network of workstations, while DataCutter is designed to support subsetting and filtering operations on datasets stored in archival (tertiary) storage systems in a Grid environment. The overall design of ADR and the interfaces for customizing it for particular data-intensive applications will be explained, and an example application will be used to illustrate the customization, which includes storing and indexing datasets in ADR and providing user-defined processing functions for the end application. Similarly, the design and current implementation of the DataCutter services will be described, and examples of filter-based applications will be discussed. The relationship of ADR and DataCutter to other systems software for data-intensive computing will also be addressed; such systems include the ISI/Argonne Globus metacomputing toolkit, the UTK NetSolve network-based computational server, and the SDSC Storage Resource Broker (SRB).


SUNDAY HALF DAY - PM

  S6B Mesh Generation for High Performance Computing
Part II: Mesh Generation for Massively Parallel-Based Analysis

Location: D261
Scott Mitchell, Patrick Knupp, and Timothy Tautges, Sandia National Laboratories

50% Introductory | 50% Intermediate | 0% Advanced

 

Part II will focus on the specific application of mesh generation techniques to high performance computing. Topics will include advanced hexahedral algorithms, mesh quality, and mesh generation issues related to massively parallel analysis. Advanced hexahedral mesh generation algorithms for meshing assembly geometries will be described; these algorithms are found in the CUBIT Mesh Generation Toolkit and other mesh generation packages. Additional hexahedral mesh generation research ideas will also be discussed. Basic mesh quality requirements for finite element meshes will be described, including shape and size metrics for both simplicial and non-simplicial element types. Mesh quality can be improved by various node-movement strategies; their use in mesh sweeping and morphing algorithms will be described. Mesh quality metrics used to devise discrete objective functions will also be discussed. Advances in massively parallel and high performance computing have made possible computational simulation at much higher fidelity and finer resolution. Issues specific to larger analyses will be introduced, including techniques for handling model complexity, a team-based approach to generating meshes, tools for generating meshes in pieces and assembling the pieces into a larger mesh, and preparing the mesh for input to massively parallel analysis.
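
As an example of the kind of shape metric involved, the sketch below (ours, not CUBIT code) computes a common triangle quality measure that equals 1 for an equilateral element and tends to 0 as the element degenerates:

  /* Triangle shape quality: q = 4*sqrt(3)*area / (sum of squared edge
     lengths). An optimizer can maximize the minimum q by moving nodes. */
  #include <math.h>

  double tri_shape_quality(const double a[2], const double b[2],
                           const double c[2])
  {
      double abx = b[0]-a[0], aby = b[1]-a[1];
      double acx = c[0]-a[0], acy = c[1]-a[1];
      double bcx = c[0]-b[0], bcy = c[1]-b[1];
      double area = 0.5 * fabs(abx*acy - aby*acx);        /* cross product */
      double lsum = abx*abx + aby*aby + acx*acx + acy*acy
                  + bcx*bcx + bcy*bcy;
      return lsum > 0.0 ? 4.0 * sqrt(3.0) * area / lsum : 0.0;
  }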

  S7B Introduction to Effective Parallel Computing, Part II

Location: D271, D273
Quentin F. Stout, Christiane Jablonowski, University of Michigan

50% Introductory | 50% Intermediate | 0% Advanced

 

This tutorial will provide broader and deeper insight into the iterative process of converting a serial program into an increasingly efficient, and correct, parallel program. It assumes the basic background knowledge of parallel computing concepts and terminology presented in Part I. Using examples from large-scale engineering and scientific applications, Part II will discuss the steps necessary to achieve high performance on distributed memory, shared memory, and vector parallel machines. We will give an overview of techniques for code optimization, load balancing, communication reduction, and efficient use of cache. The tutorial will include up-to-date performance analysis tools, showing how they can help diagnose and locate typical bottlenecks in parallel applications and provide hints for tuning. In addition, aspects such as the user's view of system software and the principal life-cycle concerns of parallel software will be addressed. Overall, the tutorial will survey the primary parallelization options available, explaining how they are used in real-world applications and what each is most suitable for. These guidelines will help users make intelligent planning decisions when selecting among the various software approaches and hardware platforms.
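
A small example of the cache issues covered: for a C matrix stored row by row, merely interchanging two loops turns a strided, cache-hostile traversal into a unit-stride one (illustrative sketch):

  #define N 2048
  double a[N][N];

  void scale_bad(double s)          /* column-wise walk: cache-hostile */
  {
      for (int j = 0; j < N; j++)
          for (int i = 0; i < N; i++)
              a[i][j] *= s;         /* stride of N doubles per access */
  }

  void scale_good(double s)         /* row-wise walk: cache-friendly */
  {
      for (int i = 0; i < N; i++)
          for (int j = 0; j < N; j++)
              a[i][j] *= s;         /* unit stride; one miss per cache line */
  }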

  S9 Concurrent Programming with Pthreads

Location: D268
Clay P. Breshears, Henry A. Gabb, Kuck & Associates, an Intel Company

65% Introductory | 35% Intermediate | 0% Advanced

 

Multithreading is becoming more prevalent with the increasing popularity of symmetric multiprocessors (SMPs). Multithreading allows programmers to utilize shared memory hardware to its fullest. Pthreads is the POSIX standard library for multithreading and is available on a wide range of platforms. The Pthreads library consists of over 60 functions governing thread creation and management, synchronization, and scheduling. This tutorial will cover design issues involved in concurrent and multithreaded programming, using Pthreads as a practical means of implementation. Before laying a foundation in concurrency, the tutorial will introduce a core of the most useful Pthreads functions. Each function will be discussed in detail with example codes to illustrate usage. Classic models (e.g., monitors, rendezvous, and producer/consumer) will illustrate the use of threads to express concurrent tasks as well as the pitfalls of race conditions and deadlock.
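
As a preview of the producer/consumer model, here is a minimal sketch of ours: a one-slot buffer guarded by a mutex and two condition variables.

  #include <pthread.h>
  #include <stdio.h>

  static int slot, full = 0;
  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t not_full  = PTHREAD_COND_INITIALIZER;
  static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

  void *producer(void *arg)
  {
      for (int i = 0; i < 10; i++) {
          pthread_mutex_lock(&lock);
          while (full)                    /* loop guards against */
              pthread_cond_wait(&not_full, &lock);  /* spurious wakeups */
          slot = i; full = 1;
          pthread_cond_signal(&not_empty);
          pthread_mutex_unlock(&lock);
      }
      return NULL;
  }

  void *consumer(void *arg)
  {
      for (int i = 0; i < 10; i++) {
          pthread_mutex_lock(&lock);
          while (!full)
              pthread_cond_wait(&not_empty, &lock);
          printf("consumed %d\n", slot); full = 0;
          pthread_cond_signal(&not_full);
          pthread_mutex_unlock(&lock);
      }
      return NULL;
  }

  int main(void)
  {
      pthread_t p, c;
      pthread_create(&p, NULL, producer, NULL);
      pthread_create(&c, NULL, consumer, NULL);
      pthread_join(p, NULL);
      pthread_join(c, NULL);
      return 0;
  }

The while loops around pthread_cond_wait are the essential idiom: a woken thread must recheck its condition, which is exactly the kind of race-condition pitfall the tutorial addresses.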

  S10 Commodity-based Scalable Visualization

Location: D274
Constantine J. Pavlakos, Sandia National Laboratories, Randall Frank, Lawrence Livermore National Laboratory, Patrick Hanrahan, Stanford University, Kai Li, Princeton University, Alan Heirich, Compaq Tandem Labs, Allen McPherson, Los Alamos National Laboratory

40% Introductory | 40% Intermediate | 20% Advanced

 

The DOE's ASCI Program is constructing massive compute platforms for the purpose of enabling extremely complex computational simulations. Further, the ASCI/VIEWS program is working to develop data management, data exploration, and visualization technologies that are matched to ASCI's compute capabilities. To do so, technologies that scale to the power of thousands of today's highest performing graphics systems must be developed, and super-resolution display systems are needed that enable the visual comprehension of intricate details in high-fidelity data. Cost-effectiveness is also an important pragmatic consideration. This tutorial will motivate the investigation of cluster-based graphics systems, introduce the participant to information regarding the construction of such clusters, address various issues related to the components used, introduce the participant to how parallel rendering can be achieved on the clustered architectures, and provide an overview of results that have been achieved. The status of efforts to develop such systems by the three ASCI national laboratories and certain external partners will also be presented. Upon completing the tutorial, the participant should have a much better understanding of what it takes to construct such a system, what features they offer, whether such systems show any promise for scalable visualization, and what challenging issues remain to be addressed.


MONDAY FULL DAY

  M1 Performance Analysis and Prediction for Large-Scale Scientific Applications

Location: D274
Adolfy Hoisie, Harvey J. Wasserman, Los Alamos National Laboratory

30% Introductory | 50% Intermediate | 20% Advanced

 

Performance is the most important criterion for a supercomputer. But how do you measure performance? We will present a methodical, simplified approach to the analysis and modeling of large-scale, parallel, scientific applications. Various techniques (modeling, simulation, queueing theory) will be discussed so that they can become part of the application developer's toolkit. We will introduce rigorous metrics for serial and parallel performance and analyze the single most important single-processor bottleneck: the memory subsystem. We will demonstrate how to obtain diagnostic information about the memory performance of codes and how to use such information to bound achievable performance. Commonly used techniques for performance optimization of serial and parallel Fortran codes will also be presented. Finally, we will discuss analytical modeling of application scalability using ASCI codes as examples. No particular machine will be emphasized; rather, we will consider RISC processors and widely used parallel systems, including clusters of SGI Origin2000s, the IBM SP2, and the CRAY T3E.
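
To illustrate the kind of memory-subsystem diagnostic meant here, the sketch below (our illustration) performs the same number of loads at increasing strides; since the arithmetic is identical in every case, any growth in run time exposes cache and TLB behavior.

  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>

  #define N (1 << 24)               /* 16M doubles, larger than any cache */

  int main(void)
  {
      double *a = malloc(N * sizeof *a);
      for (long i = 0; i < N; i++) a[i] = 1.0;

      for (long stride = 1; stride <= 1024; stride *= 2) {
          double sum = 0.0;
          long i = 0;
          clock_t t0 = clock();
          for (long k = 0; k < N; k++) {   /* N loads at every stride */
              sum += a[i];
              i += stride;
              if (i >= N) i -= N;          /* wrap around the array */
          }
          /* printing sum keeps the compiler from removing the loop */
          printf("stride %4ld: %.3f s (sum=%g)\n", stride,
                 (double)(clock() - t0) / CLOCKS_PER_SEC, sum);
      }
      free(a);
      return 0;
  }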

  M2 Parallel I/O for Application Developers

Location: D272
John M. May, Lawrence Livermore National Laboratory

50% Introductory | 50% Intermediate | 0% Advanced

 

This tutorial will present parallel I/O techniques for developers of scientific applications. Because the design of storage devices and file systems profoundly affects I/O performance, the course will begin with a brief review of these topics. It will then proceed to examine the I/O patterns that are common in large scientific applications and show how these patterns affect I/O performance. Next, it will look at a variety of techniques that have been developed to improve performance for common access patterns and discuss their pros and cons. Attendees will learn how to put these techniques into practice using modern I/O interfaces such as MPI-IO and HDF5. We will also discuss two specialized forms of I/O used in parallel computing: checkpointing and data staging for out-of-core problems. The tutorial will conclude with a discussion of current research in the area of scientific data management, including data mining.
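
As a taste of the MPI-IO interface covered, the sketch below (illustrative only; the file name and sizes are arbitrary) has every process write its block of a distributed array to a single shared file with one collective call:

  #include <mpi.h>

  #define LOCAL_N 1024

  int main(int argc, char **argv)
  {
      int rank;
      double buf[LOCAL_N];
      MPI_File fh;
      MPI_Offset offset;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      for (int i = 0; i < LOCAL_N; i++) buf[i] = rank;  /* local block */

      MPI_File_open(MPI_COMM_WORLD, "data.out",
                    MPI_MODE_CREATE | MPI_MODE_WRONLY,
                    MPI_INFO_NULL, &fh);
      offset = (MPI_Offset)rank * LOCAL_N * sizeof(double);
      /* Collective write: all processes participate, so the library can
         merge their requests (e.g., two-phase I/O). */
      MPI_File_write_at_all(fh, offset, buf, LOCAL_N, MPI_DOUBLE,
                            MPI_STATUS_IGNORE);
      MPI_File_close(&fh);
      MPI_Finalize();
      return 0;
  }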

  M3 Framework Technologies & Methods for Large Data Visualization

Location: D261
W. T. Hewitt, University of Manchester, I. Curington, Advanced Visual Systems Inc.

20% Introductory | 70% Intermediate | 10% Advanced

 

This tutorial will address large data visualization issues in the context of commercial visualization tool development. A review of techniques for multidimensional data visualization will be followed by case studies from CEM, CFD, VLSI, medicine, and geophysics. A consequence of visualization at this scale is that the visualization task itself becomes a consumer of HPC resources. The second part of the tutorial is concerned with the issues of implementing these techniques in a multiprocessor environment and improving the performance of current visualization systems. A range of technical areas will be discussed, spanning experimental research and production algorithm development, along with both current research and the future challenges facing visualization system vendors. Attendees will gain an understanding of the issues underlying visualization in a parallel and distributed environment, including: familiarity with domain decomposition methods and parallelization techniques; knowledge of the principles of volume, flow, and multidimensional visualization; the ability to use distributed computation to enable accurate and timely visualization of large, complex datasets; and familiarity with the latest developments in visualization and HPC systems.

  M4 Computational Biology and High Performance Computing

Location: D267
Manfred Zorn, Sylvia Spengler, NERSC/CBCG, Horst Simon, NERSC, Craig A. Stewart, Indiana University, Inna Dubchak, NERSC/CBCG

40% Introductory | 40% Intermediate | 20% Advanced

 

The pace of extraordinary advances in molecular biology has accelerated in the past decade, due in large part to discoveries coming from genome projects on human and model organisms. The genome project has so far run well ahead of schedule and under budget, and its advances have exceeded the dreams of its protagonists, let alone formal expectations. Biologists expect the next phase of the genome project to be even more startling in terms of dramatic breakthroughs in our understanding of human biology, the biology of health, and the biology of disease. Only now can biologists begin to envision the experimental, computational, and theoretical steps necessary to exploit genome sequence information for its medical impact, its contribution to biotechnology and economic competitiveness, and its ultimate contribution to environmental quality. High performance computing has become one of the critical enabling technologies that will help translate this vision of future advances in biology into reality, and biologists are increasingly aware of its potential. This tutorial will introduce the exciting new developments in computational biology and genomics to the high performance computing community.


MONDAY HALF DAY - AM

  M5A Application Building with XML: Standards, Tools, and Demos-Part I

Location: D271, D273
Bertram Ludaescher, Richard Marciano, UCSD/SDSC

40% Introductory | 40% Intermediate | 20% Advanced

 

This tutorial will guide participants through the maze of emerging XML standards and focus on practical areas where the use of XML can have immediate benefits for application development. The general theme is "what you need to know to rearchitect your HPC applications with XML under the hood." The tutorial provides a jump-start to XML ("what you always wanted to know about XML") and includes a roadmap to the XML universe, coverage of core standards and technology, and guidance on how to (re-)architect XML-enabled applications.

  M6A Parallel Programming with OpenMP: Part I, Introduction

Location: D262, D264
Rudolf Eigenmann, Purdue University, Tim Mattson, Intel Corp.

75% Introductory | 20% Intermediate | 5% Advanced

 

OpenMP is an Application Programming Interface for directive-driven parallel programming of shared memory computers. Fortran, C, and C++ compilers supporting OpenMP are available for Unix and NT workstations. Most vendors of shared memory computers are committed to OpenMP, making it the de facto standard for writing portable, shared memory, parallel programs. This tutorial will provide a comprehensive introduction to OpenMP. We will start with basic concepts to bring the novice up to speed, and then present a few more advanced examples to give some insight into questions that come up for experienced OpenMP programmers. Over the course of the morning, we will discuss: the OpenMP parallel programming model and its specification in Fortran, C, and C++; examples of OpenMP programs from scientific and engineering applications; and the status of OpenMP compilers and tools.
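
The flavor of the starting examples is sketched below (our illustration, assuming an OpenMP-capable C compiler): a parallel region forks a team of threads, and a work-sharing directive splits a loop among them.

  #include <omp.h>
  #include <stdio.h>

  int main(void)
  {
      #pragma omp parallel            /* fork a team of threads */
      {
          int id = omp_get_thread_num();
          printf("hello from thread %d of %d\n",
                 id, omp_get_num_threads());

          #pragma omp for             /* split the loop across the team */
          for (int i = 0; i < 8; i++)
              printf("thread %d got iteration %d\n", id, i);
      }                               /* implicit barrier, then join */
      return 0;
  }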

  M7 High-Speed Numerical Linear Algebra: Algorithms and Research Directions

Location: D270
Jack Dongarra, University of Tennessee, Iain Duff, Rutherford Lab,
Danny Sorensen, Rice University, Henk van der Vorst, University of Utrecht

20% Introductory | 50% Intermediate | 30% Advanced

 

Present computers, even workstations, allow the solution of very large-scale problems in science and engineering, and most often a major part of the computational effort goes into solving linear algebra subproblems. We will discuss a variety of algorithms for these problems, indicating where each is appropriate and emphasizing their efficient implementation. Many of the sequential algorithms used satisfactorily on traditional machines fail to exploit the architecture of advanced computers. We briefly review some features of modern computer systems and illustrate how the architecture affects the potential performance of linear algebra algorithms. We will consider recent techniques devised to utilize advanced architectures more fully, especially the design of the Level 1, 2, and 3 BLAS. We will highlight the LAPACK package, which provides a choice of algorithms, mainly for dense matrix problems, that are efficient and portable across a variety of high performance computers. For large sparse linear systems the situation is more complicated, and a wide range of algorithms is available. We will give an introduction to this field and guidelines on the selection of appropriate software, considering both direct and iterative methods of solution, including some recent work that can be viewed as a hybrid of the two. Finally, we address the challenge facing designers of mathematical software in view of the development of highly parallel computer systems. We shall discuss ScaLAPACK, a project to develop and provide high performance scalable algorithms suitable for highly parallel computers.
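
In miniature, the data-reuse idea behind the Level 3 BLAS looks like the sketch below (ours; in practice one calls a tuned routine such as DGEMM rather than writing this by hand). Each block is reused many times while it is resident in cache, so the flops-per-memory-access ratio grows with the block size.

  /* Blocked matrix multiply, C += A*B, with C zeroed by the caller.
     Assumes N is a multiple of the block size BS. */
  #define N  512
  #define BS 64

  void matmul_blocked(double A[N][N], double B[N][N], double C[N][N])
  {
      for (int ii = 0; ii < N; ii += BS)
      for (int jj = 0; jj < N; jj += BS)
      for (int kk = 0; kk < N; kk += BS)
          /* multiply one BS-by-BS block while it sits in cache */
          for (int i = ii; i < ii + BS; i++)
          for (int k = kk; k < kk + BS; k++) {
              double aik = A[i][k];
              for (int j = jj; j < jj + BS; j++)
                  C[i][j] += aik * B[k][j];
          }
  }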

  M8 Parallel Programming for Cluster Computers

Location: D263, D265
David A. Bader, University of New Mexico, Bruce Hendrickson, Steve Plimpton, Sandia National Laboratories

25% Introductory | 50% Intermediate | 25% Advanced

 

The price/performance benefits inherent in commodity cluster computers are attracting the attention of a wide range of researchers, many of whom have not traditionally been involved in parallel computing. Clusters are likely to be the computational platforms of choice over the next decade, not just for computer scientists but for disciplinary researchers in fields such as bioinformatics, astrophysics, and economics. Unfortunately, these clusters are among the most challenging parallel computers to use effectively. In this tutorial, we will describe the performance challenges inherent in commodity clusters, including poor communication performance, heterogeneity, and two-level hardware (clusters of 2- and 4-way SMP nodes). We will outline an approach to designing scientific algorithms that works well on clusters and present some case studies. The approach is based on a distributed-memory message-passing model, with an emphasis on load balance, minimal communication, and latency-tolerant algorithms. Finally, we will highlight tools and libraries currently available for improving parallel programming productivity.
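
The latency-tolerant style advocated here is sketched below (our illustration; compute_interior and compute_boundary stand for hypothetical application kernels): communication is posted early with nonblocking calls and overlapped with work on the interior, hiding the poor latency of commodity networks.

  #include <mpi.h>

  void compute_interior(void);   /* hypothetical */
  void compute_boundary(void);   /* hypothetical */

  void exchange_and_compute(double *halo_in, double *halo_out,
                            int n, int left, int right)
  {
      MPI_Request req[2];

      /* start the halo exchange early... */
      MPI_Irecv(halo_in,  n, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
      MPI_Isend(halo_out, n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1]);

      compute_interior();        /* ...and work while the network works */

      MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
      compute_boundary();        /* the halo data has now arrived */
  }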


MONDAY HALF DAY - PM

  M5B Application Building with XML: Standards, Tools, and Demos-Part II

Location: D271, D273
Richard Marciano, UCSD/SDSC, Bertram Ludaescher, UCSD/SDSC

20% Introductory | 30% Intermediate | 50% Advanced

 

This tutorial will include a brief XML overview, but it will mostly focus on how multiple standards can be used and woven together to build useful applications that benefit HPC application developers. It naturally extends Part I; however, it is packaged independently and is self-contained. If you already know how to design, create, store, and massage your data with XML, XPath, XSLT, and the like, then you are ready for "more mileage with XML." The tutorial focuses on applications including website management, directory services, and wireless computing with XML.

  M6B Parallel Programming with OpenMP: Part II, Advanced Programming

Location: D262, D264
Rudolf Eigenmann, Purdue University, Tim Mattson, Intel Corp.

10% Introductory | 50% Intermediate | 40% Advanced

 

OpenMP is rapidly becoming the programming model of choice for shared-memory machines. After a very brief overview of OpenMP basics we will move on to intermediate and advanced topics, such as advanced OpenMP language features, traps that programmers may fall into, and a more extensive outlook on future OpenMP developments. We will also discuss mixing OpenMP with message passing applications written in MPI. We will present many examples of OpenMP programs and discuss their performance behavior. Over the course of the afternoon, we will discuss the following: a brief overview of the OpenMP parallel programming model in C, C++, and Fortran; problems and solutions in OpenMP programming; advanced examples of OpenMP programs and their performance; and future developments for OpenMP.
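
One common way of mixing the two models is sketched below (our illustration): OpenMP threads share the work within each MPI process (e.g., one process per SMP node), and MPI combines results across processes. All MPI calls are kept outside the parallel region, since many implementations assume only one thread communicates.

  #include <mpi.h>
  #include <omp.h>
  #include <stdio.h>

  #define N 100000

  int main(int argc, char **argv)
  {
      static double x[N];
      double part = 0.0, total;
      int rank;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      for (int i = 0; i < N; i++) x[i] = rank + i * 1e-6;

      /* OpenMP threads share the node-local work... */
      #pragma omp parallel for reduction(+:part)
      for (int i = 0; i < N; i++)
          part += x[i];

      /* ...and MPI combines the per-process results. */
      MPI_Reduce(&part, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
      if (rank == 0) printf("total = %f\n", total);
      MPI_Finalize();
      return 0;
  }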

  M9 Current and Emerging Trends in Cluster Computing

Location: D263, D265
Mark Baker, University of Portsmouth, Rajkumar Buyya, Monash University, Melbourne Australia, Jack Dongarra, University of Tennessee

50% Introductory | 30% Intermediate | 20% Advanced

 

The commercial success of clusters has pushed them into the vanguard of general purpose computing. They have now permeated all spheres of the computing industry, from the traditional science and engineering field through to the retail and commercial marketplace. This commodity-driven computing platform is advancing at a tremendous pace in terms of both new and emerging hardware and the associated software tools and environments. Clusters are now the platform of choice for providing computing services to a huge range of diverse applications. This tutorial discusses the current and emerging trends in cluster computing. In particular, we detail the current and emerging technologies in areas such as system architecture, networking, software environments, systems configuration and management tools as well as application libraries and utilities. In the second half of the tutorial, we review four successfully deployed cluster systems that are being used in commerce, industry and research environments. Finally, we summarize our findings, drawing a number of conclusions about current clusters and then briefly consider emerging technology trends and how these will influence clusters of the future.

  M10 Performance Tuning and Analysis for Grid Applications

Location: D270
Brian Tierney, Lawrence Berkeley National Laboratory, Rich Wolski, University of Tennessee, Dan Gunter, Lawrence Berkeley National Laboratory, Martin Swany, University of Tennessee

25% Introductory | 75% Intermediate | 0% Advanced

 

For distributed application developers, achieving high performance in a Grid environment can be especially challenging because the bottlenecks can lie in any of a number of places: the hosts, the networks, the operating systems, the applications, and so on. Therefore one must monitor all system components and instrument all software. This tutorial will discuss what should be monitored and describe tools that can be used to perform the monitoring and to manage the large volumes of performance information that result. It will also discuss scalable techniques for instrumenting applications and system-level resources. Both the problem of scaling end-to-end network performance monitoring and the tradeoff between monitor intrusiveness and monitor accuracy will be discussed. The tutorial will show how to use the monitored data to perform performance analysis and to predict resource load and availability dynamically. Further, we will discuss some general techniques for improving application performance in a high-speed WAN environment.
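
As a minimal picture of application instrumentation, the sketch below (ours; the event names and log format are illustrative only) writes timestamped events that a post-processor can correlate across hosts, networks, and application layers:

  #include <stdio.h>
  #include <sys/time.h>

  void log_event(FILE *log, const char *host, const char *event)
  {
      struct timeval tv;
      gettimeofday(&tv, NULL);
      /* One line per event; microsecond timestamps let a post-processor
         reconstruct the end-to-end timeline of a transfer. */
      fprintf(log, "%ld.%06ld %s %s\n",
              (long)tv.tv_sec, (long)tv.tv_usec, host, event);
  }

  /* usage: log_event(log, "host1", "send.start");
            ...transfer...
            log_event(log, "host1", "send.end"); */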