ABOUT SCXY

The Supercomputing (SC) conference series is an annual event ("XY" changes with the year) organized by a voluntary consortium of people working together to advance the science and application of high-performance computing and communications technology.
The first Supercomputing conference was held in Orlando, FL, in 1988. The conference has been held annually since, usually in November, at the following locations:
| 1989 | Reno, NV | 1995 | San Diego, CA |
| 1990 | New York City, NY | 1996 | Pittsburgh, PA |
| 1991 | Albuquerque, NM | 1997 | San Jose, CA |
| 1992 | Minneapolis, MN | 1998 | Orlando, FL |
| 1993 | Portland, OR | 1999 | Portland, OR |
| 1994 | Washington, DC | | |
In 1997, the conference name "Supercomputing" was shortened to simply "SC", reflecting the conference's expanded scope since 1995, which includes high-performance networking and data management as well as supercomputing. The conference series as a whole is now commonly referred to generically as "SCXY".
To access information about all SCXY conferences, please visit www.supercomp.org.
For an interesting and more detailed chronology of the SC conference series, please visit the History of the Conference web site.
WEBCASTS

SC2000 will broadcast select presentations from the Technical Program, the Exhibition, Venture Village, and eSCape 2000 live over the Internet, using the unique capabilities of SCinet 2000 to deliver high-quality audio and video broadcasts.
SC2000 will again be webcasting the keynote address and the plenary sessions. This year, the keynote and plenary sessions will be archived and viewable for the duration of the SC2000 conference. SC2000 will also be adding select industry and research demonstrations from the exhibition hall to the webcast program. In addition, the webcast team will again support a robotic camera to allow viewers to pan the exhibit hall and zoom in on exhibit floor booths.
The SC2000 webcast events will be broadcast in several formats to accommodate users who may not have access to a high-performance network to their desktop. These formats will include an audio-only broadcast for lower bandwidth connections, an audio with video broadcast for medium-bandwidth connections, and a high-bandwidth audio with video multicast.
Webcast events are indicated by the webcast symbol in the conference program and on conference signage. User support will be available by telephone and email for the duration of all live broadcasts. Feedback regarding the quality and scope of webcasts is strongly encouraged.
Webcast Help Desk
In an effort to help solve any problems our remote viewers may encounter, we are providing a webcast help desk. User support will be available by telephone and email shortly before, during, and shortly after all webcast events. Please visit the webcast pages for contact information.
All webcast viewers are encouraged to provide feedback on the quality and scope of the broadcasts. Tell us what you found useful, what you would like to see in the future, and how effective you found the webcasts.
Webcast Infrastructure
The SC2000 webcast infrastructure will include several key components. A PC-class machine located in the presentation area will be used to convert audio and video feeds of the keynote and plenary talks to a digital format. A wireless PC-class laptop will capture events in the Exhibition, Venture Village, and eSCape 2000 areas. Finally, a server-class system located in the SC2000 webcast booth near the SCinet 2000 network operations center will make this content available to remote viewers. SCinet 2000 is providing all supporting network infrastructure. Webcasting components include a high-bandwidth connection out of the convention center and a 100BaseT dedicated connection between the PC encoder in the presentation area and the webcast server.
VENTURE VILLAGE

An innovative new exhibit joins SC2000 to showcase the best of the best! The Venture Village will showcase a collection of entrepreneurial information technology companies that are creating new products for the 21st century. The Venture Village will be located strategically on the main exhibit floor and will offer a village atmosphere for visiting with the venture companies and/or your colleagues.
TUTORIAL PROGRAM

This year's tutorials include exciting offerings on new topics, such as mesh generation, XML, parallel programming for clusters, numerical computing in Java, and management of large scientific datasets, along with the return of some of the most requested presenters from prior years with new and updated materials. In addition, we offer some of the full-day tutorials as two half-day tutorials (denoted by numbering with A and B), thereby increasing the number of half-day tutorials. We have a total of nine full-day and 15 half-day tutorials covering 20 topics. Attendees also have the opportunity for an international perspective on topics through the tutorials on large data visualization, cluster computing, performance analysis tools, and numerical linear algebra. Separate registration is required for tutorials; tutorial notes and luncheons will be provided on site (additional tutorial notes will be sold on site). A One- or Two-day Tutorial Passport allows attendees the flexibility to attend multiple tutorials.
See the Tutorials at a Glance table below for a list of this year’s tutorials. Detailed information about all tutorials can be found by scrolling further down this page.
SC2000 TUTORIALS AT A GLANCE
The following tutorials have been accepted for presentation at SC2000. Author and abstract information can be obtained by clicking on the tutorial number (S1, M1, etc.) or by scrolling further down this page.
| SUNDAY FULL DAY | | MONDAY FULL DAY | |
| S1 | Using MPI-2: A Tutorial on Advanced Features of the Message-Passing Interface | M1 | Performance Analysis and Prediction for Large-Scale Scientific Applications |
| S2 | An Introduction to High Performance Data Mining | M2 | Parallel I/O for Application Developers |
| S3 | Design and Analysis of High Performance Clusters | M3 | Framework Technologies & Methods for Large Data Visualization |
| S4 | High Performance Numerical Computing in Java: Compiler, Language, and Application Solutions | M4 | Computational Biology and High Performance Computing |
| S5 | Performance Analysis and Tuning of Parallel Programs: Resources and Tools | | |
| SUNDAY HALF DAY – AM | | MONDAY HALF DAY – AM | |
| S6A | Mesh Generation for High Performance Computing Part I: An Overview of Unstructured and Structured Grid Generation Techniques | M5A | Application Building with XML: Standards, Tools, and Demos, Part I |
| S7A | Introduction to Effective Parallel Computing, Part I | M6A | Parallel Programming with OpenMP: Part I, Introduction |
| S8 | Tools and System Support for Managing and Manipulating Large Scientific Datasets | M7 | High-Speed Numerical Linear Algebra: Algorithms and Research Directions |
| | | M8 | Parallel Programming for Cluster Computers |
| SUNDAY HALF DAY – PM | | MONDAY HALF DAY – PM | |
| S6B | Mesh Generation for High Performance Computing Part II: Mesh Generation for Massively Parallel-Based Analysis | M5B | Application Building with XML: Standards, Tools, and Demos, Part II |
| S7B | Introduction to Effective Parallel Computing, Part II | M6B | Parallel Programming with OpenMP: Part II, Advanced Programming |
| S9 | Concurrent Programming with Pthreads | M9 | Current and Emerging Trends in Cluster Computing |
| S10 | Commodity-based Scalable Visualization | M10 | Performance Tuning and Analysis for Grid Applications |
TECHNICAL PROGRAM

The SC2000 technical program includes:

63 technical papers, selected from 179 submissions, including six finalists for the Gordon Bell prizes. These papers were chosen based on the significance of their contributions to the field of high performance networking and computing (HPNC).
Twenty-four tutorials (nine full-day and 15 half-day sessions) on Sunday and Monday. These tutorials were chosen competitively to address the expressed interests of SC attendees.
Four State of the Field plenary talks, highlighting the most significant results and critical future directions in areas of study that are vital to HPNC.
The new “Masterworks” track of invited speakers that will showcase the application of HPNC technologies to deliver new capabilities to scientists and the general public.
Seven panel sessions, chosen to stimulate understanding of the issues surrounding hot topics of debate within the HPNC community.
The plenary awards session on Thursday afternoon that is expected to include a talk by the most recent winner of the IEEE Computer Society Seymour Cray Computer Engineering Award.
Numerous competitively selected “research gems” representing late-breaking research results that will be showcased in a special area on the Exhibit Floor.
The technical program committee is very proud of the program this year. It would not have been possible without the volunteer efforts of over one hundred committee members and reviewers. Thank you for once again making SC2000 the premier technical conference in this field!
TECHNICAL PAPERS

Note: for all papers listed, the first author is the presenter unless another author in the list has an asterisk after his or her name. An Author Index is also available.
The papers are organized into the following sessions:

Applications I, Applications II, Awards Session, Biomedical Applications, Cluster Infrastructure, Compiler Optimization, Data Grid, Gordon Bell I, Gordon Bell II, Grid Middleware, Hardware Based Tools, MPI, MPI/OpenMP, Networking, Numerical Algorithms, Parallel Programming, Potpourri, QoS/Fault Tolerance, Scheduling, Science Applications Support, Software Tools, Visualization

NETWORKING
The Failure of TCP in High-Performance Computational Grids
Distributed computational grids depend on TCP to ensure reliable end-to-end communication between nodes across the wide-area network (WAN). Unfortunately, TCP performance can be abysmal even when buffers on the end hosts are manually optimized. Recent studies blame the self-similar nature of aggregate network traffic for TCP's poor performance, because such traffic is not readily amenable to statistical multiplexing in the Internet, and hence in computational grids.
In this paper, we identify a previously ignored source of self-similarity, one that is readily controllable: TCP itself. Via an experimental study, we examine the effects of the TCP stack on network traffic using different implementations of TCP. We show that even when aggregate application traffic ought to smooth out as more applications' traffic is multiplexed, TCP induces burstiness into the aggregate traffic load, thus adversely impacting network performance. Furthermore, our results indicate that TCP performance will worsen as WAN speeds continue to increase.
This paper can be found in the ACM and IEEE Digital Libraries.
PSockets: The Case for Application-level Network Striping for Data Intensive Applications using High Speed Wide Area Networks
The Transmission Control Protocol (TCP) is used by various applications to achieve reliable data transfer. TCP was originally designed for unreliable networks. With the emergence of high-speed wide area networks, various improvements have been applied to TCP to reduce latency and improve bandwidth. These improvements typically require system administrators to tune the network and can take a considerable amount of time. This paper introduces PSockets (Parallel Sockets), a library that achieves equivalent performance without manual tuning. The basic idea behind PSockets is to exploit network striping. By network striping we mean striping partitioned data across several open sockets. We describe experimental studies using PSockets over the Abilene network. We show in particular that network striping using PSockets is effective for high-performance, data-intensive computing applications using geographically distributed data.
This paper can be found in the ACM and IEEE Digital Libraries.
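The striping idea is simple enough to sketch in a few lines of C. The sketch below illustrates the concept under stated assumptions (a fixed chunk size, round-robin order, and a receiver that reassembles chunks in the same order); the function names are hypothetical and are not the PSockets library API.

```c
/*
 * A minimal sketch of the network-striping idea: open several TCP
 * connections to one receiver and stripe a buffer across them in
 * fixed-size chunks, so each connection's TCP window ramps up
 * independently.  Illustrative assumptions, not the PSockets API.
 */
#include <arpa/inet.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum { NSTREAMS = 4, CHUNK = 64 * 1024 };

/* Open NSTREAMS TCP connections to host:port; returns 0 on success. */
int open_streams(const char *host, int port, int fds[NSTREAMS])
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, host, &addr.sin_addr) != 1)
        return -1;
    for (int i = 0; i < NSTREAMS; i++) {
        fds[i] = socket(AF_INET, SOCK_STREAM, 0);
        if (fds[i] < 0 ||
            connect(fds[i], (struct sockaddr *)&addr, sizeof addr) < 0)
            return -1;
    }
    return 0;
}

/* Stripe buf[0..len) round-robin over the connections.  The receiver
 * reads CHUNK-sized pieces back in the same round-robin order. */
int striped_send(int fds[NSTREAMS], const char *buf, size_t len)
{
    size_t off = 0;
    for (int i = 0; off < len; i = (i + 1) % NSTREAMS) {
        size_t n = len - off < CHUNK ? len - off : CHUNK;
        for (size_t sent = 0; sent < n; ) {    /* handle short writes */
            ssize_t w = write(fds[i], buf + off + sent, n - sent);
            if (w < 0)
                return -1;
            sent += (size_t)w;
        }
        off += n;
    }
    return 0;
}

int main(int argc, char **argv)
{
    static char payload[1 << 20];              /* 1 MB demo payload */
    int fds[NSTREAMS];
    if (argc < 3 || open_streams(argv[1], atoi(argv[2]), fds) < 0)
        return 1;
    return striped_send(fds, payload, sizeof payload) < 0;
}
```

Because each connection maintains its own TCP window, the aggregate throughput of the striped transfer can approach several times that of a single untuned connection.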
Efficient Wire Formats for High Performance Computing
High performance computing is being increasingly utilized in non-traditional circumstances where it must interoperate with other applications. For example, online visualization is being used to monitor the progress of applications, and real-world sensors are used as inputs to simulations. Whenever these situations arise, there is a question of what communications infrastructure should be used to link the different components. Traditional HPC-style communications systems such as MPI offer relatively high performance but are poorly suited for developing these less tightly-coupled cooperating applications. Object-based systems and meta-data formats like XML offer substantial plug-and-play flexibility, but with substantially lower performance. We observe that the flexibility and baseline performance of all these systems are strongly determined by their "wire format," or how they represent data for transmission in a heterogeneous environment. We examine the performance implications of different wire formats and present an alternative with significant advantages in terms of both performance and flexibility.
This paper can be found in the ACM and IEEE Digital Libraries.
HARDWARE BASED TOOLS
Using Hardware Performance Monitors to Isolate Memory Bottlenecks
In this paper, we present and evaluate two techniques that use different styles of hardware support to provide data structure specific processor cache information. In one approach, hardware performance counter overflow interrupts are used to sample cache misses. In the other, cache misses within regions of memory are counted to perform an n-way search for the areas in which the most misses are occurring. We present a simulation-based study and comparison of the two techniques. We find that both techniques can provide accurate information, and describe the relative advantages and disadvantages of each.
This paper can be found in the ACM and IEEE Digital Libraries.
Hardware Prediction for Data Coherency of Scientific Codes on DSM
This paper proposes a hardware mechanism for reducing the coherency overhead occurring in scientific computations on DSM systems. A first phase aims at detecting, in the address space, regular patterns (called streams) of coherency events (such as requests for exclusive or shared access, or invalidations).
Once a stream is detected at a loop level, the regularity of data access can be exploited within the loop (spatial locality) but also between loops (temporal locality). We present a hardware mechanism capable of detecting and exploiting these regular patterns efficiently.
Expected benefits as well as hardware complexity are discussed, and the limited drawbacks and potential overheads are examined.
For a benchmark suite of typical scientific applications, the results are very promising, both in terms of detected coherency streams and the effectiveness of our optimizations.
This paper can be found in the ACM and IEEE Digital Libraries.
A Scalable Cross-Platform Infrastructure for Application Performance Tuning Using Hardware Counters
The purpose of the PAPI project is to specify a standard API for accessing hardware performance counters available on most modern microprocessors. These counters exist as a small set of registers that count "events", which are occurrences of specific signals and states related to the processor's function. Monitoring these events facilitates correlation between the structure of source/object code and the efficiency of the mapping of that code to the underlying architecture. This correlation has a variety of uses in performance analysis and tuning. The PAPI project has proposed a standard set of hardware events and a standard cross-platform library interface to the underlying counter hardware. The PAPI library has been or is in the process of being implemented on all major HPC platforms. The PAPI project is developing end-user tools for dynamically selecting and displaying hardware counter performance data. PAPI support is also being incorporated into a number of third-party tools.
This paper can be found in the ACM and IEEE Digital Libraries.
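For readers unfamiliar with PAPI, the shape of the API can be sketched briefly. The example below uses the low-level interface as found in current PAPI releases (the exact calls have evolved since SC2000) and assumes a platform where both preset events are available; error handling is abbreviated.

```c
/*
 * Minimal sketch of the PAPI low-level interface: count two standard
 * preset events around an instrumented code region.
 */
#include <stdio.h>
#include <papi.h>

int main(void)
{
    int evset = PAPI_NULL;
    long long counts[2];
    static double a[4096];
    double sum = 0.0;

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        return 1;
    if (PAPI_create_eventset(&evset) != PAPI_OK ||
        PAPI_add_event(evset, PAPI_TOT_CYC) != PAPI_OK ||  /* total cycles  */
        PAPI_add_event(evset, PAPI_L1_DCM) != PAPI_OK)     /* L1 D-misses   */
        return 1;

    for (int i = 0; i < 4096; i++)
        a[i] = i;

    PAPI_start(evset);
    for (int i = 0; i < 4096; i++)   /* the region being measured */
        sum += a[i];
    PAPI_stop(evset, counts);

    printf("cycles = %lld, L1 data misses = %lld (sum = %g)\n",
           counts[0], counts[1], sum);
    return 0;
}
```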
A Comparative Study of the NAS MG Benchmark across Parallel Languages and Architectures
Hierarchical algorithms such as multigrid applications form an important cornerstone for scientific computing. In this study, we take a first step toward evaluating parallel language support for hierarchical applications by comparing implementations of the NAS MG benchmark in several parallel programming languages: Co-Array Fortran, High Performance Fortran, Single Assignment C, and ZPL. We evaluate each language in terms of its portability, its performance, and its ability to express the algorithm clearly and concisely. Experimental platforms include the Cray T3E, IBM SP, SGI Origin, Sun Enterprise 5500 and a high-performance Linux cluster. Our findings indicate that while it is possible to achieve good portability, performance, and expressiveness, most languages currently fall short in at least one of these areas. We find a strong correlation between expressiveness and a language’s support for a global view of computation, and we identify key factors for achieving portable performance in multigrid applications.
This paper can be found in the ACM and IEEE Digital Libraries.
Is Data Distribution Necessary in OpenMP?
This paper investigates the performance implications of data placement in OpenMP programs running on modern ccNUMA multiprocessors. Data locality and minimization of the rate of remote memory accesses are critical for sustaining high performance on these systems. We show that, due to the low remote-to-local memory access latency ratio of state-of-the-art ccNUMA architectures, reasonably balanced page placement schemes, such as round-robin or random distribution of pages, incur modest performance losses. We also show that performance leaks stemming from suboptimal page placement schemes can be remedied with a smart user-level page migration engine. The main body of the paper describes how the OpenMP runtime environment can use page migration for implementing implicit data distribution and redistribution schemes without programmer intervention. Our experimental results support the effectiveness of these mechanisms and provide a proof of concept that there is no need to introduce data distribution directives in OpenMP, and they warrant the portability of the programming model.
This paper can be found in the ACM and IEEE Digital Libraries.
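A concrete illustration of why this debate matters is the standard "first-touch" idiom, which achieves data placement in plain OpenMP with no new directives: on most ccNUMA systems a page is physically allocated on the node of the thread that first touches it, so initializing data with the same static schedule as the compute loop places each page near its user. This is a minimal sketch, not code from the paper.

```c
/* First-touch placement on a ccNUMA system, in plain OpenMP. */
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 22)

int main(void)
{
    double *x = malloc(N * sizeof *x);
    double sum = 0.0;
    if (!x)
        return 1;

    /* First touch: each thread faults in the pages of its own chunk. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; i++)
        x[i] = 1.0;

    /* Compute loop with the same schedule: accesses are mostly local. */
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += x[i];

    printf("sum = %g\n", sum);
    free(x);
    return 0;
}
```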
Extending OpenMP for NUMA Machines
This paper describes extensions to OpenMP which implement data placement features needed for NUMA architectures. OpenMP is a collection of compiler directives and library routines used to write portable parallel programs for shared-memory architectures. Writing efficient parallel programs for NUMA architectures, which have characteristics of both shared-memory and distributed-memory architectures, requires that a programmer control the placement of data in memory and the placement of computations that operate on that data. Optimal performance is obtained when computations occur on processors that have fast access to the data needed by those computations. OpenMP, designed for shared-memory architectures, does not by itself address these issues.
The extensions to OpenMP Fortran presented here have been mainly taken from High Performance Fortran. The paper describes some of the techniques that the Compaq Fortran compiler uses to generate efficient code based on these extensions. It also describes some additional compiler optimizations, and concludes with some preliminary results.
This paper can be found in the ACM and IEEE Digital Libraries.
SCHEDULING
Randomization, Speculation, and Adaptation in Batch Schedulers
This paper proposes extensions to the backfilling job-scheduling algorithm that significantly improve its performance. We introduce variations that sort the "backfilling order" in priority-based and randomized fashions. We examine the effectiveness of guarantees present in conservative backfilling and find that initial guarantees have limited practical value, while the performance of a "no-guarantee" algorithm can be significantly better when combined with extensions that we introduce. Our study differs from many similar studies in using traces that contain user estimates. We find that actual overestimates are large and significantly different from simple models. We propose the use of speculative backfilling and speculative test runs to counteract these large overestimations. Finally, we explore the impact of dynamic, system-directed adaptation of application parallelism. The cumulative improvements of these techniques decrease the bounded slowdown, our primary metric, to less than 15 percent of that of conservative backfilling.
This paper can be found in the ACM and IEEE Digital Libraries.
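For context, the baseline that such extensions build on can be reduced to a single admission test. The sketch below shows the core test of EASY-style backfilling: the queue head holds a reservation, and a later job may jump ahead only if it fits in the idle nodes and either finishes (per its user estimate) before the reservation or leaves the reserved nodes untouched. The struct, parameter names, and toy numbers are illustrative assumptions, not the paper's code.

```c
/* The core admission test of EASY-style backfilling (a sketch). */
#include <stdio.h>

typedef struct {
    int nodes;          /* nodes requested */
    long estimate;      /* user-supplied runtime estimate (seconds) */
} Job;

/*
 * idle_now:    nodes free at time `now`
 * resv_time:   when the queue-head job is guaranteed to start
 * spare_nodes: nodes still free at resv_time after the head is placed
 */
int can_backfill(Job j, int idle_now, long now,
                 long resv_time, int spare_nodes)
{
    if (j.nodes > idle_now)
        return 0;                      /* does not fit right now        */
    if (now + j.estimate <= resv_time)
        return 1;                      /* done before the reservation   */
    return j.nodes <= spare_nodes;     /* never touches reserved nodes  */
}

int main(void)
{
    Job small = { 8, 600 };            /* 8 nodes for ~10 minutes */
    /* 16 idle nodes now; head job reserved at t=1000 s; 4 spare nodes. */
    printf("backfill? %s\n",
           can_backfill(small, 16, 0, 1000, 4) ? "yes" : "no");
    return 0;
}
```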
An Object-Oriented Job Execution Environment
This is a project for developing a distributed job execution environment for highly iterative jobs. An iterative job is one where the same binary code is run hundreds of times with incremental changes in the input values for each run. An execution environment is a set of resources on a computing platform that can be made available to run the job and hold the output until it is collected. The goal is to design a complete, object-oriented scheduling system that will run a variety of jobs with minimal changes. Areas of code that are unique to one specific type of job are decoupled from the rest. The system allows for fine-grained job control, timely status notification, and dynamic registration and deregistration of execution platforms depending on the resources available. Several object-oriented technologies are employed: Java, CORBA, UML, and software design patterns. The environment has been tested using a CFD code, INS2D.
This paper can be found in the ACM and IEEE Digital Libraries.
Towards an Integrated, Web-executable Parallel Programming Tool Environment
We present a new parallel programming tool environment that is (1) accessible and executable "anytime, anywhere," through standard Web browsers and (2) integrated, in that it provides tools which adhere to a common underlying methodology for parallel programming and performance tuning. The environment is based on a new network computing infrastructure developed at Purdue University.
We evaluate our environment qualitatively by comparing our tool access method with conventional schemes of software download and installation. We also quantitatively evaluate the efficiency of interactive tool access in our environment, measuring the response times of various functions of the Ursa Minor tool and comparing them with those of a Java Applet-based "anytime, anywhere" tool access method. We found that our environment offers significant advantages in terms of tool accessibility, integration, and efficiency.
This paper can be found in the ACM and IEEE Digital Libraries.
MPI/OPENMP
Performance of Hybrid Message-Passing and Shared-Memory Parallelism for Discrete Element Modeling
The current trend in HPC hardware is towards clusters of shared-memory (SMP) compute nodes. For applications developers the major question is how best to program these SMP clusters. To address this we study an algorithm from Discrete Element Modeling, parallelized using both the message-passing and shared-memory models simultaneously (“hybrid” parallelization). The natural load-balancing methods are different in the two parallel models, the shared-memory method being in principle more efficient for very load-imbalanced problems. It is therefore possible that hybrid parallelism will be beneficial on SMP clusters. We benchmark MPI and OpenMP implementations of the algorithm on MPP, SMP and cluster architectures, and evaluate the effectiveness of hybrid parallelism. Although we observe cases where OpenMP is more efficient than MPI on a single SMP node, we conclude that our current OpenMP implementation is not yet efficient enough for hybrid parallelism to outperform pure message-passing on an SMP cluster.
This paper can be found in the ACM and IEEE Digital Libraries.
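The programming model under test is easy to illustrate. Below is a minimal hybrid MPI+OpenMP sketch in the style such studies benchmark, with message passing between SMP nodes and an OpenMP reduction within each node; it is illustrative and is not the paper's discrete element code.

```c
/*
 * Minimal hybrid sketch: MPI between SMP nodes, OpenMP within a node.
 * Each rank sums its slice with an OpenMP reduction; partial sums are
 * then combined with MPI_Allreduce.
 */
#include <stdio.h>
#include <mpi.h>

#define N 1000000L

int main(int argc, char **argv)
{
    int rank, size;
    double local = 0.0, global = 0.0;

    /* MPI is called only outside parallel regions ("funneled" style). */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long lo = rank * N / size, hi = (rank + 1) * N / size;

    /* Shared-memory parallelism inside the node. */
    #pragma omp parallel for reduction(+:local)
    for (long i = lo; i < hi; i++)
        local += 1.0 / (double)(i + 1);

    /* Message passing between nodes. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0)
        printf("harmonic sum = %.6f\n", global);
    MPI_Finalize();
    return 0;
}
```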
A Comparison of Three Programming Models for Adaptive Applications on the Origin2000
Adaptive applications have computational workloads and communication patterns which change unpredictably at runtime, requiring load balancing to achieve scalable performance on parallel machines. Efficient parallel implementation of such adaptive applications is therefore a challenging task. In this paper, we compare the performance of, and the programming effort required for, two major classes of adaptive applications under three leading parallel programming models on an SGI Origin2000 system, a machine that supports all three models efficiently. Results indicate that the three models deliver comparable performance; however, the implementations differ significantly beyond merely using explicit messages versus implicit loads/stores, even though the basic parallel algorithms are similar. Compared with the message-passing (using MPI) and SHMEM programming models, the cache-coherent shared address space (CC-SAS) model provides substantial ease of programming at both the conceptual and program orchestration levels, often accompanied by performance gains. However, CC-SAS currently has portability limitations and may suffer from poor spatial locality of physically distributed shared data on large numbers of processors.
This paper can be found in the ACM and IEEE Digital Libraries.
MPI versus MPI+OpenMP on IBM SP for the NAS Benchmarks
The hybrid memory model of clusters of multiprocessors raises two issues: programming model and performance. Many parallel programs have been written using the MPI standard. To evaluate the pertinence of hybrid models for existing MPI codes, we compare a unified model (MPI) and a hybrid one (OpenMP fine-grain parallelization after profiling) for the NAS 2.3 benchmarks on two IBM SP systems. The superiority of one model depends on (1) the level of shared-memory-model parallelization, (2) the communication patterns, and (3) the memory access patterns. The relative speeds of the main architecture components (CPU, memory, and network) are of tremendous importance for selecting one model. With the hybrid model used here, our results show that a unified MPI approach is better for most of the benchmarks. The hybrid approach becomes better only when fast processors make the communication performance significant and the level of parallelization is sufficient.
This paper can be found in the ACM and IEEE Digital Libraries.
POTPOURRI
A Wrapper Generator for Wrapping High Performance Legacy Codes as Java/CORBA Components
This paper describes a Wrapper Generator for wrapping high performance legacy codes as Java/CORBA components for use in a distributed component-based problem-solving environment. Using the Wrapper Generator we have automatically wrapped an MPI-based legacy code as a single CORBA object, and implemented a problem-solving environment for molecular dynamic simulations. Performance comparisons between runs of the CORBA object and the original legacy code on a cluster of workstations and on a parallel computer are also presented.
This paper can be found in the ACM and IEEE Digital Libraries.
A Scalable SNMP-Based Distributed Monitoring System For Heterogeneous Network Computing
Traditional centralized monitoring systems do not scale to present-day large, complex, network-computing systems. Based on recent SNMP standards for distributed management, this paper addresses the scalability problem through distribution of monitoring tasks, applicable to tools such as SIMONE (an SNMP-based monitoring prototype implemented by the authors).
Distribution is achieved by introducing one or more levels of a dual entity called the Intermediate Level Manager (ILM) between a manager and the agents. The ILM accepts monitoring tasks described in the form of scripts and delegated by the next higher entity. The solution is flexible and can be integrated into an SNMP tool without altering other system components.
A testbed of up to 1024 monitoring elements was used to assess scalability. Noticeable improvements in the round-trip delay (from seconds to less than one tenth of a second) were observed when more than 200 monitoring elements were present and as few as two ILMs were used.
This paper can be found in the ACM and IEEE Digital Libraries.
ESP: A System Utilization Benchmark
This article describes a new benchmark, called the Effective System Performance (ESP) test, which is designed to measure system-level performance, including such factors as job scheduling efficiency, handling of large jobs and shutdown-reboot times. In particular, this test can be used to study the effects of various scheduling policies and parameters. We present here some results that we have obtained so far on the Cray T3E and IBM SP systems, together with insights obtained from simulations.
This paper can be found in the ACM and IEEE Digital Libraries.
CLUSTER INFRASTRUCTURE
PM2: A High Performance Communication Middleware for Heterogeneous Network Environments
This paper introduces a high performance communication middle layer, called PM2, for heterogeneous network environments. PM2 currently supports Myrinet, Ethernet, and SMP. Binary code written in PM2 or written in a communication library, such as MPICH-SCore on top of PM2, may run on any combination of those networks without re-compilation. According to a set of NAS parallel benchmark results, MPICH-SCore performance is better than dedicated communication libraries such as MPICH-BIP/SMP and MPICH-GM when running some benchmark programs.
This paper can be found in the ACM and IEEE Digital Libraries.
Performance and Interoperability Issues in Incorporating Cluster Management Systems Within a Wide-Area Network-Computing Environment
This paper describes the performance and interoperability issues that arise in the process of integrating cluster management systems into a wide-area network-computing environment, and provides solutions in the context of the Purdue University Network Computing Hubs (PUNCH). The described solution provides users with a single point of access to resources spread across administrative domains, and an intelligent translation process makes it possible for users to submit jobs to different types of cluster management systems in a transparent manner. The approach does not require any modifications to the cluster management software. However, call-back and caching capabilities that would improve performance and make such systems more interoperable with wide-area computing systems are discussed.
This paper can be found in the ACM and IEEE Digital Libraries.
Architectural and Performance Evaluation of GigaNet and Myrinet Interconnects on Clusters of Small-Scale SMP Servers
GigaNet and Myrinet are two of the leading interconnects for clusters of commodity computer systems. Both provide memory-protected user-level network interface access, and deliver low-latency and high-bandwidth communication to applications. GigaNet is a connection-oriented interconnect based on a hardware implementation of Virtual Interface (VI) Architecture and Asynchronous Transfer Mode (ATM) technologies. Myrinet is a connection-less interconnect which leverages packet switching technologies from experimental Massively Parallel Processors (MPP) networks. This paper investigates their architectural differences and evaluates their performance on two commodity clusters based on two generations of Symmetric Multiple Processors (SMP) servers. The performance measurements reported here suggest that the implementation of Message Passing Interface (MPI) significantly affects the cluster performance. Although MPICH-GM over Myrinet demonstrates lower latency with small messages, the polling-driven implementation of MPICH-GM often leads to tight synchronization between communication processes and higher CPU overhead.
This paper can be found in the ACM and IEEE Digital Libraries.
QOS/FAULT TOLERANCE
MPICH-GQ: Quality-of-Service for Message Passing Programs
Parallel programmers typically assume that all resources required for a program's execution are dedicated to that purpose. However, in local and wide area networks, contention for shared networks, CPUs, and I/O systems can result in significant variations in availability, with consequent adverse effects on overall performance. We describe a new message-passing architecture, MPICH-GQ, that uses quality-of-service (QoS) mechanisms to manage contention and hence improve the performance of message passing interface (MPI) applications. MPICH-GQ combines new QoS specification, traffic shaping, QoS reservation, and QoS implementation techniques to deliver QoS capabilities to the high-bandwidth bursty flows, complex structures, and reliable protocols used in high-performance applications, characteristics very different from the low-bandwidth, constant-bit-rate media flows and unreliable protocols for which QoS mechanisms were designed. Results obtained on a differentiated services testbed demonstrate our ability to maintain application performance in the face of heavy network contention.
This paper can be found in the ACM and IEEE Digital Libraries.
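While the paper's QoS machinery is much richer, one generic low-level building block for differentiated services can be shown concisely: marking a socket's traffic with a DiffServ codepoint so that routers can classify the flow. This is a plain POSIX sketch, not the MPICH-GQ implementation.

```c
/* Mark a socket's packets with a DiffServ codepoint via the TOS byte. */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    /* DSCP 46 (expedited forwarding) shifted into the IPv4 TOS byte. */
    int tos = 46 << 2;

    if (fd < 0 ||
        setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof tos) < 0) {
        perror("qos mark");
        return 1;
    }
    printf("socket %d marked with TOS 0x%02x\n", fd, tos);
    return 0;
}
```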
Scalable Fault-Tolerant Distributed Shared Memory
This paper shows how a state-of-the-art software distributed shared-memory (DSM) protocol can be efficiently extended to tolerate single-node failures. In particular, we extend a home-based lazy release consistency (HLRC) DSM system with independent checkpointing and logging to volatile memory, targeting shared-memory computing on very large LAN-based clusters. In these environments, where global coordination may be expensive, independent checkpointing becomes critical to scalability. However, independent checkpointing is only practical if we can control the size of the log and checkpoints in the absence of global coordination. In this paper we describe the design of our fault-tolerant DSM system and present our solutions to the problems of checkpoint and log management. We also present experimental results showing that our fault tolerance support is light-weight, adding only low messaging, logging and checkpointing overheads, and that our management algorithms can be expected to effectively bound the size of the checkpoints and logs for real applications.
This paper can be found in the ACM and IEEE Digital Libraries.
Realizing Fault Resilience in Web-Server Cluster
Today, it is absolutely critical that a successful Internet service be up 100 percent of the time. Server clustering is the most promising approach to meeting this requirement. However, existing Web server-clustering solutions merely provide high availability derived from their redundant nature, and offer no guarantee of fault resilience for the service. In this paper, we address this problem by implementing an innovative mechanism which enables a Web request to be smoothly migrated and recovered on another working node in the presence of server failure. We show that request migration and recovery can be achieved efficiently and transparently to the user. The achieved capability of fault resilience is important and essential for a variety of critical services (e.g., e-commerce), which are increasingly widespread. Our approach takes an important step toward providing a highly reliable Web service.
This paper can be found in the ACM and IEEE Digital Libraries.
STUDENT VOLUNTEERS

Student volunteers have the opportunity to see and discuss the latest high-performance networking and computing technology and meet leading researchers from around the world, all while contributing to the success of this annual event. No special skills or experience are necessary for most volunteer positions.
Read the information below, and then consider if you want to register to become an SC2000 student volunteer!
So, what is the SC2000 conference?
SC2000 is the current event of an annual conference series that focuses on the development and application of high-performance computing and communications technologies.
This year's conference will be held Sunday 05 November through Friday 10 November 2000 in Dallas.
A student volunteer helps out with the administration of the conference in exchange for free conference registration, housing, most meals, conference goodies, and more. We ask for a total of 20-25 hours of work. You will interact with the conference organizers and presenters, and meet other students from all over the world. It’s an opportunity to get involved with a great conference and reduce expenses at the same time. It’s also not a bad item to have on your resume.
For some positions, familiarity with computing platforms, audio/visual aids, and office equipment is helpful, but no special skills or experience are required for most.
Who is eligible to be a student volunteer?
Any full time college student may become a student volunteer at SC2000. Both undergraduate and graduate students are welcome. You will need a recommendation from one of your professors.
High school students are not encouraged to apply, but local (Dallas-Fort Worth area) high school students might be considered under special (as yet undefined) circumstances.
When are student volunteers needed?
Official conference activities will start on Sunday 05 November and last through Friday morning, 10 November. This is when most of the student volunteers will be needed.
You will need to arrive at the conference on Saturday afternoon 04 November. We will have our first group meeting Saturday evening.
Conference setup and especially deployment of the conference network takes place the week before the conference begins. A few students with an interest in networking could volunteer for these network deployment activities.
What do I get by being a student volunteer?
The conference provides students with:
Registration to the conference including technical sessions and keynote addresses
Access to the exhibit floor (and enough time to see the exhibits)
Hotel accommodations at one of the conference hotels
Conference proceedings on CD-ROM
Admission to special events on Monday and Thursday evenings
What don’t I get as a student volunteer?
There are some important considerations for you to be aware of.
We do not have a travel budget for student volunteers. You or your college will have to pay travel costs.
There are a few meals that are not provided (Saturday, Sunday, Tuesday, and Wednesday dinner, for example).
What kinds of work will I do as a volunteer?
We use students to distribute and collect evaluation forms in each of the tutorials and technical sessions. You might assist speakers with handouts or A/V equipment. You might help in registration. You might supervise/assist in the e-mail room. You might help with media relations.
Hmmmm! Doesn’t sound like much fun. Why would I want to do this?
Well, you are only committed to 20-25 hours of volunteer duties. The rest of the time you can attend the technical sessions. The real fun is cruising the exhibit floor where hardware and software vendors and many research institutions all have very interesting exhibits and demonstrations. And you can schmooze with thousands of people in the high-performance computing and communications community. Some of them might be interested in hiring you in a year or two.
Do I have a say in what work I do as a volunteer?
Not much, but we’ll do what we can. There are often fewer volunteers than tasks. And unanticipated tasks can pop up anytime. Sometimes we have to scramble. But if you have special skills, we will try to use them.
Do I have to be there for the whole conference to be a volunteer?
No. We realize that other commitments (like school) exist. We hope that students local to the Dallas-Fort Worth area will be able to be flexible. You will need to be able to devote enough time to work 20-25 hours during the conference.
Where will I stay during the conference?
Rooms will be available at each of the conference hotels at no cost to student volunteers. Assignments will be made as we get closer to the conference.
The rooms are all double occupancy. You are encouraged to apply with a classmate (of the same gender) and plan to share a room.
Are there students with special skills or other characteristics whom you especially encourage to apply?
Glad you asked!
Students with an interest in networking and who can help the week before the conference are encouraged to apply.
We need students who can speak Japanese or Korean to assist at registration.
This year (like last year) the conference is focusing on diversifying the student volunteer population. So, students from under-represented groups are especially encouraged to apply — preferably, two or more from one institution.
International students are also encouraged to apply.
How do I register?

Register on the registration page.
You will need to provide basic information about yourself and why you want to attend SC2000 as a student volunteer. And you need to provide the name and e-address of a professor who will recommend you.
There is another link to the registration page at the top of this document.
When will I find out if you have selected me?
The registration pages will stay open until Friday 08 September. We hope to have decisions made and volunteers notified within a week following the close of registration.
Will I have any other questions?
Unlikely! Well, maybe.
STATE OF THE FIELD TALKS

All four talks will be in Ballroom C.
SCINET

This year, SCinet 2000 is working with Qwest Communications International, Cisco, Nortel Networks, Juniper, Marconi, Foundry, Extreme, Spirent, and others to establish flexible wide-area connections to the show floor. Using Qwest's fiber infrastructure in the Dallas area and Qwest SONET, ATM, and IP backbones nationwide, the wide-area network will feature multiple OC-48 (2.5 gigabits per second) connections, several OC-12 connections, and possibly other connections, using the very latest technology and protocols. The total connectivity between SC2000 and the outside world will be 10.5 to 11 gigabits per second, a new record for the SC conferences! A number of WAN connection links to SCinet 2000 will be provided in addition to commodity Internet access.
SCinet plans to deploy and support IPv4, IPv6, ATM, and Packet over SONET connections, Myrinet, Quality of Service demonstrations, and advanced network monitoring. Other types of connections might be possible based on discussions with requestors.
SCinet will install and operate more than 40 miles of fiber optics throughout the conference areas. SCinet 2000 is planning to support an all-ST-terminated, all-fiber show floor network to interconnect booths using switched technologies.
In order to support the complex logistics and requirements of an exhibition on the scale of SC2000, SCinet is deploying four overlapping networks within the Dallas Convention Center.
They are all interconnected, but can operate independently of each other. At the lowest level, several days before the show starts, SCinet deploys a commodity Internet network to connect show offices, the conference Education Program, and the show’s e-mail facilities. At the next level, there is a production network, provisioned with leading-edge hardware from various vendors. This year, this network will feature Gigabit Ethernet and OC-48 ATM.
The Network Operations Center (NOC) is also built from scratch just before the show starts. This year, in addition to the traditional functions of supporting the network equipment, providing a help desk, and providing work areas for the network engineers, the NOC will also house a variety of displays and information. Spirent is providing its "SmartBits" technology to monitor aspects of SCinet. The SCinet "bit-o-meter" will display aggregate network traffic. Specific applications and events will be monitored throughout the show. SCinet will also use the "Bro" package from LBNL to monitor network traffic for intrusion. Further displays, such as from the Netflow package, will also be viewable.
One of the most impressive things about SCinet is that every year, it brings together the best network professionals from across the country to help create the entire network and support the multiple aspects of the program. This year, staff from many organizations are supporting SCinet, including:
Aaronsen Group, Argonne National Laboratory (DOE), Army Research Laboratory (DOD), Avici Systems, Caltech, Cisco Systems, the Dallas Convention Center, the Dallas Convention and Visitor's Bureau, Extreme Networks, Foundry Networks, GST Telecom, Internet2, Juniper Networks, Lawrence Berkeley National Laboratory (DOE), Lawrence Livermore National Laboratory (DOE), Marconi, MCI, MITRE Corporation, National Center for Supercomputing Applications (NSF), Northeast Regional Data Center/University of Florida, Nichols Research/CSC, Nortel Networks, Oak Ridge National Laboratory (DOE), Oregon State University, Pacific Northwest Laboratory (DOE), Qwest Communications, Sandia National Laboratory (DOE), SBC Communications, Spirent, Texas A&M University, U.S. Army Engineer Research and Development Center Major Shared Resource Center (DOD), University Corporation for Advanced Internet Development, University of Tennessee/Knoxville, and the Very high performance Backbone Network Service – vBNS (NSF).
These people are the critical ingredient that makes SCinet work. Each year they attempt to top the previous year as well as give the best possible service and support to the show. They work year-round planning and implementing SCinet, and then spend more than three weeks in Dallas building and running it. Without the people, all the fiber, all the routers, and all the infrastructure would not pass one bit of information.
For the first time SCinet will offer tours of the NOC and other equipment for attendees and exhibitors. A sign-up sheet will be available at the NOC for a limited number of tours. It is a chance to see the equipment from behind the scenes and get to talk to some of the vendors and volunteers who put together this most intense network.
Wireless
Working with Cisco Systems, SBC Communications and other vendors, SCinet is creating a large 11 Mbps wireless network on the exhibit floor, in the Education Program area, and other locations throughout the conference space, possibly the entire SC 2000 conference area. This wireless network will support the Education Program and the eSCape2000 activities, among other things.
Wireless connectivity is planned for attendees as well. A standards-based 802.11b network with DHCP service will cover the exhibit floor. Attendees whose laptops are equipped with standards-compliant wireless Ethernet cards and an operating system that configures network services as a DHCP client should have immediate connectivity. A selection of cards and operating systems known to work is listed on the SCinet web page, along with links to vendors, drivers, and instructions. SCinet personnel will not be able to provide direct support to attendees who have trouble connecting.
SCinet will not be providing wireless cards for individual systems. SCinet does not support setup, configuration and/or diagnosis of individual systems, but will provide links to information about these subjects at the web site.
The priority areas supported for wireless are the exhibit areas, education area, convention center lobby, meeting rooms and other spaces. If limits are necessary, we will attempt to indicate range and limits with signage. SCinet discourages people/groups/exhibits from bringing their own base stations, because of issues with base station conflicts. SCinet also reserves the right to disconnect any base station that interferes with the SCinet network.
Xnet
Xnet is the leading-edge, technology-development showcase segment of SCinet. As exhibitors, users, and attendees have become more and more dependent on SCinet to provide robust, high-performance, production-quality networking, it has become increasingly difficult for SCinet to showcase bleeding-edge, potentially fragile technology. At the same time, vendors have sometimes been reluctant to showcase bleeding-edge hardware in SCinet as it became a production network.
Xnet provides the solution to this dichotomy. It provides a context which is, by definition, bleeding-edge and pre-standard, and in which fragility goes with the territory. It thus gives vendors an opportunity to showcase network gear or capabilities that typically do not exist in anyone's off-the-shelf catalog.
This year, Xnet will demonstrate early, pre-production 10-Gigabit Ethernet equipment connecting several show floor booths.
RESEARCH GEMS

This year, the Research Gems will have special Open Houses on Wednesday, Nov. 8, and Thursday, Nov. 9, from 10 to 11 a.m. Don't miss this chance to discuss the posters with the authors.
Special Awards
A $250 award will be granted for the submission selected by the review committee as “Best Research Gem of the Conference”. This will be announced at the SC2000 Awards Session.