
Monday, March 4
 

8:30am CST

Workshop: Best Practices in Supercomputing Systems Management
8:30 – 8:40 Keith Gray (BP) Welcome and Introduction
8:40 – 9:00 Rosemary Francis (Ellexus) Benchmarking challenges
9:00 – 9:20 Carlos Rosales-Fernandez (Intel) Optimize for both memory and compute using Roofline automation and SIMD analysis tools
9:20 – 9:40 Ron Cogswell & Alex Morris (Shell) Benchmarking Seismic Code on AMD & Intel CPUs.
9:40 – 10:00 Dave McMillan (Cray) Monitoring – A View into our Systems
Break
10:30 – 10:50 Mike Townsley (Exxon) ExxonMobil’s new Spring Campus HPC Data Center
10:50 – 11:20 Tommy Minyard (TACC) Frontera: challenges & liquid cooling in the data center
11:20 – 11:35 Tommy Minyard (TACC) Storage, NV DIMMS & Apache Pass
11:35 – 11:55 Erik Engquist (Rice) Efficient data transfers
11:55 – 12:00 David Baldwin (Shell) Close

This session will share best practices in supercomputing systems management. In the last few years, our industry has made great progress improving the reliability of very large MPI jobs in our clusters, but the challenge remains how to accurately benchmark our algorithms so as to squeeze every last bit of performance out of our systems. Coupled with a changing filesystem landscape and evolving cloud-supported services, the choices we make as HPC professionals become more difficult but no less critical.

We are organizing a workshop to share best practices in filesystem monitoring, management, and performance, with a focus on cluster health as a driver of application performance. Experts and practitioners from industry, academia, and national laboratories will present, share their experiences on these subjects, and lead a discussion.



Monday March 4, 2019 8:30am - 12:00pm CST
Auditorium

8:30am CST

Workshop: Hiring, Developing and Retaining an Inclusive Workforce
This session will share tips and strategies for attracting and increasing diversity in your workforce, from small modifications in job postings, to cultural shifts that encourage employee retention. The oil & gas industry has recognized the need to bring in a more diverse talent pool, to fill the hundreds of positions in advanced computing, machine learning/artificial intelligence, and data science that are opening up in the coming years. Successfully increasing diversity in the workplace requires careful planning and support at all levels.

Texas Women in HPC (TXWHPC) has organized this workshop to share best practices to broaden the pipeline of candidates for HPC-related jobs, and offer suggestions for ways to keep your organization humming smoothly. Experts from academia, the oil & gas and computing industries will present and share their experiences related to hiring and maintaining a more diverse workforce, and the importance of male allies. The panel session will offer an opportunity for the audience to ask questions of the presenters.



Monday March 4, 2019 8:30am - 12:00pm CST
Room 280

12:00pm CST

Conference Registration & Networking
Monday March 4, 2019 12:00pm - 1:00pm CST
Event Hall

1:00pm CST

Welcome

Monday March 4, 2019 1:00pm - 1:15pm CST
Auditorium

1:15pm CST

High Performance Computing and High Performance Data Analytics, and what O&G will do with it in the next 10 years
Speakers
Detlef Hohl

Chief Scientist Computation and Data Science, Shell
Detlef Hohl holds a Master's degree in chemistry from Technical University of Munich and a PhD in theoretical physics from Technical University of Aachen (Germany). Before joining Shell in 1997, he was senior scientist at the German National Laboratory Forschungszentrum Jülich. Detlef…


Monday March 4, 2019 1:15pm - 2:00pm CST
Auditorium

2:00pm CST

Evolving System Design for HPC: Next Generation CPU and Accelerator Technologies
As the definition of High Performance Computing (HPC) expands to include Big Data Analytics, Machine Learning, and Artificial Intelligence (AI), a more scalable, powerful, and secure approach is required to meet these ever growing demands. This talk will discuss how next generation CPU and accelerator technologies, partnered with innovative system designs and software advances, can propel the industry to pre-exascale and exascale systems. Unlocking the full potential of these new technologies and architectures will require innovation throughout the ecosystem.


Speakers
Forrest Norrod

Senior Vice President & General Manager, AMD
Forrest Norrod is senior vice president and general manager of the Datacenter and Embedded Solutions Business Group at AMD. In this role, he is responsible for managing all aspects of strategy, business management, engineering and sales for AMD datacenter and embedded products. Norrod…



Monday March 4, 2019 2:00pm - 2:30pm CST
Auditorium

2:30pm CST

Arm in HPC
With the recent Astra system at Sandia Lab (#203 on the Top500) and HPE Catalyst project in the UK, Arm-based architectures are arriving in HPC environments.  Several partners have announced or will soon announce new silicon and projects, each of which offers something different and compelling for our community.  Brent will describe the driving factors and how these solutions are changing the landscape for HPC.


Speakers
Brent Gorda

Sr. Director, ARM
Brent has a long history of working in the supercomputing community. Starting in the mid-80’s in Canada, he wrote compilers for Myrias Research. In the early 90’s he moved to the Lawrence Livermore National Laboratory and worked on the adoption of parallel computing. In the early…



Monday March 4, 2019 2:30pm - 3:00pm CST
Auditorium

3:00pm CST

Break
Monday March 4, 2019 3:00pm - 3:30pm CST
Event Hall

3:30pm CST

AI & Deep Learning: AI for HPC in Oil & Gas: Experiences and Opportunities
This talk will focus on how AI techniques can be used in the development of HPC for Oil & Gas, ranging from seismic and image processing to environments and tools for reservoir simulations.
Large image datasets can be used to interpret seismic images. And as large HPC systems become more and more heterogeneous, adding GPUs and other devices for performance and energy efficiency, HPC applications also become more and more complex to write and optimize. For instance, both CPUs and GPUs have several types of memories and caches that codes need to be optimized for. We show how AI techniques can help us pick, among tens of thousands of parameters, the ones needed to achieve the best possible performance for a given complex application. Ideas for future opportunities will also be discussed.
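To make the scale of that search concrete, here is a minimal baseline sketch (ours, not the speaker's method): a naive random search over a hypothetical kernel-tuning space. The parameter names, values, and the measure_runtime callable are all illustrative assumptions.

```python
import random

# Hypothetical tuning space; names and values are illustrative only.
PARAM_SPACE = {
    "tile_x": [8, 16, 32, 64],
    "tile_y": [8, 16, 32, 64],
    "unroll": [1, 2, 4, 8],
    "threads_per_block": [64, 128, 256, 512],
}

def random_search(measure_runtime, space=PARAM_SPACE, n_trials=100):
    """Baseline autotuner: sample configurations at random, keep the fastest.
    AI-guided tuners aim to beat this by learning from past measurements."""
    best_cfg, best_time = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: random.choice(values) for name, values in space.items()}
        elapsed = measure_runtime(cfg)  # e.g., compile and time the kernel
        if elapsed < best_time:
            best_cfg, best_time = cfg, elapsed
    return best_cfg, best_time
```

Even this toy space has 4 × 4 × 4 × 4 = 256 configurations; real applications with tens of thousands of interacting parameters are exactly where learned search pays off.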


Speakers
Anne C. Elster

Norwegian University of Science and Technology



Monday March 4, 2019 3:30pm - 3:50pm CST
Auditorium

3:30pm CST

Data Storage & I/O Performance: Providing Balanced Systems and Expectations with the IO500
For years, high performance computing has been dominated by the overwhelming specter of Linpack and the Top500.  Many sites, tempted by the allure of fleeting Top500 glory, chased architectures well suited to Linpack, to the detriment of their core workflows.  Despite this, the Top500 has overall provided value to the community by bringing attention to HPC and driving competition and innovation in processor architectures.  Two years ago, with these observations in mind, we formed a comparable list for HPC storage called the IO500.

The IO500 seeks to provide more balance for HPC.  By creating a complementary list to the Top500, we hope that sites pursuing these lists will design machines that work well for both the Top500 and the IO500, resulting in generally more balanced data centers.  Additionally, the IO500 consists of a suite of benchmarks designed to identify a storage system's range of possible performance.  For too long, storage vendors and data centers have published only their "hero" bandwidth numbers, which does the community a tremendous disservice by creating unreasonable and unattainable performance expectations.  Accordingly, the IO500 forces submitters to report both their "hero" numbers and their performance under notoriously challenging patterns of both data and metadata.  This gives the community an understanding of both a system's possible and its probable performance.

Over two years, we have now published three lists and collected over sixty submissions from more than twenty institutions, covering nine different file systems.  All collected data is publicly available so that the community can begin to discover which file systems (and which configurations) will best serve their particular workflow balance.

In this talk, we will present a brief history and motivation of the IO500 and spend the majority of the time attempting to find trends and other observations from the submissions received thus far.


Speakers


Monday March 4, 2019 3:30pm - 3:50pm CST
Room 280

3:50pm CST

AI & Deep Learning: Zero-Communication Model Parallelism for Distributed Extreme-Scale Deep Learning
Current deep learning architectures are growing larger to learn from complex datasets. Microsoft achieved a breakthrough in image recognition with a 152-layer neural network, popularly known as ResNet, containing around 60 million parameters: a model ten times bigger than the previous best performer, Google's GoogLeNet. Last year, Google demonstrated the need for a 137-billion-parameter network, trained over specialized hardware, for language modeling. Quoting the same paper: “such model capacity is critical for absorbing the vast quantities of knowledge available in the training corpora.”

The quest for a unified machine learning algorithm that can simultaneously generalize from diverse sources of information (or transfer learning) has made it imperative to train astronomically sized neural networks with enormous computational and memory overheads. A basic information-theoretic argument implies that to assimilate all the information needed to map inputs to complex and diverse decisions, the capacity of the model cannot be small.

A simple back-of-the-envelope calculation shows that for a neural network with 137 billion weights, the model itself will require around 500 GB of memory. Training will need at least around 1.5 TB of working memory with popular algorithms like Adam.
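The first figure follows if one assumes 32-bit (4-byte) parameters: $137 \times 10^{9} \times 4\,\text{B} \approx 548\,\text{GB} \approx 0.5\,\text{TB}$. The training figure follows because an optimizer like Adam keeps roughly three additional per-parameter arrays (the gradient plus two moment estimates), i.e., about $3 \times 548\,\text{GB} \approx 1.6\,\text{TB}$ of extra working memory.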

Due to the growing size and complexity of networks, efficient algorithms for training massive deep networks in distributed and parallel environments are currently among the most sought-after problems in both academia and industry. Recently Google released mesh-tensorflow for distributed training of neural networks. In distributed computing environments, the parameters of giant deep networks must be split across multiple nodes. However, this setup requires costly communication and synchronization between the parameter server and processing nodes to transfer the gradient and parameter updates. The sequential and dense nature of gradient updates prohibits any efficient splitting (sharding) of the neural network parameters across compute nodes. There is no clear way to avoid the costly synchronization without resorting to some ad-hoc breaking of the network. This ad-hoc breaking of deep networks is not well understood and is likely to hurt performance. Synchronization is one of the most significant hurdles to scalability.

As a result, model parallelism and parameter sharding in distributed environments remain among the most sought-after problems in the HPC community.

In this talk, I will show a surprising connection between predicting measurements of a count-min sketch, a popular streaming algorithm for finding frequent items, and predicting the class with maximum probability in a classifier such as a deep network. With this connection, we show a provable and straightforward randomized algorithm for multi-class classification that requires resources logarithmic in the number of classes. The technique is generic and can decompose any given large network into logarithmic-size small networks that can be trained independently, without any communication. In practice, when applied to an industry-scale dataset with 100,000 classes, we obtain around twice the best-reported accuracy with more than a 100x reduction in computation and memory. Using the simple idea of hashing, we can train on the ODP dataset, with 100,000 classes and 400,000 features, to a classification accuracy of 19.28%, which is the best reported on this dataset. Before this work, the best-performing baseline was a one-vs-all classifier that requires 40 billion parameters (160 GB model size) and achieves 9% accuracy. All prior works on the ODP dataset required sizable clusters and days of training, which we can match in hours on a single machine. This work is an ideal illustration of the power of randomized algorithms for ML, where randomization can reduce computation with increased accuracy due to implicit regularization.
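For readers unfamiliar with the streaming side of that connection: a count-min sketch estimates item frequencies in space sub-linear in the number of distinct items. A minimal sketch (illustrative only; the hash choice and table sizes are arbitrary):

```python
import hashlib

class CountMinSketch:
    """Minimal count-min sketch: `depth` hash rows, each of size `width`."""

    def __init__(self, width=1024, depth=5):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _bucket(self, item, row):
        digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._bucket(item, row)] += count

    def query(self, item):
        # Collisions can only inflate a cell, so the minimum across rows
        # is the tightest (over-)estimate of the true count.
        return min(self.table[row][self._bucket(item, row)]
                   for row in range(self.depth))
```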

And, to sweeten the deal, the algorithm is embarrassingly parallelizable over GPUs. We get a perfectly linear speedup with an increase in the number of GPUs, as the algorithm is provably communication-free.  Our experiments show that we can train the ODP dataset in 7 hours on a single GPU, or in 15 minutes with 25 GPUs. Similarly, we can train classifiers over the fine-grained ImageNet dataset in 24 hours on a single GPU, which can be reduced to a little over 1 hour with 20 GPUs.
We provide the first framework for model parallelism that does not require any communication. Further exploration could revolutionize the field of distributed deep learning with large numbers of classes. We hope this method will be widely adopted.


Speakers
Qixuan Huang

Rice University

Tharun Medini

Rice University

Anshumali Shrivastava

Associate Professor Computer Science, Electrical and Computer Engineering, Statistics, and Founder of ThirdAI Corp., Rice University
Anshumali Shrivastava's research focuses on Large Scale Machine Learning, Scalable and Sustainable Deep Learning, Randomized Algorithms for Big-Data and Graph Mining.



Monday March 4, 2019 3:50pm - 4:10pm CST
Auditorium

3:50pm CST

Data Storage & I/O Performance: Solving I/O Slowdown: The "Noisy Neighbor" Problem
In Oil and Gas, when using shared storage, mixed workloads can have a big impact on I/O performance, causing considerable slowdown when running small I/O alongside large I/O on the same storage system.  With the increasing adoption of flash, and new features in Lustre such as Progressive File Layout (PFL) and Data on Metadata (DoM), storage can be tuned to automatically isolate different workloads to different storage media, accelerating small I/O without disrupting large sequential I/O performance, using the right mix of flash and disk transparently.  In this presentation, Cray will share real benchmark results on the impact a "noisy neighbor" application has on sequential I/O, and show how, with the right storage tuning and flash capacity, to optimize storage so it meets the demanding mixed workloads of Oil and Gas and accelerates performance.


Speakers


Monday March 4, 2019 3:50pm - 4:10pm CST
Room 280

4:10pm CST

AI & Deep Learning: Strong Scaling Strategy for Deep Neural Network Seismic Segmentation Models
In this paper, we present a strong scaling approach to run seismic segmentation models in parallel. The models used here are deep neural networks, which are increasingly being used by the Oil and Gas industry. The typical approach to running these models in parallel is to increase the mini-batch size as the number of tasks increases, in what we call weak scaling. This strategy has worked well for massive labeled training sets. However, for moderately large training sets this may not be the case. Here we present and evaluate a different strategy with a seismic segmentation model on a real dataset using Power machines.
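In mini-batch terms (our notation, not the paper's), with per-task batch size $b$, global batch size $B$, and $N$ parallel tasks: weak scaling grows the global batch with the task count, $B = b \cdot N$, while strong scaling holds $B$ fixed and shrinks each task's share, $b = B/N$.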


Speakers
Eduardo Rodrigues

Research staff, IBM
HPC Researcher at IBM Brazil. My interests include HPC (of course), data analytics, numerical simulations, machine learning/AI, and cloud computing. https://scholar.google.com.br/citations?user=EOmaFDMAAAAJ&hl=en



Monday March 4, 2019 4:10pm - 4:30pm CST
Auditorium

4:10pm CST

Data Storage & I/O Performance: Future-Looking Data Storage for Peak Performance at ACCRE’s HPC Environment
High performance computing (HPC) environments continually test the limits of technology and require peak performance from their equipment, including data storage. As the growth of data in HPC explodes, optimizing performance, increasing uptime, and lowering costs become vital components of any future-looking HPC storage architecture. Join us to learn how the Advanced Computing Center for Research and Education (ACCRE) at Vanderbilt University enhanced its organizational efficiency by scaling storage for its networked computer cluster. Housing over 10,000 computational cores, the ACCRE Big Data cluster is used for a wide variety of services, from supporting research across the vast campus, to processing complex scientific research from data-intensive environments like CERN, to providing tape backup services to multiple external partners. Join our technology expert to discuss ACCRE's hybrid storage architecture, which enables direct end-user access to archival storage and infinite scalability. The presentation will also discuss how ACCRE reduced its annual data storage operating charges for its backup clients by over 50 percent.


Speakers
David Feller

Spectra Logic

Mariana Menge

Spectra Logic



Monday March 4, 2019 4:10pm - 4:30pm CST
Room 280

4:30pm CST

AI & Deep Learning: Interactive Machine Learning and Inference
The combination of modern software architecture, machine learning engines, and scalable computation enables interactive machine learning and inference on massive seismic data, saving months of time in the quest for energy discovery.  Unlike the highly inefficient approach that is commonly used, we remove the tedious work of converting seismic cubes into small images; instead, we enable machine learning to directly ingest 3D seismic.  This lets geoscientists directly teach the machine learning engine, instantly correct the inference mistakes, and extract full features from the whole 3D seismic volume in a matter of minutes (or faster).  This converges data science and geoscience in a way never shown before, and it will change the way geoscientists work in the future.  What we are showing is not research but commercially available technology that is already utilized by a few well-known oil companies.  (We will try to have one of the oil companies co-present.)


Speakers
Michele Isernia

Bluware Corp.



Monday March 4, 2019 4:30pm - 4:50pm CST
Auditorium

4:30pm CST

Data Storage & I/O Performance: Breaking the Memory/IO Wall in Oil & Gas Applications using Approximation Techniques: Mixed Precision and Compression.
In recent years, two trends have shaped the evolution of HPC: (i) the widening gap between data movement and data processing costs, both in time and energy, and (ii) the end of the exclusive reign of IEEE 32-bit and 64-bit floating-point arithmetic with the advent of AI's lower precision requirements. It thus appears pertinent to revisit legacy applications in order to reduce data movement by using lower-precision arithmetic and/or compression, at the expense of performing extra computations.
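As a toy illustration of the data-movement side of this trade-off (our sketch, not the authors' algorithms): casting a snapshot from FP64 down to FP32 or FP16 halves or quarters the bytes that must be moved, at a measurable cost in accuracy.

```python
import numpy as np

# Toy 3D "wavefield" snapshot; dimensions are arbitrary.
field = np.random.rand(128, 128, 128)  # float64 by default

for dtype in (np.float64, np.float32, np.float16):
    cast = field.astype(dtype)
    err = np.max(np.abs(cast.astype(np.float64) - field))
    print(f"{dtype.__name__:>8}: {cast.nbytes / 2**20:6.1f} MiB, "
          f"max cast error {err:.2e}")
```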

In this talk we explore the impact of such an approach on two representative applications in seismic imaging and digital rock physics (DRP). The first application is reverse time migration, where we reduce expensive I/O operations using a novel compute-bound GPU-resident compression algorithm. Based on the Tucker decomposition used for tensor contractions, our compression algorithm exploits the data sparsity of the 3D-domain seismic solution. The second application is a Self-Organizing Map (SOM) algorithm, which is a critical phase of a time-constrained DRP workflow. We present a GPU-accelerated mixed-precision implementation that takes advantage of NVIDIA GPU Tensor Cores.

We report the impact on performance and numerical accuracy for both applications on the latest NVIDIA hardware.


Speakers
David Keyes

Director, Extreme Computing Research Center, KAUST
David Keyes is the director of the Extreme Computing Research Center at King Abdullah University of Science and Technology, where he was a founding dean in 2009, and an adjunct professor of applied mathematics at Columbia University. Keyes earned his BSE in Aerospace and Mechanical…

Hatem Ltaief

Senior Research Scientist, KAUST
High performance computing; numerical linear algebra; performance optimization.



Monday March 4, 2019 4:30pm - 4:50pm CST
Room 280

4:50pm CST

Data Storage & I/O Performance: A Generic and Holistic High Performance Distributed Computing and Storage System for Large Datasets in Oil and Gas Industry
In the O&G industry, acquisition of more complicated and larger datasets can happen at the rig site thanks to the development of more advanced sensors.  Efficient and effective processing of the acquired data, in both real time and post-processing, is now more critical than ever.  The traditional methods and workflows, such as hosting datasets in a centralized data server, manual processing on a standalone machine, and manual transferring and transforming of dataset formats, are inefficient, difficult to manage, and not horizontally scalable.  With distributed clusters and cloud technology, data storage and processing performance can be scaled horizontally and optimized across multiple task nodes.  However, it is not trivial to understand how a distributed cluster functions and how to integrate applications with different cloud providers' solutions.  In this paper, we introduce and illustrate a generic and holistic high performance distributed computing and storage system for large datasets.  Different domain applications in the O&G industry can use this system to accomplish large-dataset processing tasks with near-instantaneous feedback, such as well log interpretation, petrophysics processing, seismic processing, and machine learning and deep learning modeling.  The system is composed of a generic distributed processing engine utilizing Apache Spark and the Advanced Message Queuing Protocol (AMQP), a generic File I/O service, a distributed data store, and a distributed computing cluster.  The system architecture is very flexible and cloud-provider agnostic.  It can be implemented and deployed on either an on-premise internal cluster or a public cloud, based on the processing requirements.
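As a hedged sketch of the Spark side of such a pipeline (the session name and per-record function below are hypothetical; the system described in the paper additionally layers AMQP messaging, a File I/O service, and a distributed data store on top):

```python
from pyspark.sql import SparkSession

# Resources and master would normally come from the cluster or cloud
# deployment rather than being hard-coded here.
spark = SparkSession.builder.appName("og-processing-sketch").getOrCreate()

def process_record(trace):
    """Hypothetical per-record step, e.g., a petrophysics computation."""
    return sum(trace) / len(trace)

# Distribute a batch of records across task nodes and process in parallel.
records = spark.sparkContext.parallelize([[1.0, 2.0, 3.0]] * 10_000, 32)
results = records.map(process_record).collect()
spark.stop()
```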


Speakers
Kristie Chang

Halliburton

Lu Wang

Halliburton



Monday March 4, 2019 4:50pm - 5:10pm CST
Room 280

5:10pm CST

Networking Reception
Monday March 4, 2019 5:10pm - 7:00pm CST
Event Hall
 
Tuesday, March 5
 

7:30am CST

Breakfast & Networking
Tuesday March 5, 2019 7:30am - 8:30am CST
Event Hall

8:30am CST

Welcome

Tuesday March 5, 2019 8:30am - 8:45am CST
Auditorium

8:45am CST

The US Exascale Computing Project: Addressing Extreme-Scale Computing Challenges

Speakers
Lori Diachin

Deputy Director, Exascale Computing Project
Lori Diachin is the Deputy Director for the U.S. Department of Energy’s Exascale Computing Project (ECP). ECP is a multi-billion-dollar Department of Energy effort supported by both the National Nuclear Security Administration and the Office of Science to accelerate the delivery…



Tuesday March 5, 2019 8:45am - 9:30am CST
Auditorium

9:30am CST

New Era in HPC
In this keynote presentation, Trish will discuss the changing landscape of high performance computing, key trends, and the convergence of HPC, AI, and HPDA that is transforming our industry and will fuel HPC to fulfill its potential as a scientific tool for business and innovation.  Trish will highlight the key forces driving this shift and discuss how this transformation requires a fundamental paradigm shift, opening up unprecedented opportunities for HPC.


Speakers
Trish Damkroger

Vice President, Intel Data Center Group, Intel
Trish Damkroger is Vice President and General Manager of the Technical Computing Initiative (TCI) in Intel’s Data Center Group. She leads Intel’s global Technical Computing business and is responsible for developing and executing Intel’s strategy, building customer relationships…



Tuesday March 5, 2019 9:30am - 10:00am CST
Auditorium

10:00am CST

Computing for the Endless Frontier
In August of 2018, the Texas Advanced Computing Center (TACC) at the University of Texas at Austin was selected as the sole awardee of the National Science Foundation’s “Towards a Leadership Class Computing Facility” solicitation.  In this talk, I will describe the main components of the award: the Phase 1 system, “Frontera”, which will be the largest university-based supercomputer in the world when it comes online in 2019; the plans for facility operations and scientific support for the next five years; and the plans to design a Phase 2 system in the mid-2020s to be the NSF leadership system for the latter half of the decade, with capabilities 10x beyond Frontera.  The talk will also cover the growing and shifting nature of the scientific workloads that require advanced capabilities, the technology shifts and challenges the community is currently facing, and the ways TACC has restructured, and is restructuring, to face these challenges.


Speakers
Dan Stanzione

Executive Director, Texas Advanced Computing Center
Dr. Stanzione is the Executive Director of the Texas Advanced Computing Center (TACC) at The University of Texas at Austin. A nationally recognized leader in high performance computing, Stanzione has served as deputy director since June 2009 and assumed the Executive Director post…


Tuesday March 5, 2019 10:00am - 10:30am CST
Auditorium

10:30am CST

Break
Tuesday March 5, 2019 10:30am - 11:00am CST
Event Hall

11:00am CST

Software Technology & Applications: Automatic generation of production-grade hybrid MPI-OpenMP parallel wave propagators using Devito
Devito is a Python-based domain-specific language for implementing high-performance finite difference partial differential equation solvers. The motivating application is exploration seismology, where methods such as Full-Waveform Inversion and Reverse-Time Migration are used to invert terabytes of seismic data to create images of the earth's subsurface. Even using modern supercomputers, it can take weeks to process a single seismic survey and create a useful subsurface image. The computational cost is dominated by the numerical solution of wave equations and their corresponding adjoints. Therefore, a great deal of effort is invested in aggressively optimizing the performance of these wave-equation propagators for different computer architectures. Additionally, the actual set of partial differential equations being solved and their numerical discretization is under constant innovation as increasingly realistic representations of the physics are developed, further ratcheting up the cost of practical solvers. By embedding a domain-specific language within Python and making heavy use of SymPy, a symbolic mathematics library, we make it possible to develop finite difference simulators quickly using a syntax that strongly resembles the mathematics. The Devito compiler reads this code and applies a wide range of analyses to generate highly optimized and parallel code. This approach can reduce the development time of a verified and optimized solver from months to days.
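To give a flavor of the symbolic API, here is a minimal constant-velocity acoustic operator (a sketch assuming a recent Devito release; the grid size, velocity, and time range are arbitrary):

```python
from devito import Grid, TimeFunction, Eq, Operator, solve

grid = Grid(shape=(101, 101), extent=(1000.0, 1000.0))
u = TimeFunction(name="u", grid=grid, time_order=2, space_order=8)
c = 1.5  # constant velocity, illustrative only

# Write the wave equation symbolically and rearrange for u at t + dt.
pde = u.dt2 - c**2 * u.laplace
update = Eq(u.forward, solve(pde, u.forward))

# Devito compiles the stencil to optimized C (optionally OpenMP/MPI).
op = Operator([update])
op.apply(time_M=100, dt=0.5)
```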

In this talk we will describe key features of Devito that enable us to achieve up to 60% of the roofline model for wave solvers. We will also walk through a 3D elastic wave solver to illustrate how the right abstractions can dramatically reduce the effort and complexity of PDE solvers. We will also present, for the first time publicly, results for automatically generated MPI-OpenMP production-grade wave propagation software developed for applications such as high-frequency RTM.


Speakers
Tim Burgess

DownUnder Geosolutions

Gerard Gorman

Imperial College London
Azure Cloud; HPC; modelling and data inversion; DSLs and code generation; education in computational science and engineering.

Fabio Luporini

Imperial College London

Amik St-Cyr

Senior researcher, Shell
Amik St-Cyr recently joined the Royal Dutch Shell company as a senior researcher in computation & modeling. Amik came to the industry from the NSF-funded National Center for Atmospheric Research (NCAR). His work consisted of the discovery of novel numerical methods for geophysical…



Tuesday March 5, 2019 11:00am - 11:20am CST
Room 280

11:00am CST

Systems & Facilities: CORAL System Update
IBM has completed delivery of the largest HPC and AI machines as part of the CORAL project with the Summit and Sierra installations.

We will provide a brief review of the original CORAL objectives, how the Summit/Sierra systems were projected to perform, and the current measured performance and capabilities of the production systems.
Driven by the high efficiencies demanded by large supercomputers and the slowdown of Moore’s law of scaling, the CORAL systems address their design challenges with a heterogeneous approach. Applications have a diverse set of needs that can be addressed with strong CPUs for analytics capabilities and powerful GPUs for massively parallel sections. Analytics capabilities well suited to CPUs are exemplified by complex codes with data-dependent paths, heavy indirection and pointer chasing, and sensitivity to memory-subsystem latency, as found in Oil and Gas reservoir simulation, AI, and graph analytics. Massively parallel compute well suited to GPUs is exemplified by simple kernels, dense FP operations, and simple data access patterns, as found in Oil and Gas seismic simulation, financial value-at-risk, and image analytics. Tying diverse compute engines together through a high-bandwidth interconnect and state-of-the-art memory systems was key to meeting application needs.

IBM created our Data Centric Systems approach to facilitate the design of CORAL to address these demands, with four guiding principles: 1) minimize data motion, 2) enable compute at all levels of the system hierarchy, 3) modularity of system components, 4) application-driven design. With OpenPOWER, IBM created an environment for modularity with its partners: NVIDIA supplied the GPU for highly parallel work, and Mellanox provided the high-speed interconnect for the system.
Other key components contributing to CORAL and enabling unique capabilities are the parallel file system in use (IBM Spectrum Scale) and the AI technology provided by PowerAI.

IBM will discuss how CORAL is contributing to the transformation of HPC and AI, with a view on the types of science being performed, including some of the novel ideas IBM Research is pursuing around Intelligent Simulation and how we collaborate with the industry.
IBM will provide a point of view on any then-current public information on planned and expected projects beyond CORAL and planned IBM technologies, and outline some of the challenges of future exascale systems.


Speakers


Tuesday March 5, 2019 11:00am - 11:20am CST
Auditorium

11:20am CST

Software Technology & Applications: An Overview of RAJA
With the rapid change of computing architectures and the variety of programming models, the ability to develop performance-portable applications has become of great importance. This is particularly true in large production codes where developing and maintaining hardware-specific versions is untenable. In this talk, we provide an overview of RAJA, a C++ library that enables application developers to write single-source code that can target multiple hardware and programming-model back-ends. RAJA has demonstrated its ability to simplify the development of performance-portable code through its adoption across a diverse set of production applications at Lawrence Livermore National Laboratory. RAJA decouples loop bodies and execution via programming model-specific implementations using standard C++11 features. This approach enables developers to tune loop patterns, rather than individual loops, and enables applications to be tailored at compile time to specific compute architectures. Furthermore, RAJA provides abstractions for a wide range of loop structures found in numerical algorithms: reductions, data layouts and views, iteration spaces, atomic operations, scans, etc. In addition to discussing features, we discuss experiences integrating RAJA into production codes, performance, and development as driven by the needs of applications. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-ABS-765138.


Speakers
Arturo Vargas

Lawrence Livermore National Lab



Tuesday March 5, 2019 11:20am - 11:40am CST
Room 280

11:20am CST

Systems & Facilities: Perlmutter - a 2020 pre-exascale GPU-accelerated system for NERSC.
The National Energy Research Scientific Computing (NERSC) Center is the mission High Performance Computing facility for the U.S. Department of Energy's (DOE) Office of Science (SC). NERSC provides large-scale, state-of-the-art computing, storage, and networking for more than 7,000 users that encompass the DOE’s unclassified research programs in many science areas. In this talk we will describe the Perlmutter machine which will be delivered to NERSC/LBNL in 2020. Perlmutter will be a pre-exascale machine and contain a mixture of AMD EPYC CPU-only and NVIDIA Tesla GPU-accelerated nodes. In this talk we will describe the architecture of the machine and the analysis we performed in order to optimize this design to meet the needs of the ECP and NERSC user communities.


Speakers


Tuesday March 5, 2019 11:20am - 11:40am CST
Auditorium

11:40am CST

Software Technology & Applications: Seismic modeling on Arm architectures
The production of reliable three-dimensional images of the subsurface remains a major challenge in the oil and gas industry. Consequently, significant efforts have been devoted to improving seismic exploration.  To date, computational workflows based on Reverse Time Migration (RTM) or Full Waveform Inversion (FWI) methodologies are used on petascale systems in major companies.
 
However, as each vendor works on next-generation exascale technologies, the landscape of architectures that may become available raises growing concerns about real application performance. Whatever the design of these systems (heterogeneity, high core counts, or depth of the memory hierarchy), it is accepted that co-design approaches will play a major role in ensuring that oil and gas applications are in the best position to adopt the next breakthroughs.
 
Indeed, the criticality of seismic imaging algorithms makes their efficiency on HPC architectures a continuous challenge. For instance, recent efforts have been devoted to characterizing and reducing the memory traffic of major numerical kernels. This stems from the inherent memory-bound nature of such applications, even when high-order approximations are implemented.  So far, and despite significant improvements (e.g., high-level frameworks), only hand-tuned implementations have come close to theoretical peak performance.
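To see why such kernels are memory-bound, consider a bare-bones second-order 2D acoustic update (our illustrative sketch, not a production kernel): each output point loads five neighboring values to perform only a handful of flops.

```python
import numpy as np

def acoustic_step(u_prev, u, c2_dt2_h2):
    """One explicit 2nd-order update of the 2D acoustic wave equation.
    Roughly 6 flops per point against 5 stencil loads: memory-bound."""
    lap = (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0) +
           np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1) - 4.0 * u)
    return 2.0 * u - u_prev + c2_dt2_h2 * lap
```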
 
Recently, Arm processors have gained traction in the HPC community with both new hardware-level features and a comprehensive software ecosystem that may enable a wider range of optimizations for seismic applications. This may be particularly true with the upcoming Scalable Vector Extension (SVE).
 
In this presentation, based on popular examples from the geophysical community (isotropic finite-difference kernels, high-performance stencil frameworks, full-fledged applications), results on Arm systems will be presented and the impact of key features will be discussed.

Speakers

Tuesday March 5, 2019 11:40am - 12:00pm CST
Room 280

12:00pm CST

Software Technology & Applications: Swift and Parsl: Parallel Scripting for HPC Workflow in Science and Engineering

Scientists, engineers, and data analysts frequently find themselves needing to execute a set of application tasks hundreds, or even millions, of times: for example, to optimize a design being simulated, or to process large collections of data objects such as in oil and gas exploration. Such activities can be intellectually and administratively arduous, due to the need to orchestrate many large data transfer and application execution tasks, and to track the resulting outputs, which themselves often serve as inputs to subsequent application stages. Further complicating these activities is the frequent need to leverage diverse distributed and parallel computing resources in order to complete computations in a timely manner.

The Swift parallel scripting language (which has no relation to the Apple Swift language) is used by scientists and engineers to express and run such large or complex parallel invocation patterns (“workflows”) of programs or functions on parallel systems. The need for its scripted workflow automation is pervasive across a wide spectrum of science and engineering disciplines, and is a hallmark of HPC for oil and gas applications.

Significant user demand for Swift has been demonstrated in a growing set of academic and industrial domains, with several hundred users running on workstations, clusters, clouds, and the largest supercomputers. Swift has proven its ability to address the needs of applications from astronomy and bioinformatics to earth systems science, geology, and zoology.

We will survey in this talk the history and evolution of the Swift parallel scripting/programming model, and its three implementations in portable, extreme-scale, and Python-based forms.

We will report on the status of the more recent Swift/E (“ecosystem”) project, which enhances Swift’s power via integration with widely used scientific software elements: Globus for data management, collaboration, and usability; Jupyter for interactive parallel/distributed workflow; Python and R for productivity; and git for maintainability.

We will conclude with a description of the structure and applications of Parallel Works, a commercial software-as-a-service implementation of Swift that supports both elastic cloud-based execution of Swift workflows and on-premise and hybrid/burstable operations.

While a relative newcomer to the field of oil and gas HPC, Swift has been applied in a wide range of disciplines relevant to the main areas of O&G computational science, which collectively cover the breadth of computational workflow patterns that comprise O&G HPC applications.

We will also survey a set of O&G-relevant Swift applications in materials science, large-scale image dataset analysis, oceanography, geospatial earth systems science, combustion engine and fuel chemistry, and machine learning and uncertainty quantification.

Further details on Swift are provided below, which demonstrate its relevance to the O&G HPC domain.

The Swift parallel scripting language and its Python-based successor, "Parsl", the parallel scripting library, represent a unique approach to the problem of rapidly and productively integrating existing application codes (both libraries and application programs) into scalable and complete higher-level workflow applications.

Like other scripting languages, Swift allows programmers to express computations via the linking together of existing application code by, for example, specifying that the output of program A be provided as input to tasks B and C, the outputs of which are then consumed by task D. This approach has the advantages of allowing for rapid application development and avoiding the need to modify existing programs. Swift supports concurrency implicitly, so that in our example, if tasks B and C have no other dependencies, they can both execute in parallel as soon as A completes. This model is quite general: Swift is not limited to directed acyclic graph (DAG) dependency expressions.
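The A-B-C-D example above looks like the following in Parsl (a minimal sketch using the stock local-threads configuration; the app bodies are toy stand-ins):

```python
import parsl
from parsl import python_app
from parsl.configs.local_threads import config

parsl.load(config)  # local threads; clusters and clouds use other configs

@python_app
def a():
    return 1

@python_app
def b(x):
    return x + 1

@python_app
def c(x):
    return x * 2

@python_app
def d(y, z):
    return y + z

x = a()                  # returns a future immediately
y, z = b(x), c(x)        # depend only on a, so they may run in parallel
print(d(y, z).result())  # d waits on both, completing the dataflow graph
```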

Additionally, Swift introduces a powerful data model that allows for typical scalars (integers, floats, strings), arrays, structs, and so on. Swift also supports an unformatted byte array (called a blob, for “binary large object”), which can hold arbitrary native data for messaging from one task to the next. Furthermore, Swift represents external files as variables, which can also be the subject of data-dependent operations (similar to Makefiles). These features together can greatly reduce the costs of developing and running complex and/or very large-scale workflows.

The original implementation of Swift (called Swift/K because it is based on a portable Java-based runtime system called Karajan) was designed for coordination of large-scale distributed computations that make use of multiple autonomous computing resources distributed over varied administrative domains, such as clusters, clouds, and grids. Swift/K focused on reliability and interoperability with many systems at the expense of performance: execution of the program logic is confined to a single shared-memory master node, with calls to external executable applications dispatched to execution resources as parallel tasks over a highly distributed agent-based execution provider framework.

With its fast execution provider executing tasks on a local cluster, approximately 500–1000 tasks can be dispatched per second by Swift/K. This rate is more than ample for many command-line application workflows, but insufficient for applications with more demanding performance needs such as a high degree of parallelism or very short task durations.

Optimizations to the language interpreter, network protocols, and other components could increase throughput, but Swift/K's single-master architecture ultimately limits scaling and is unsuitable for applications with task durations of hundreds of milliseconds or less, or with a high degree of parallelism (more than several thousand parallel tasks).

In order to address the needs of many demanding parallel applications, a second-generation Swift implementation was developed under the DOE X-Stack (exascale stack) program. This version was called Swift/T, because it is based on a runtime system called Turbine, in turn based on ADLB, the asynchronous, dynamic, MPI-based load balancer. Swift/T achieves extreme-scale workflow performance on petascale and exascale-precursor systems (100K to 1M+ cores) by parallelizing and distributing script execution and task management across many nodes.

Swift focuses on enabling a hierarchical programming model for high-performance fine-grained task parallelism, orchestrating large-scale computations composed of external functions with in-memory data, computational kernels on GPUs and other accelerators, and parallel functions implemented in lower-level parallel programming models, typically threads or message passing. These functions and kernels are integrated into the Swift language as typed leaf functions that encapsulate computationally intensive code, leaving parallel coordination, task distribution, and data dependency management to the Swift implementation.

Swift can be rigorously analyzed and enhanced by a range of compiler optimization techniques to achieve high efficiency and scalability for a broad range of applications on massively parallel distributed-memory computers. Its design is motivated by the limitations of current programming models for programming extreme-scale systems and addressing emerging problems such as programmability for nonexpert parallel programmers, abstraction of heterogeneous compute resources, and the composition of…

Speakers
Yadu Babuji

University of Chicago

Kyle Chard

University of Chicago

Benjamin Clifford

University of Chicago

Ian Foster

Argonne National Laboratory
Use Case: Materials Data Facility (Co-Lead PI). Research interests: distributed, parallel, and data-intensive computing technologies.

Stefan Gary

Scottish Association for Marine Science

Daniel S. Katz

Chief Scientist, NCSA, University of Illinois at Urbana-Champaign
Dan is Chief Scientist at the National Center for Supercomputing Applications (NCSA) and Research Associate Professor in Computer Science, Electrical and Computer Engineering, and the School of Information Sciences (iSchool), at the University of Illinois Urbana-Champaign. In past…

Zhuozhao Li

University of Chicago

Marmar Mehrabadi

Parallel Works Inc.

Matthew Shaxted

Parallel Works Inc.

Alvaro Vidal Torreira

Parallel Works Inc.

Michael Wilde

CEO, Parallel Works Inc, University of Chicago, and Argonne National Laboratory (on leave)
HPC Workflow - scaling, automation, productivity, democratization

Anna Woodward

University of Chicago

Justin Wozniak

Argonne National Laboratory and University of Chicago

Argonne National Laboratory and University of Chicago



Tuesday March 5, 2019 12:00pm - 12:20pm CST
Room 280

12:00pm CST

Systems & Facilities: Moving Seismic Migration Into Cloud
As cloud computing has matured over the past several years, the HPC community has started investigating cloud capabilities and the efforts required to create new business opportunities and value. In this talk we share our experiences migrating a highly parallelized seismic imaging application to external public clouds. In the first phase of the project, the design choices faced were infrastructure-related, such as building a virtual network in the cloud and selecting high-performance storage. Once the application ran successfully in the cloud, however, we needed to revisit certain HPC application design decisions because 1) the performance of the CPUs, network, and storage in the clouds differed from that of the on-premise clusters, and 2) a large amount of computing resources became available in the cloud.  The main goal of the second phase of our project was therefore to identify the application's new performance bottlenecks so as to optimize its performance in the cloud. In our talk we will show the performance analysis, the design decisions made, and the decision criteria selected.


Speakers


Tuesday March 5, 2019 12:00pm - 12:20pm CST
Auditorium

12:20pm CST

Software Technology & Applications: Python Workflows on HPC Systems: Pitfalls and Best Practice
Python is on the rise. Driven by the vastly growing number of machine learning and data analytics applications, Python has become one of the most popular and widely used programming languages. Even though the HPC community is traditionally rather sceptical regarding the use of interpreted scripting languages, growing user demand is hard to ignore, especially in the context of the current machine learning hype.

Fig. [1]: Projection of questions posted on Stackoverflow, by programming language.

In this talk, we will discuss some of the main challenges providing Python services on HPC systems at production level. This includes:

- security issues
- containment and control of Python processes in multi-user environments
- maintenance of the Python software stack
- GPU integration for Python machine learning workflows
- solutions for interactive workflows.

Towards a practical solution of these issues, we will focus on the containerization of Python environments and their integration into running HPC systems.

About the Authors:

- Dominik Straßel is the senior operator of Fraunhofer's GPU cluster and a core developer of Fraunhofer's Python software stack.
- Philipp Reusch is currently working on his master's thesis at ITWM. One aspect of his work is the optimization of the Python software stack on HPC systems.
- Janis Keuper leads the "Large Scale Machine Learning" group at ITWM. His research focuses on ML systems and HPC-scale ML algorithms.

[1] Figure by hackernoon.com, https://hackernoon.com/top-3-most-popular-programming-languages-in-2018-and-their-annual-salaries-51b4a7354e06


Speakers
Janis Keuper

Fraunhofer

Philipp Reusch

Fraunhofer



Tuesday March 5, 2019 12:20pm - 12:40pm CST
Room 280

12:40pm CST

Lunch & Networking
Tuesday March 5, 2019 12:40pm - 1:40pm CST
Event Hall

1:45pm CST

Democratising Access to HPC
The use of HPC and parallel computing is currently undergoing a period of rapid expansion, with increasing diversity in the fields and applications that can leverage these resources. However, the benefits are not always equally shared. As the under-representation of women and minorities is increasingly recognised as a challenge that the entire supercomputing industry faces, the benefits of improving diversity and inclusion in the community, the ‘diversity dividend’, are also being recognised.

As a community we are only just beginning to measure and understand how ‘leaky’ our pipeline is, but attrition rates are likely as high as the general tech community: 41% of women working in tech eventually leave the field (compared to just 17% of men).

This session will discuss the benefits of diversity, but also how we can further ‘democratise’ access. Discussion will include the work being carried out by Women in HPC to diversify HPC, and innovations in onboarding in the form of new approaches to HPC training, for both traditional learners and skill development in the workplace.


Speakers
Toni Collis

CBDO/Chair & Co-Founder, Appentra & Women in HPC
Dr Toni Collis is Chair and Co-Founder of Women in High Performance Computing (WHPC) as well as the CBDO for Appentra Solutions and Director of Collis-Holmes Innovations. Having previously worked in EPCC, The University of Edinburgh Supercomputing Centre, Toni has spent a large part…



Tuesday March 5, 2019 1:45pm - 2:15pm CST
Auditorium

2:15pm CST

Early Career Panel
This panel, moderated by Andrew Jones, is made up of early-career panellists. The questions will be set based on the conference themes.  This gives younger people an opportunity to gain valuable career experience and exposure, lets the audience hear new and diverse ideas, and stimulates the community's discussion of HPC opportunities in O&G from a fresh viewpoint.


Moderators
Speakers

Tuesday March 5, 2019 2:15pm - 3:15pm CST
Auditorium

3:15pm CST

Post-K: A Game Changing Supercomputer for Convergence of HPC and Big Data/AI
With the rapid rise of Big Data and AI as a new breed of high-performance workloads on supercomputers, we need to accommodate them at scale, hence the need for R&D on hardware and software infrastructures where traditional simulation-based HPC and BD/AI converge, in a BYTES-oriented fashion.  Post-K is the flagship next-generation national supercomputer being developed by Riken and Fujitsu in collaboration. Post-K will have hyperscale-class resources in one exascale machine, with well more than 100,000 nodes of server-class A64fx many-core Arm CPUs, realized through an extensive co-design process involving the entire Japanese HPC community.

Rather than focusing on double-precision flops, which are of lesser utility, Post-K, and especially its A64fx processor and Tofu-D network, is designed to sustain extreme bandwidth on realistic applications, including those for oil and gas such as seismic wave propagation, CFD, and structural codes, besting its rivals by several factors in measured performance. Post-K is slated to perform 100 times faster on some key applications than its predecessor, the K Computer, and will also likely be the premier big data and AI/ML infrastructure. Currently, we are conducting research to scale deep learning to more than 100,000 nodes on Post-K, where we would obtain near top-GPU-class performance on each node.


Speakers
Satoshi Matsuoka

Director, Riken Center for Computational Science, Japan
Satoshi Matsuoka became the director of Riken R-CCS, the top-tier HPC center that represents HPC in Japan, in April 2018; R-CCS currently hosts the K Computer and is developing the next-generation Arm-based "exascale" Post-K machine, along with multitudes of ongoing cutting-edge HPC…


Tuesday March 5, 2019 3:15pm - 4:00pm CST
Auditorium

3:59pm CST

Poster and Closing Networking Session
Tuesday March 5, 2019 3:59pm - 6:00pm CST
Event Hall

4:00pm CST

Deep Global Model Reduction Learning
Speakers

Tuesday March 5, 2019 4:00pm - 6:00pm CST
Event Hall

4:00pm CST

Developing and Integrating New Solvers for OPM
Speakers
Anne C. Elster

Norwegian University of Science and Technology


Tuesday March 5, 2019 4:00pm - 6:00pm CST
Event Hall

4:00pm CST

Exascale archiving

Tuesday March 5, 2019 4:00pm - 6:00pm CST
Event Hall

 
Wednesday, March 6
 

8:30am CST

Workshop: Singularity Containers: Secure, Repeatable and Mobile Runtimes for Oil and Gas Applications and Workflows
Containers are the means through which application runtimes can be fully encapsulated for mobility and reproducibility. Of primary importance to those in the oil and gas industry, by emphasizing integration over isolation, Singularity containers do not introduce additional degrees of complexity and overhead when it comes to making use of special-purpose devices (e.g., GPUs, fabrics). By inheriting permissions native to the Linux operating environment, Singularity containers also avoid security pitfalls that plague other implementations.

The purpose of this workshop is to introduce participants to Singularity containers in as hands-on a fashion as is feasible. Armed with a functioning environment for running Singularity containers, participants will make use of Docker as well as native Singularity containers (i.e., Singularity Image Format (SIF) files) via remotely hosted repositories in the cloud, namely the Docker Hub (https://hub.docker.com/) and the Sylabs Cloud (https://cloud.sylabs.io/library), respectively. The workshop will close with definition files for detailing a container's build instructions, signing containers with verifiable keys, and deployment considerations, plus additional topics as time permits.


Speakers
Ian Lumb

Sylabs



Wednesday March 6, 2019 8:30am - 12:00pm CST
Room 280

8:30am CST

Workshop: Speeding Up the Parallelisation Process: An Algorithmic Approach to Multicore and GPU Programming
Designed for both learners and HPC educators, this session will cover a new approach that can produce multicore- and GPU-enabled code in the space of hours or days, rather than weeks or months.

By following a methodical, step-by-step approach to identifying how and where to parallelise, your software can start taking advantage of threads and/or GPU cores after a short amount of development time. This session will focus on how to identify the parallel patterns already present in code, and on identifying the data-scoping requirements needed for correct OpenMP or OpenACC code. The procedure works for both OpenMP and OpenACC, enabling participants to target the desired architecture quickly and effectively. The benefits can then be applied to any software project you work on in the future.
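The workshop itself targets OpenMP and OpenACC; as a language-neutral taste of the same exercise, here is a Python/Numba analog (our sketch, not workshop material) of the parallel-loop-with-reduction pattern, the data-scoping case OpenMP expresses as reduction(+:acc):

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def dot(x, y):
    # Parallel-for pattern; Numba recognizes 'acc' as a reduction,
    # the same scoping decision OpenMP needs via reduction(+:acc).
    acc = 0.0
    for i in prange(x.size):
        acc += x[i] * y[i]
    return acc

x = np.random.rand(1_000_000)
y = np.random.rand(1_000_000)
print(dot(x, y))
```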


Speakers
Toni Collis

CBDO/Chair & Co-Founder, Appentra & Women in HPC
Dr Toni Collis is Chair and Co-Founder of Women in High Performance Computing (WHPC) as well as the CBDO for Appentra Solutions and Director of Collis-Holmes Innovations. Having previously worked in EPCC, The University of Edinburgh Supercomputing Centre, Toni has spent a large part…


Wednesday March 6, 2019 8:30am - 12:00pm CST
Room 1003