June 08

Add to Calendar 2020-06-08 14:00:00 2020-06-08 15:00:00 America/New_York Solving Global Grand Challenges with High Performance Data Analytics Abstract:Data science aims to solve grand global challenges such as: detecting and preventing disease in human populations; revealing community structure in large social networks; protecting our elections from cyber-threats, and improving the resilience of the electric power grid. Unlike traditional applications in computational science and engineering, solving these social problems at scale often raises new challenges because of the sparsity and lack of locality in the data, the need for research on scalable algorithms and architectures, and development of frameworks for solving these real-world problems on high performance computers, and for improved models that capture the noise and bias inherent in the torrential data streams. In this talk, Bader will discuss the opportunities and challenges in massive data science for applications in social sciences, physical sciences, and engineering.Biography:David A. Bader is a Distinguished Professor in the Department of Computer Science at New Jersey Institute of Technology. Prior to this, he served as founding Professor and Chair of the School of Computational Science and Engineering, College of Computing, at Georgia Institute of Technology. He is a Fellow of the IEEE, AAAS, and SIAM, and advises the White House, most recently on the National Strategic Computing Initiative (NSCI). Bader serves on the leadership team of Northeast Big Data Innovation Hub as the inaugural chair of the Seed Fund Steering Committee. Dr. Bader is a leading expert in solving global grand challenges in science, engineering, computing, and data science. His interests are at the intersection of high-performance computing and real-world applications, including cybersecurity, massive-scale analytics, and computational genomics, and he has co-authored over 250 scholarly papers. Dr. Bader has served as a lead scientist in several DARPA programs including High Productivity Computing Systems (HPCS) with IBM, Ubiquitous High Performance Computing (UHPC) with NVIDIA, Anomaly Detection at Multiple Scales (ADAMS), Power Efficiency Revolution For Embedded Computing Technologies (PERFECT), Hierarchical Identify Verify Exploit (HIVE), and Software-Defined Hardware (SDH). Bader is Editor-in-Chief of the ACM Transactions on Parallel Computing, and will serve as General Co-Chair of IPDPS 2021. He has also served as Director of the Sony-Toshiba-IBM Center of Competence for the Cell Broadband Engine Processor. Bader is a cofounder of the Graph500 List for benchmarking "Big Data" computing platforms. Bader is recognized as a "RockStar" of High Performance Computing by InsideHPC and as HPCwire's People to Watch in 2012 and 2014. Recently, Bader received an NVIDIA AI Lab (NVAIL) award (2019), and a Facebook Research AI Hardware/Software Co-Design award (2019). https://mit.zoom.us/j/536883569 Password 008943

June 01

Add to Calendar 2020-06-01 14:00:00 2020-06-01 15:00:00 America/New_York Dynamic Data Structures on the GPU Abstract: Dynamic data structures allow updates to the data structure without having to rebuild it completely. I'll discuss five dynamic GPU data structures that we designed and built - log-structured merge trees, quotient filters, linked lists and hash tables built atop them, dynamic graphs, and B-trees. I'll talk about principles that we followed in building them and what we learned from the experience.Bio: John Owens is a professor of electrical and computer engineering at the University of California, Davis, where he leads an amazing group of graduate students in a research program centered around parallel computing on the graphics processing unit (GPU). He earned his PhD from Stanford in 2003 and his BS from UC Berkeley in 1995. In January 2020 he bought his kids MIT sweatshirts from the Coop while helping to run the Mystery Hunt; his kids wear them often. https://mit.zoom.us/j/536883569 Password 008943

May 11

Add to Calendar 2020-05-11 14:00:00 2020-05-11 15:00:00 America/New_York Program Optimization for Machine Learning Abstract: Training deep neural networks (DNNs) can be expensive and slow, consuming enormous numbers of compute-hours on parallel machines. This talk will present results on using novel search procedures over programs to reduce training time. In particular, instead of greedily applying program-improving transformations to compute a single improved program, we search a space of programs, considering many possible candidates guided by a global cost function. Application of search-based optimization to two separate problems will be discussed: improving the partitioning and distribution of training data, and reducing the execution time of the DNN computation graph. Both methods speedup training by up to a factor of 3 over current state-of-the-art systems.Bio: Alex Aiken is the Alcatel-Lucent Professor of Computer Science at Stanford. Alex received his Bachelors degree in Computer Science and Music from Bowling Green State University in 1983 and his Ph.D. from Cornell University in 1988. Alex was a Research Staff Member at the IBM Almaden Research Center (1988-1993) and a Professor in the EECS department at UC Berkeley (1993-2003) before joining the Stanford faculty in 2003. His research interest is in areas related to programming languages. He is an ACM Fellow, a recipient of ACM SIGPLAN's Programming Languages Achievement Award and Phi Beta Kappa's Teaching Award, and a former chair of the Stanford Computer Science Department. https://mit.zoom.us/j/536883569 Password 008943

May 04

Add to Calendar 2020-05-04 14:00:00 2020-05-04 15:00:00 America/New_York Seq: a high-performance language for bioinformatics Abstract: The scope and scale of biological data are increasing at an exponential rate, as technologies like next-generation sequencing are becoming radically cheaper and more prevalent. Over the last two decades, the cost of sequencing a genome has dropped from $100 million to nearly $100-a factor of over 10^6-and the amount of data to be analyzed has increased proportionally, necessitating high-performance tools and methods in order to keep pace. Here we introduce Seq, a high-performance, Pythonic language for bioinformatics and computational genomics, which bridges the gap between the performance of low-level languages like C and C++, and the ease-of-use of high-level languages like Python. The Seq compiler employs numerous domain-specific optimizations to often attain even better performance than hand-optimized implementations of many important algorithms, which we discuss and evaluate.Bio: I'm a graduate student at MIT CSAIL focusing on computational genomics, working with Prof. Bonnie Berger and Prof. Saman Amarasinghe. More specifically, my graduate research involves developing fast, accurate and easy-to-use algorithms and software for processing the ever-increasing genomic data that is being produced. I focus mainly on third-generation sequencing data, and applications pertaining to it like sequence alignment, assembly, genotyping and phasing. Zoom https://mit.zoom.us/j/536883569 Password 008943

April 20

Add to Calendar 2020-04-20 14:00:00 2020-04-20 15:00:00 America/New_York How fast are Algorithms Improving? Abstract: Quantitative analyses of how fast algorithms are improving are usually done ad-hoc. Longer term analyses are quite rare - in fact, almost all use the single example of Linear Solvers. In this paper, we gather data on 127 different algorithm families from more than 40 textbooks and 1000 research papers to present the first comprehensive look at algorithm progress ever assembled. We find that as a whole algorithms have undergone substantial progress, but with huge variation: in some cases, progress rivals those from other important sources (e.g. Moore's Law for computer hardware), but in other cases it shows relatively little progress.Neil Thompson Bio: I am an Innovation Scholar at MIT's Computer Science and Artificial Intelligence Lab and the Initiative on the Digital Economy.I am also an Associate Member of the Broad Institute. Previously, I was an Assistant Professor of Innovation and Strategy at the MIT Sloan School of Management, where I co-directed the Experimental Innovation Lab (X-Lab), and a Visiting Professor at the Laboratory for Innovation Science at Harvard. I have advised businesses and government on the future of Moore's Law and have been on National Academies panels on transformational technologies and scientific reliability.I did my PhD in Business and Public Policy at Berkeley, where I also did Masters degrees in Computer Science and Statistics. I have a masters in Economics from the London School of Economics, and undergraduate degrees in Physics and International Development. Prior to academia, I worked at organizations such as Lawrence Livermore National Laboratories, Bain and Company, The United Nations, the World Bank, and the Canadian Parliament.Yash Sherry Bio: Yash Sherry is a Visiting Student/Research Assistant at the SuperTech Lab, MIT CSAIL. He works under the guidance of Dr. Thompson on Algorithm Wiki and Quantum algorithmic analysis. He is currently a CS graduate student at GeorgiaTech where he is specializing in Machine Learning. Prior to this, he has worked on research projects with Stanford and CMU. He has also been a part of Irisys Lab (Korea), Mozilla and Microsoft in different roles. Currently, he is working on a tech startup in Boston that provides software services to student startups. Zoom (https://mit.zoom.us/j/536883569) Password 008943

March 16

February 18

Add to Calendar 2020-02-18 14:00:00 2020-02-18 15:00:00 America/New_York Safe Parallel Programming -- ParaSail, Ada 202X, OpenMP, and Rust Abstract: Is it possible to bring support for safe parallel programming to modern programming languages, where the compiler does the work of detecting possible data races while the programmer simply identifies where in their program they would like to take advantage of multicore/manycore hardware capabilities? This talk will introduce the programming language ParaSail (Parallel Specification and Implementation Language) which provides safe parallel programming in the context of a pointer-free, object-oriented, strongly-typed modern programming language. The talk will also include a discussion of other recent work to bring compile-time safety to parallel programming, including the upcoming 202X version of the Ada programming language, the OpenMP multiplatform, multilanguage API for parallel programming, and Rust, a language that from the beginning tried to provide safe concurrent programming, and more recently has provided a safe light-weight parallelism library called Rayon.See https://arxiv.org/ftp/arxiv/papers/1902/1902.00525.pdf for more information on ParaSail.Bio: S. Tucker Taft has been working on programming language design for his entire career. Tucker is currently VP and Director of Language Research at AdaCore. Tucker led the Ada 9X language design team, culminating in the February 1995 approval of Ada 95 as the first ISO standardized object-oriented programming language. His specialties include programming language design, advanced static analysis tools, formal methods, real-time systems, parallel programming, and model-based development. Tucker is a member of the ISO Rapporteur Group that developed Ada 2005 and Ada 2012. Tucker has also been designing and implementing a parallel programming language called "ParaSail," and defining parallel programming extensions for Ada as part of the forthcoming Ada 202X standard. Prior to joining AdaCore, Tucker was Founder and CTO of SofCheck, Inc., providing tools and technology to enhance software development quality and productivity. Prior to that Mr. Taft was a Chief Scientist at Intermetrics, Inc. and its follow-ons for 22 years. Tucker received an A.B. Summa Cum Laude degree from Harvard University, where he has more recently taught compiler construction and programming language design. 32-D449 (Kiva)

November 25

Add to Calendar 2019-11-25 14:00:00 2019-11-25 15:00:00 America/New_York A Speed-of-Light Internet Service Provider Abstract: A variety of network applications, including electronic commerce and games, are either enabled by or benefit greatly from low latency communications. Studies have shown, however, that over medium and long distances the time to send a packet from one city to another on the public Internet is typically more than three times larger than the lower bound implied by the speed of light in free space. Hence for applications like high-frequency trading, where the winner of a communications race receives all the benefits, special purpose networks have been deployed. For example, between New Jersey and Chicago a succession of networks has been deployed, first fiber-based and then microwave-based, with each network reducing latency by a fraction of a millisecond over the previous. This talk explores the possibility of using the same radio technology to build a network backbone spanning the 120 largest population centers in the United States. The design places radios on existing towers, using topographic maps to ensure line-of-sight connectivity between towers. The impact of weather on the network is evaluated using historical weather data. Our analysis suggests that it should be possible to achieve mean speeds of over 95% of the speed of light over medium and long distances at a transmission cost of under $1 per GB.Joint work with Sangeetha A. J., Anthony Aquirre, Waqar Aqeel, Nadi Bozkurt, Bala Chandrasekaran, Brighten Godfrey, Greg Laughlin, Ankit Singla, and Muhammad Tirmazi.Bio: Bruce Maggs received the S.B., S.M., and Ph.D. degrees in computer science from the Massachusetts Institute of Technology in 1985, 1986, and 1989, respectively. His advisor was Charles Leiserson. After spending one year as a Postdoctoral Associate at MIT, he worked as a Research Scientist at NEC Research Institute in Princeton from 1990 to 1993. In 1994, he moved to Carnegie Mellon, where he stayed until joining Duke University in 2009 as a Professor in the Department of Computer Science. While on a two-year leave-of-absence from Carnegie Mellon, Maggs was a founding employee of Akamai Technologies, serving as its first Vice President for Research and Development. In 2018 he was part of a large team that received the inaugural SIGCOMM Networking Systems Award for the Akamai CDN, and was named an ACM Fellow. 32-G449 (Kiva)

November 18

Add to Calendar 2019-11-18 14:00:00 2019-11-18 15:00:00 America/New_York How to Modernize Compiler Technology Abstract: In this talk, I will show the path to the modernization of one important compiler technique -- vectorization. Vectorization was first introduced in the era of Cray vector processors during the 1980's. While vector supercomputers needed large vectors, which are mainly available by parallelizing loops, modern SIMD instructions efficiently work on short vectors. Thus, in 2000, Larsen et. al. introduced a new technique known as Superword Level Parallelism (SLP) based vectorization to fill this gap. First, I will show how we can take advantage of the power of modern computers for compilation, by using more accurate but expensive techniques to improve SLP vectorization. Due to the hardware resource constraints of the era, like many other compiler optimizations, SLP implementation was a greedy algorithm. In 2018, we introduced goSLP, which uses integer linear programming to find an optimal instruction packing strategy. Next, I will show how to truly modernize a compiler by automatically learning the necessary components of the compiler with Ithemal and Vemal. The actual cost of execution in a modern out-of-order, superscalar processor is much more complex than a linear model. Manually building such cost models as well as manually developing compiler optimizations is costly, tedious and is hard to keep up with the architectural changes. Ithemal is the first learnt cost model for predicting the throughput of x86 basic blocks with minimal human effort. Vemal is the first learnt policy for end-to-end vectorization as opposed to tuning heuristics. These data-driven techniques achieve state-of-the-art results while also reducing the development and maintenance burden of the compiler developer. 32-G449 Kiva/Patil

November 04

Add to Calendar 2019-11-04 14:00:00 2019-11-04 15:00:00 America/New_York GraphIt: A Domain-Specific Language for Writing High-Performance Graph Applications Abstract: In recent years, large graphs with billions of vertices and trillions of edges have emerged in many domains, such as social network analytics, machine learning, physical simulations, and biology. There is a strong demand for a programming system that allows domain experts to easily write applications that run quickly on these large graphs. However, programming high-performance graph applications is a challenging problem. The performance bottlenecks of graph applications depend on the algorithm, the size and structure of the graph, and the underlying hardware. There is not a set of optimizations that work well across all algorithms, graphs, and hardwares. Additionally, domain experts are often not familiar with low-level implementation details, such as atomic instructions and bit manipulations, needed for today's high-performance graph frameworks.In this talk, I will present our work on GraphIt, a new domain-specific language (DSL) that offers an easy-to-use programming model, achieves consistent high-performance across different algorithms and graphs, and supports both unordered and ordered graph algorithms. GraphIt decouples algorithms from performance optimizations (schedules) for graph applications to make it easy to explore a large space of performance optimizations. The compiler leverages a new graph iteration space model to generate fast code for different combinations of optimizations. We also introduce new priority-based extensions to support ordered graph algorithms and are working on code generation for GPU architectures.Bio: Yunming Zhang is a sixth year PhD student at MIT working with Saman Amarasinghe and Julian Shun. He works on domain-specific compilers, languages and performance optimizations for large-scale data analytics applications. 32-G449 (Kiva)

October 21

Add to Calendar 2019-10-21 14:00:00 2019-10-21 15:00:00 America/New_York SplinterDB: Closing the Bandwidth Gap on NVMe Abstract: Key-value stores are a fundamental component to storage systems. Recent advances in storage hardware, especially the advent of NVMe devices, expose new bottlenecks in the design of current key-value stores. We introduce SplinterDB and show how it achieves a 6-9x improvement in the insertion rate over Facebook's RocksDB on NVMe devices, while also improving query performance by 2-2.5x.Bio: Alex Conway is a PhD student in Computer Science at Rutgers University, advised by Martin Farach-Colton. His research is focused on external memory theory and storage systems. His recent publications include work on high performance key-value stores, file system aging, external memory hash tables and the analysis of storage optimization heuristics.He is a member of the BetrFS team. BetrFS, a prototype file system, is designed around the principles of write-optimization and is built using variants of B\epsilon-trees. G32-449 (Patil/Kiva)

September 30

Add to Calendar 2019-09-30 14:00:00 2019-09-30 15:00:00 America/New_York Julia: Making dynamic programs run fast Abstract: Julia is a general purpose language that utilizes type inference combined with a Just-In-Time (JIT) compiler, built upon LLVM, to achieve C-like performance while retaining the dynamic semantics of interactive languages like MATLAB, Python, and R. This talk will briefly introduce Julia, explore how it is possible for a dynamic language to be fast and to how Julia is making use of LLVM. We will focus on interacting with the compiler infrastructure and explore how to Julia targets accelerators like NVIDIA GPUs.Bio: Valentin Churavy is a PhD student at CSAIL. His research at the JuliaLab focuses on heterogeneous distributed computing for high-performance computing. He is interested in human-compiler interaction and how to help people write fast code in a high-level dynamic language. 32-D463 (Star)

September 23

Add to Calendar 2019-09-23 14:00:00 2019-09-23 15:00:00 America/New_York Tiramisu: A Polyhedral Compiler for Dense and Sparse Deep Learning Abstract: Tiramisu is a polyhedral compiler for deep learning. It has two unique features: (1) it is the first sparse DNN compiler; and (2) it can express and optimize general RNNs (Recurrent Neural Networks). Tiramisu relies on a flexible representation based on the polyhedral model and has a rich scheduling language allowing fine-grained control of optimizations. We show that Tiramisu matches or outperforms Intel MKL-DNN by up to 5x for sparse DNNs. We also show that Tiramisu allows many new capabilities such as wavefront parallelization for RNNs.Bio: Riyadh Baghdadi is a postdoctoral associate at CSAIL. He got his PhD from ENS Paris. He works on the area of compiler optimization and code generation for high performance architectures. Recently, he has been interested in exploring the intersection of compilers and deep learning. 32-D463 (Star)

September 16

Domain-Specific Accelerators

NVIDIA Corporation and Stanford University
Add to Calendar 2019-09-16 14:15:00 2019-09-16 15:15:00 America/New_York Domain-Specific Accelerators Abstract: Increasing computing performance enables new applications and greater value from computing. With the end of Moore's Law and Dennard Scaling, continued performance scaling will come primarily from specialization. Specialized hardware engines can achieve performance and efficiency from 10x to 10,000x a CPU through specialization, parallelism, and optimized memory access. Graphics processing units are an ideal platform on which to build domain-specific accelerators. They provide very efficient, high performance communication and memory subsystems - which are needed by all domains. Specialization is provided via "cores", such as tensor cores that accelerate deep learning or ray-tracing cores that accelerate specific applications. This talk will describe some common characteristics of domain-specific accelerators via case studies. Bio: Bill Dally is Chief Scientist and Senior Vice President of Research at NVIDIA Corporation and a Professor (Research) and former chair of Computer Science at Stanford University. Bill is currently working on developing hardware and software to accelerate demanding applications including machine learning, bioinformatics, and logical inference. He has a history of designing innovative and efficient experimental computing systems. While at Bell Labs Bill contributed to the BELLMAC32 microprocessor and designed the MARS hardware accelerator. At Caltech he designed the MOSSIM Simulation Engine and the Torus Routing Chip which pioneered wormhole routing and virtual-channel flow control. At the Massachusetts Institute of Technology his group built the J-Machine and the M- Machine, experimental parallel computer systems that pioneered the separation of mechanisms from programming models and demonstrated very low overhead synchronization and communication mechanisms. At Stanford University his group developed the Imagine processor, which introduced the concepts of stream processing and partitioned register organizations, the Merrimac supercomputer, which led to GPU computing, and the ELM low-power processor. Bill is a Member of the National Academy of Engineering, a Fellow of the IEEE, a Fellow of the ACM, and a Fellow of the American Academy of Arts and Sciences. He has received the ACM Eckert-Mauchly Award, the IEEE Seymour Cray Award, the ACM Maurice Wilkes award, the IEEE-CS Charles Babbage Award, and the IPSJ FUNAI Achievement Award. He currently leads projects on computer architecture, network architecture, circuit design, and programming systems. He has published over 250 papers in these areas, holds over 160 issued patents, and is an author of the textbooks, Digital Design: A Systems Approach, Digital Systems Engineering, and Principles and Practices of Interconnection Networks.Livestream: https://www.youtube.com/channel/UCYs2iUgksAhgoidZwEAimmg/live G32-449 (Patil/Kiva)

August 27

Add to Calendar 2019-08-27 15:00:00 2019-08-27 16:00:00 America/New_York Algorithms and Systems for Processing Massive Static and Evolving Graphs Abstract: Today, the largest publicly available graph dataset is the Hyperlink Web graph, with over 3.5 billion vertices and 128 billion edges. Despite being publicly available for several years now, most experimental work in the literature reports results on much smaller graphs, or resorts to external or distributed memory to process the Hyperlink graph. A natural question then, is to ask whether we can efficiently solve a broad class of problems on this graph in memory. Given that existing systems capable of processing this graph only support static processing, another interesting question is whether we can efficiently represent and update this graph in memory. In the first part of this talk I will present theoretically efficient implementations of parallel graph algorithms for a set of 20 important graph problems. Our codes, which we have publicly open-sourced as part of the Graph Based Benchmark Suite (GBBS) solve these problems on the Hyperlink Web graph using a single commodity multicore machine with a terabyte of RAM, with most of the codes running in under a minute on this graph. The running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs, and for many of the problems that we consider, this is the first time they have been solved on graphs at this scale. In the second part of this talk I will present Aspen, a framework for processing streaming graphs, which are graphs that evolve over time via a stream of updates (batches of edge insertions/deletions). The key idea is to represent the adjacency information using a purely-functional compressed tree data structure called a C-tree, which can be efficiently updated. Aspen achieves millisecond latency for processing small batches, and achieves throughputs of hundreds of millions of updates per second for very large batches, while providing strict serializability for both queries and updates. Using Aspen, we are able to concurrently update and run analytics on the Hyperlink Web graph using a single commodity multicore server with a terabyte of RAM. Based on joint work with Guy Blelloch and Julian Shun. Bio: Laxman is a fifth year PhD student at CMU, where he is fortunate to be advised by Guy Blelloch. His graduate work has focused mostly on designing fast and theoretically efficient parallel graph algorithms.Livestream link: https://www.youtube.com/channel/UCYs2iUgksAhgoidZwEAimmg/live G32-449 (Patil/Kiva)

August 20

Add to Calendar 2019-08-20 15:00:00 2019-08-20 16:00:00 America/New_York Tapir: Embedding Recursive Fork-Join Parallelism into LLVM's Intermediate Representation Abstract: Tapir is a compiler intermediate representation (IR) that embeds recursive fork-join parallelism, as supported by task-parallel programming platforms such as Cilk and OpenMP, into a mainstream compiler's IR. Mainstream compilers typically treat parallel linguistic constructs as syntactic sugar for function calls into a parallel runtime. These calls prevent the compiler from performing optimizations on and across parallel control constructs. Remedying this situation has generally been thought to require an extensive reworking of compiler analyses and code transformations to handle parallel semantics. Tapir leverages the "serial-projection property," which is commonly satisfied by task-parallel programs, to handle the semantics of these programs without an extensive rework of the compiler. For recursive fork-join programs that satisfy the serial-projection property, Tapir enables effective compiler optimization of parallel programs with only minor changes to existing compiler analyses and code transformations. Tapir uses the serial-projection property to order logically parallel fine-grained tasks in the program's control-flow graph. This ordered representation of parallel tasks allows the compiler to optimize parallel codes effectively with only minor modifications. For example, to implement Tapir/LLVM, a prototype of Tapir in the LLVM compiler, we added or modified less than 3000 lines of LLVM's half-million-line core middle-end functionality. These changes suffice to enable LLVM's existing compiler optimizations for serial code to work with parallel control constructs such as parallel loops and Cilk's cilk_spawn keyword. Tapir also supports parallel optimizations, such as loop scheduling and loop stripmining, that restructure the parallel control flow of the program. By making use of existing LLVM optimizations and new parallel optimizations, Tapir/LLVM can optimize recursive fork-join programs more effectively than traditional compilation methods. On a suite of 35 Cilk application benchmarks, Tapir/LLVM produces more efficient executables for 30 benchmarks, with faster 18-core running times for 26 of them, compared to a nearly identical compiler that compiles parallel linguistic constructs the traditional way. In addition, by integrating Tapir/LLVM into the Accelerated Linear Algebra (XLA) compiler in Google's TensorFlow machine-learning framework, Tapir/LLVM enables more effective compiler optimization of a variety of neural networks written in TensorFlow, improving the parallel performance of these networks by a geometric-mean multiplicative factor of 30%-100% across a variety of CPU hardware architectures. Bio: Tao B. Schardl is a Research Scientist in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). His research combines algorithms and systems to develop technologies that support principled, scientific approaches to writing fast code. He has previously worked on parallel programming models, theories of performance, diagnostic tools, and compilers to simplify the task of software performance engineering. His work on the Tapir/LLVM compiler earned the best paper award at the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming in 2017. Dr. Schardl received his S.B. in Computer Science and Electrical Engineering from MIT in 2009, his M.Eng. in Computer Science and Electrical Engineering from MIT in 2010, and his Ph.D. in Computer Science and Engineering from MIT in 2016.Livestream link: https://www.youtube.com/channel/UCYs2iUgksAhgoidZwEAimmg/live G32-449 (Patil/Kiva)

August 06

Add to Calendar 2019-08-06 15:00:00 2019-08-06 16:00:00 America/New_York Optimal system settings: How to not lose before you begin Abstract: Most default system configurations (physical and virtual) are set to maximize compatability and maintainability. In such configurations, good performance may be impossible. High performance system settings are rarely documented. In this talk, we will describe how simple benchmarking can be used to determine the optimal settings that are a precondition to getting high performance. Reference: https://arxiv.org/abs/1907.03195 Bio: Dr. Jeremy Kepner is Head and Founder of the MIT Lincoln Laboratory Supercomputing Center (LLSC). He is also a Founder of the MIT-Air Force AI Accelerator. Lincoln Laboratory is a 4000-person National Laboratory whose mission is to create defensive technologies to protect our Nation and the freedoms enshrined in the Constitution of the United States. Dr. Kepner is one of five Lincoln Laboratory Fellows, a position that "recognizes the Laboratory's strongest technical talent for outstanding contributions to Laboratory programs over many years." Dr. Kepner is focused on creating state-of-the-art computing capabilities to enable scientific and engineering breakthroughs to solve the Nation's most challenging problems. He received his PhD in Astrophysics from Princetion University in 1998 (advisor: Prof. David Spergel) and his BA in Astrophysics from Pomona College in 1991.Livestream link: https://www.youtube.com/channel/UCYs2iUgksAhgoidZwEAimmg/live G32-449 (Patil/Kiva)

July 30

Add to Calendar 2019-07-30 15:00:00 2019-07-30 16:00:00 America/New_York Advances in Determinacy Race Detection for Task-Parallel Code Abstract: Task parallelism is designed to simplify the job of writing parallel code that can utilize the multicore hardware efficiently. With task parallelism, the programmer expresses the logical parallelism of the computation using high-level parallel control constructs, and lets the the underlying runtime system automates the necessary scheduling and synchronizations. Even with task parallelism, writing correct and efficient parallel code can still be challenging.One of the challenges is to deal with determinacy races, which occur when logically parallel parts of the computation access the same memory location and at least one of the accesses is a write. Determinacy races are generally bugs in the program since they lead to non-deterministic program behavior --- different schedules of the program can lead to different results. Moreover, they are difficult to detect and debug, since a race may manifest itself in one run but not in another.In this talk, I will discuss recent advances in detecting determinacy races for task-parallel code. Most prior work on detecting determinacy races has focused on fork-join parallelism, which generates series-parallel dags, a class of computations that is well structured. We will discuss race detection algorithms that can race detect programs with more general structures.Bio: I-Ting Angelina Lee is an assistant professor in the Computer Science and Engineering department in Washington University in St. Louis. Prior to that, she worked with the Supertech research group in Massachusetts Institute of Technology lead by Professor Charles Leiserson for her graduate study and subsequently as a postdoctoral associate.Dr. Lee's research focuses on advancing software technologies for parallel computing. She is interested in many aspects of parallel computing, including designing programming models and linguistic constructs to simplify parallel programming, developing runtime and operating system support to execute multithreaded programs efficiently, and building software tools to aid debugging and performance engineering of multithreaded code.Livestream link: https://www.youtube.com/channel/UCYs2iUgksAhgoidZwEAimmg/live G32-449 (Patil/Kiva)

July 23

Add to Calendar 2019-07-23 15:00:00 2019-07-23 16:00:00 America/New_York Speculative Concurrency for Ethereum Smart Contracts Modern cryptocurrency systems, such as Ethereum, permit complex financial transactions through scripts called smart contracts. These smart contracts are executed many, many times, always without real concurrency. Serial execution limits system throughput and fails to exploit today's concurrent multicore and cluster architectures.This talk presents a novel way to permit miners and validators to execute smart contracts in parallel, based on techniques adapted from software transactional memory. Miners execute smart contracts speculatively in parallel, "discovering" a serializable concurrent schedule for a block's transactions, This schedule is captured and encoded as a deterministic fork-join program used by validators to re-execute the miner's parallel schedule deterministically but concurrently.We examined historical data to estimate the potential benefit of speculative techniques for executing Ethereum smart contracts in parallel. We find that our speculative technique yields estimated speed-ups starting at about 8-fold in 2016, declining to about 2-fold at the end of 2017, where speed-up is measured using either gas costs or instruction counts. We also observe that a small set of contracts are responsible for many data conflicts resulting from speculative concurrent execution.Joint work with Thomas Dickerson, Paul Gazzillo, Eric Koskinen, and Vikram Saraph.Livestream link: https://www.youtube.com/channel/UCYs2iUgksAhgoidZwEAimmg/live G32-449 (Patil/Kiva)

July 16

Add to Calendar 2019-07-16 15:00:00 2019-07-16 16:00:00 America/New_York AutoML for Efficiently Designing Efficient Neural Network Architectures Abstract: Efficient deep learning computing requires algorithm and hardware co-design to enable specialization and acceleration: we usually need to change the algorithm to reduce memory footprint and improve energy efficiency. However, this is a large design space: neural network models are notoriously hard to tune to balance accuracy and latency. Human engineers can hardly exhaust the design space by heuristics. It's labor consuming and sub-optimal. This talk will describe our recent AutoML techniques to systematically guide neural network architecture design. It will cover: automatically design small and fast models (ProxylessNAS), auto channel pruning (AMC), and auto mixed-precision quantization (HAQ). We demonstrate such learning-based, automated design achieves superior performance and efficiency than rule-based human design. Moreover, we shorten the design cycle by 200x than previous work, so that we can afford to design specialized neural network models for different hardware platforms. Bio: Song Han is an assistant professor at MIT EECS. Dr. Han received the Ph.D. degree in Electrical Engineering from Stanford advised by Prof. Bill Dally. Dr. Han's research focuses on efficient deep learning computing. He proposed "Deep Compression" and "EIE Accelerator" that impacted the industry. His work received the best paper award in ICLR'16 and FPGA'17. He is the co-founder and chief scientist of DeePhi Tech (a leading efficient deep learning solution provider), which was acquired by Xilinx in 2018.Livestream link: https://www.youtube.com/channel/UCYs2iUgksAhgoidZwEAimmg/live G32-449 (Patil/Kiva)