Volker Markl
Technical University of Berlin, Germany
Keynote title: NebulaStream – Data Stream Processing in Massively Distributed Heterogeneous Environments
Abstract
Modern data-driven applications arising in domains such as smart manufacturing, healthcare, and the Internet of Things pose new challenges to data processing systems. Traditional stream processing systems, such as Flink, Spark, or Kafka Streams, are ill-suited to cope with the massive scale of distribution, the heterogeneous computing landscape, and the requirement for timely processing and actuation. Classical approaches like managed runtimes, interpretation-based query processing, and the optimization of single queries in a way that neglects their interactions greatly limit the throughput, latency, energy efficiency, and general usability of these systems for emerging applications that involve distributed data processing at scale in a sensor-edge-cloud environment. At BIFOLD / TU Berlin, we are researching and building NebulaStream, a novel data stream processing system for massively distributed, heterogeneous environments. NebulaStream supports (potentially resource-constrained) heterogeneous devices, a hierarchical topology (with computation and data flow distributed across a cloud-edge continuum), and the sharing of computations and data across multiple concurrent queries. From a technological perspective, the key features that distinguish NebulaStream are the following. (1) An incremental and continuous query optimizer that considers the sharing of computation and intermediate results in conjunction with the placement of operations in a massively distributed, heterogeneous cloud-edge continuum. (2) A compilation-based approach for streaming queries, which avoids the need for managed runtimes and ensures excellent throughput, latency, and energy efficiency across the board, from small embedded devices to powerful processors. (3) A distributed runtime that supports on-demand in-network processing on a hierarchical topology of heterogeneous devices in an efficient and fault-tolerant way.
In this talk, we will describe several challenges arising from novel applications and architectures for distributed data stream processing. We will present NebulaStream, an innovative open-source system currently being built to address these challenges. In addition, we will describe NebulaStream’s design principles, architecture, performance, and application scenarios, as well as the current status of its open-source development.
Biography
Volker Markl is a Full Professor and Chair of the Database Systems and Information Management (DIMA) Group at the Technische Universität Berlin (TU Berlin). At the German Research Center for Artificial Intelligence (DFKI), he is Chief Scientist and Head of the Intelligent Analytics for Massive Data Research Group. In addition, he is Director of the Berlin Institute for the Foundations of Learning and Data (BIFOLD), a merger of the Berlin Big Data Center (BBDC) and the Berlin Center for Machine Learning (BZML). BIFOLD is one of Germany’s national Competence Centers for Artificial Intelligence and will further bolster ongoing collaborative research in scalable data management and machine learning. Dr. Markl is a database systems researcher conducting research at the intersection of distributed systems, scalable data processing, text mining, computer networks, machine learning, and applications in healthcare, logistics, Industry 4.0, and information marketplaces. Earlier in his career, he was a Research Staff Member and Project Leader at the IBM Almaden Research Center in San Jose, California, USA, and a Research Group Leader at FORWISS, the Bavarian Research Center for Knowledge-based Systems, located in Munich, Germany. Volker Markl is a computer science graduate of the Technische Universität München, where he earned his Diploma in 1995 with a thesis on exception handling in programming languages. He earned his PhD in 1999 in the area of multidimensional indexing under the supervision of Rudolf Bayer. Volker Markl has published numerous scholarly papers on indexing, query optimization, lightweight information integration, and scalable data processing at prestigious venues. He holds 18 patents, has transferred technology into several commercial products, and has been involved in two successful startup exits.
He has been both the Speaker and Principal Investigator of the Stratosphere Project, which resulted in a Humboldt Innovation Award as well as in Apache Flink, the open-source big data analytics system. He currently serves as President of the VLDB Endowment and was elected one of Germany’s leading Digital Minds (Digitale Köpfe) by the German Informatics Society (GI). Volker is also a member of the Scientific Advisory Board of Software AG. Most recently, Volker and his team earned the ACM SIGMOD 2020 Best Paper Award for their work on “Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects”.
Tyler Akidau
Snowflake Inc., USA
Keynote title: Simplicity and Elegance in Stream Processing: A Five Year Odyssey
Abstract
At DEBS 2019, I had the opportunity to speak about my take on the open problems in stream processing at the time. Now, five years later, I’m happy to have the opportunity to return and talk about the investments we’ve made in those problems at Snowflake in the years since. Using those seven open problems as a framework (Graceful Evolution, Operational Ease of Use, SQL, Formal Semantics, Latency ↔ Cost ↔ Correctness, Batch + Streaming Interoperability, and Database-style Optimizations), I’ll discuss the areas where we’ve made good progress, both at Snowflake and across the industry as a whole, as well as the areas where a substantial amount of work remains. Much of the talk will center on Dynamic Tables, Snowflake’s declarative batch+streaming transformation primitive and the centerpiece of our streaming offerings. Designed to hide the complexity of stream processing behind the simple but powerful interface of a SQL query and a target lag, Dynamic Tables deliver on the promise of truly unified batch and stream processing in an easy-to-use, accessible, and operationally hands-off package. Dynamic Tables are a truly remarkable feat of engineering, and I’ll show how they have helped move the needle on each of the seven open problems from my 2019 talk. In addition, I will touch upon other pieces of the Snowflake streaming portfolio, such as our streaming ingestion service, Snowpipe Streaming. I will also talk briefly about our time spent collaborating on the noble experiment that was the ill-fated SQL Standards Expert Group on Streaming, and give a glimpse of some of the more forward-looking efforts we’re actively working on now. By the end, I hope to convey the optimism we at Snowflake all feel regarding the progress made, and the opportunities remaining, in this fascinating field of streaming data.
Biography
Tyler Akidau has spent the better part of the last two decades working on and opining about large-scale distributed stream processing. Best known as the author of the seminal Streaming 101 and Streaming 102 blog posts, as well as the O’Reilly book Streaming Systems, his true passion lies in helping build and lead talented teams of exceptional engineers to pragmatically push forward the state of the art. He is currently a Distinguished Software Engineer at Snowflake, helping drive the streaming agenda there, amongst other efforts. He’s also proud to be a co-author of a number of industrial track conference publications, the most recent of which is the 2023 SIGMOD paper “What’s the Difference? Incremental Processing with Change Queries in Snowflake”.
Evangelia Kalyvianaki
University of Cambridge, UK
Keynote title: Distributed scheduling in modern data centers: to optimize or not?
Biography
Dr Evangelia (Eva) Kalyvianaki is an Associate Professor in the Department of Computer Science and Technology (CST) at the University of Cambridge, where she co-leads the Systems Research Group (SRG). She is also the vice-chair of the European Chapter of ACM SIGOPS (EuroSys). She was an Associate Editor of the IEEE/ACM Transactions on Networking (ToN) journal and a Fellow at the Alan Turing Institute (2018–2021). Dr Kalyvianaki’s research interests span cloud computing, resource management, big data processing, distributed systems, and systems in general. She has published in top-tier conferences and journals in systems (USENIX ATC), data management and database systems (SIGMOD, ICDE), autonomic computing (ICAC, TAAS), and control theory (CDC, ToSC, ECC). She and her co-authors received the 2023 ACM SIGMOD Test-of-Time Award for their paper “Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management”. Her past work on novel cloud pricing models was featured in “The Register”. Over the years, Dr Kalyvianaki and her collaborators have received significant funding from UKRI and from industry; for example, she was a co-recipient of the 2014 VMware Systems Research Award. She has an extensive track record of editorial work and of reviewing for top systems conferences and journals. At CST, she teaches Cloud Computing and Operating Systems. She was the Deputy Director of Postgraduate Education on Researcher Development, and she supervises several PhD students as well as BSc and award-winning MSc projects. She is currently on sabbatical leave from CST, working at Meta at Magnit.
Events | Dates (AoE) |
---|---|
Research Papers | |
Abstract Submission | |
Paper Submission | |
Notification | |
Final Decision | |
Camera Ready | |
Submission Dates | |
Industry and Application Paper Submission | |
Doctoral Symposium Submission | |
Poster and Demo Paper Submission | |
Notification Dates | |
Author Notification Industry and Application Track | |
Author Notification Doctoral Symposium | |
Author Notification Poster & Demo | |
Camera Ready | |
Camera Ready for Industry and Application Track | |
Camera Ready for Doctoral Symposium | |
Camera Ready for Poster & Demo | |
Conference | |
Conference | June 25th–28th 2024 |