Publications

2016

Cutty: Aggregate Sharing for User-defined Windows
Paris Carbone, Jonas Traub, Asterios Katsifodimos, Seif Haridi, Volker Markl
CIKM 2016

Bridging the Gap: Towards Optimizations across Linear and Relational Algebra
Andreas Kunft, Alexander Alexandrov, Asterios Katsifodimos, Volker Markl
BeyondMR 2016, ACM SIGMOD Workshop 2016

RDFind: Scalable Conditional Inclusion Dependency Discovery in RDF Datasets
Sebastian Kruse, Anja Jentzsch, Thorsten Papenbrock, Zoi Kaoudi, Jorge Arnulfo Quiané-Ruiz, Felix Naumann
SIGMOD 2016

Potential and Pitfalls of Domain-Specific Information Extraction at Web Scale
Astrid Rheinländer, Mario Lehmann, Anja Kunkel, Jörg Meier, Ulf Leser
SIGMOD 2016
[PDF]

Implicit Parallelism through Deep Language Embedding
Alexander Alexandrov, Asterios Katsifodimos, Georgi Krastev, Volker Markl
SIGMOD Record, March 2016
[PDF]

Emma in Action: Declarative Dataflows for Scalable Data Analysis
Alexander Alexandrov, Andreas Salzmann, Georgi Krastev, Asterios Katsifodimos, Volker Markl
Demo at SIGMOD 2016
[PDF]

2015

Apache Flink: Stream and Batch Processing in a Single Engine
Paris Carbone, Stephan Ewen, Seif Haridi, Asterios Katsifodimos, Volker Markl, Kostas Tzoumas
IEEE Data Engineering Bulletin, in the special issue on Next-gen Stream Processing (December 2015, Vol. 38 No. 4)

Elastic Stream Processing with Latency Guarantees
Björn Lohrmann, Peter Janacik, Odej Kao
ICDCS 2015

SOFA: An Extensible Logical Optimizer for UDF-heavy Data Flows
Astrid Rheinländer, Arvid Heise, Fabian Hueske, Ulf Leser, Felix Naumann
Information Systems, Elsevier, 2015

Implicit Parallelism through Deep Language Embedding
Alexander Alexandrov, Andreas Kunft, Asterios Katsifodimos, Felix Schüler, Lauritz Thamsen, Odej Kao, Tobias Herb, Volker Markl
SIGMOD 2015
[PDF]

Optimistic Recovery For Iterative Dataflows in Action
Sergey Dudoladov, Chen Xu, Sebastian Schelter, Asterios Katsifodimos, Stephan Ewen, Kostas Tzoumas, Volker Markl
Demo at SIGMOD 2015

Scaling Out the Discovery of Inclusion Dependencies
Sebastian Kruse, Thorsten Papenbrock, Felix Naumann
BTW 2015
[PDF]

2014

Estimating the Number and Sizes of Fuzzy-Duplicate Clusters
Arvid Heise, Gjergji Kasneci, Felix Naumann
CIKM 2014

Runtime Analysis of Distributed Data Processing Programs
Marcus Leich (Advisor: Volker Markl)
PhD Workshop at VLDB 2014 - received Best Paper Award
[PDF]

Versatile optimization of UDF-heavy data flows with Sofa
Astrid Rheinländer, Martin Beckmann, Anja Kunkel, Arvid Heise, Thomas Stoltmann, and Ulf Leser
Demo at SIGMOD 2014
[Link]

The Stratosphere platform for Big Data Analytics
Alexander Alexandrov, Rico Bergmann, Stephan Ewen, Johann-Christoph Freytag, Fabian Hueske, Arvid Heise, Odej Kao, Marcus Leich, Ulf Leser, Volker Markl, Felix Naumann, Mathias Peters, Astrid Rheinländer, Matthias J. Sax, Sebastian Schelter, Mareike Höger, Kostas Tzoumas, Daniel Warneke
VLDB Journal 2014
[PDF]

2013

"All Roads Lead to Rome:" Optimistic Recovery for Distributed Iterative Data Processing
Sebastian Schelter, Stephan Ewen, Kostas Tzoumas, Volker Markl
CIKM, 2013
[PDF]

Nephele Streaming: Stream Processing Under QoS Constraints at Scale
Björn Lohrmann, Daniel Warneke, Odej Kao
Journal of Cluster Computing, Springer US, 2013
[Link] [Pre-Print PDF]

Adaptive Online Compression in Clouds—Making Informed Decisions in Virtual Machine Environments
Matthias Hovestadt, Odej Kao, Andreas Kliem, Daniel Warneke
Journal of Grid Computing, Springer, 2013
[Link]

Large-Scale Social-Media Analytics on Stratosphere
Christoph Boden, Marcel Karnstedt, Miriam Fernandez, Volker Markl
WWW, 2013

Iterative Parallel Data Processing with Stratosphere: An Inside Look
Stephan Ewen, Sebastian Schelter, Kostas Tzoumas, Daniel Warneke, Volker Markl
SIGMOD, 2013

Peeking into the Optimization of Data Flow Programs with MapReduce-style UDFs
Fabian Hueske, Mathias Peters, Aljoscha Krettek, Matthias Ringwald, Kostas Tzoumas, Volker Markl, Johann-Christoph Freytag
ICDE, 2013
[PDF] [Poster PDF] [Video]

Applying Stratosphere for Big Data Analytics
Marcus Leich, Jochen Adamek, Moritz Schubotz, Arvid Heise, Astrid Rheinländer, Volker Markl
BTW, 2013 (Demo)
[PDF]

2012

Meteor/Sopremo: An Extensible Query Language and Operator Model
Arvid Heise, Astrid Rheinländer, Marcus Leich, Ulf Leser, and Felix Naumann
BigData Workshop (2012), affiliated with VLDB
[PDF]

Spinning Fast Iterative Data Flows
Stephan Ewen, Moritz Kaufmann, Kostas Tzoumas, Volker Markl
PVLDB 5(11), 2012, pp. 1268-1279
[PDF] [DOI]

Opening the Black Boxes in Data Flow Optimization
Fabian Hueske, Mathias Peters, Matthias J. Sax, Astrid Rheinländer, Rico Bergmann, Aljoscha Krettek, Kostas Tzoumas
PVLDB 5(11), 2012: pp. 1256-1267
[PDF] [DOI]

Myriad: Scalable and Expressive Data Generation
Alexander Alexandrov, Kostas Tzoumas, Volker Markl
PVLDB, 5(12), 2012: pp. 1890-1893
[PDF] [DOI]

Massively-Parallel Stream Processing under QoS Constraints with Nephele
Björn Lohrmann, Daniel Warneke, Odej Kao
Proceedings of the 21st International Symposium on High-Performance Parallel and Distributed Computing (HPDC) 2012 ACM, pp. 271-282
[PDF] [BibTex] [DOI]

Enabling Operator Reordering in Data Flow Programs Through Static Code Analysis
Fabian Hueske, Aljoscha Krettek, Kostas Tzoumas
XLDI Workshop (2012), affiliated with ICFP
[PDF]

2011

MapReduce and PACT - Comparing Data Parallel Programming Models
Alexander Alexandrov, Stephan Ewen, Max Heimel, Fabian Hueske, Odej Kao, Volker Markl, Erik Nijkamp, Daniel Warneke
In Proceedings of Datenbanksysteme für Business, Technologie und Web (BTW) 2011, pp. 25-44
[PDF] [BibTex]

Exploiting Dynamic Resource Allocation for Efficient Parallel Data Processing in the Cloud
Daniel Warneke, Odej Kao
In IEEE Transactions on Parallel and Distributed Systems (TPDS), Special Issue on Many-Task Computing, 2011, pp. 985-997
[PDF] [BibTex] [DOI]

Evaluating Adaptive Compression to Mitigate the Effects of Shared I/O in Clouds
Matthias Hovestadt, Odej Kao, Andreas Kliem, Daniel Warneke
Proceedings of the 1st International Workshop on Data Intensive Computing in the Clouds (DataCloud), 2011
[PDF] [BibTex] [DOI]

Evaluation of Network Topology Inference in Opaque Compute Clouds Through End-to-End Measurements
Dominic Battré, Natalia Frejnik, Siddhant Goel, Odej Kao, Daniel Warneke
Proceedings of the 4th IEEE International Conference on Cloud Computing (IEEE CLOUD), 2011
[PDF] [BibTex] [DOI]

Inferring Network Topologies in Infrastructure as a Service Clouds
Dominic Battré, Natalia Frejnik, Siddhant Goel, Odej Kao, Daniel Warneke
Proceedings of the 11th International Symposium on Cluster, Cloud, and Grid Computing (CCGrid), 2011, pp. 604-605
[PDF] [BibTex] [DOI]

Myriad - Parallel Data Generation on Shared-Nothing Architectures
Alexander S. Alexandrov, Berni Schiefer, John Poelman, Stephan Ewen, Thomas Bodner, Volker Markl
Proceedings of the First Workshop on Architectures and Systems for Big Data (ASBD), 2011
[PDF] [BibTex]

2010

Nephele/PACTs: A Programming Model and Execution Framework for Web-Scale Analytical Processing
Dominic Battré, Stephan Ewen, Fabian Hueske, Odej Kao, Volker Markl, and Daniel Warneke
In Proceedings of the ACM Symposium on Cloud Computing (SoCC) 2010 ACM, pp. 119–130
[PDF] [BibTex] [DOI]

Massively Parallel Data Analysis with PACTs on Nephele
Alexander Alexandrov, Dominic Battré, Stephan Ewen, Max Heimel, Fabian Hueske, Odej Kao, Volker Markl, Erik Nijkamp, Daniel Warneke
PVLDB Vol. 3, No. 2, 2010, pp. 1625–1628
[PDF] [Poster PDF] [BibTex]

Detecting Bottlenecks in Parallel DAG-based Data Flow Programs
Dominic Battré, Matthias Hovestadt, Björn Lohrmann, Alexander Stanik, Daniel Warneke
In Proceedings of Many-Task Computing on Grids and Supercomputers (MTAGS) 2010, pp. 1–10
[PDF] [BibTex] [DOI]

2009

Nephele: Efficient Parallel Data Processing in the Cloud
Daniel Warneke and Odej Kao
In Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers, MTAGS 2009
[PDF] [BibTex] [DOI]