From Theory to Practice Efficient Join Query Evaluation in a Parallel Database System Paper Review

Jelle Hellings

Assistant professor
Database Theory and Technology Lab
Information Technology Building, Room 124 (floor map)
Department of Computing and Software
McMaster University
1280 Main Street West, Hamilton, ON, Canada.

My research is centered around novel directions for high-performance large-scale data management systems. My research has a strong theoretical component (e.g., lower bound results, finite model theory, dependency theory) and a strong algorithmic component (e.g., external-memory algorithms, distributed algorithms, join algorithms). Currently, my focus is on the development of scalable resilient systems that can manage data and process complex transactions, while providing strong guarantees toward users in the presence of faulty behavior (e.g., hardware failures, software failures, and malicious attacks).

Previously, I was a Postdoc Scholar in the Exploratory Systems Lab at the Computer Science Department of the University of California, Davis, where I worked on scalable resilient distributed data processing under the supervision of Mohammad Sadoghi. I did my doctoral training in the Databases and Theoretical Computer Science research group at Hasselt University under the supervision of Marc Gyssens. During my doctoral training, I worked on semi-structured data with a main focus on graph databases (e.g., graph query languages, constraints on graph data, graph query evaluation algorithms). Finally, I did my Bachelor and Master in computer science and engineering at the Eindhoven University of Technology, where my final Master project focused on external-memory algorithms for indexing very large graph datasets.

I refer to a recent Postdoc Spotlight for some further background on me and my work.

Publications (Export citations as BibTeX)

Jump to Books, Journal Papers, Conference Proceedings, Tutorials, Demos, and Talks, Theses.

Books

  1. MC 2021.
    Fault-tolerant distributed transactions on blockchain

    (2021). In: Synthesis Lectures on Data Management. Morgan & Claypool. DOI: 10.2200/S01068ED1V01Y202012DTM065.

    Abstract

    Since the introduction of Bitcoin--the first widespread application driven by blockchain--the interest of the public and private sectors in blockchain has skyrocketed. In recent years, blockchain-based fabrics have been used to address challenges in diverse fields such as trade, food production, property rights, identity-management, aid delivery, health care, and fraud prevention.

    These fundamental concepts and the technologies behind them--a generic ledger-based data model, cryptographically ensured information integrity, and consensus-based replication--prove to be a powerful and inspiring combination, a catalyst to promote computational trust. In this book, we present an in-depth study of blockchain, unraveling its revolutionary promise to instill computational trust in society, all carefully tailored to a broad audience including students, researchers, and practitioners. We offer a comprehensive overview of theoretical limitations and practical usability of consensus protocols while examining the diverse landscape of how blockchains are manifested in their permissioned and permissionless forms.

Journal Papers

  1. JLAMP 2022.
    The power of Tarski's relation algebra on trees

    Jelle Hellings, Yuqing Wu, Marc Gyssens, and Dirk Van Gucht. (2022). In: Journal of Logical and Algebraic Methods in Programming. Elsevier. DOI: 10.1016/j.jlamp.2022.100748. See also FoIKS 2018a, PhD thesis.

    author copy, project page.
    Abstract

    Fragments of Tarski's relation algebra form the basis of many versatile graph and tree query languages including the regular path queries, XPath, and SPARQL. Surprisingly, however, a systematic study of the relative expressive power of relation algebra fragments on trees has not yet been undertaken. In this work, we perform such a systematic study. Our approach is to start from a basic fragment which only allows composition and union. We then study how the expressive power of the query language changes if we add diversity, converse, projections, coprojections, intersection, and/or difference, both for path queries and Boolean queries. For path queries on labeled trees, we found that adding intersection and difference yields more expressive power for some fragments, while adding one of the other operators always yields more expressive power. For Boolean queries on labeled trees, we obtain a similar picture for the relative expressive power, except for a few fragments where adding converse or projection yields no more expressive power. Additionally, we also studied querying unlabeled trees, for which we have found several redundancies. One challenging problem remains open, however, for both path and Boolean queries: does adding difference yield more expressive power to fragments containing at least diversity, coprojections, and intersection?

  2. VLDB 2021.
    ByShard: Sharding in a Byzantine environment

    Jelle Hellings and Mohammad Sadoghi. (2021). In: Proceedings of the VLDB Endowment, 14(11), 2230-2243. VLDB. DOI: 10.14778/3476249.3476275.

    author copy, slides, poster, project page.
    Abstract

    The emergence of blockchains has fueled the development of resilient systems that can deal with Byzantine failures due to crashes, bugs, or even malicious behavior. Recently, we have also seen the exploration of sharding in these resilient systems, this to provide the scalability required by very large data-based applications. Unfortunately, current sharded resilient systems all use system-specific specialized approaches toward sharding that do not provide the flexibility of traditional sharded data management systems.

    To improve on this situation, we fundamentally look at the design of sharded resilient systems. We do so by introducing ByShard, a unifying framework for the study of sharded resilient systems. Within this framework, we show how two-phase commit and two-phase locking--two techniques central to providing atomicity and isolation in traditional sharded databases--can be implemented efficiently in a Byzantine environment, this with a minimal usage of costly Byzantine resilient primitives. Based on these techniques, we propose 18 multi-shard transaction processing protocols. Finally, we practically evaluate these protocols and show that each protocol supports high transaction throughput and provides scalability while each striking its own trade-off between throughput, isolation level, latency, and abort rate. As such, our work provides a strong foundation for the development of ACID-compliant general-purpose and flexible sharded resilient data management systems.

  3. CJ 2020.
    From relation algebra to semi-join algebra: An approach to graph query optimization

    Jelle Hellings, Catherine L. Pilachowski, Dirk Van Gucht, Marc Gyssens, and Yuqing Wu. (2020). In: The Computer Journal, 64(5), 789-811. Oxford University Press. DOI: 10.1093/comjnl/bxaa031. See also DBPL 2017, PhD thesis.

    author copy.
    Abstract

    Many graph query languages rely on composition to navigate graphs and select nodes of interest, even though evaluating compositions of relations can be costly. Often, this need for composition can be reduced by rewriting towards queries using semi-joins instead, resulting in a significant reduction of the query evaluation cost. We study techniques to recognize and apply such rewritings. Concretely, we study the relationship between the expressive power of the relation algebras, which heavily rely on composition, and the semi-join algebras, which replace composition in favor of semi-joins. Our main result is that each fragment of the relation algebras where intersection and/or difference is only used on edges (and not on complex compositions) is expressively equivalent to a fragment of the semi-join algebras. This expressive equivalence holds for node queries evaluating to sets of nodes. For practical relevance, we exhibit constructive rules for rewriting relation algebra queries to semi-join algebra queries, and prove that they lead to only a well-bounded increase in the number of steps needed to evaluate the rewritten queries. In addition, on sibling-ordered trees, we establish new relationships among the expressive power of Regular XPath, Conditional XPath, FO-logic, and the semi-join algebra augmented with restricted fixpoint operators.

  4. VLDB 2020.
    ResilientDB: global scale resilient blockchain fabric

    Suyash Gupta, Sajjad Rahnama, Jelle Hellings, and Mohammad Sadoghi. (2020). In: Proceedings of the VLDB Endowment, 13(6), 868-883. VLDB. DOI: 10.14778/3380750.3380757. See also FAB 2020.

    author copy, technical report, video by Suyash Gupta.
    Abstract

    Recent developments in blockchain technology have inspired innovative new designs in resilient distributed and database systems. At their core, these blockchain applications typically use Byzantine fault-tolerant consensus protocols to maintain a common state across all replicas, even if some replicas are faulty or malicious. Unfortunately, existing consensus protocols are not designed to deal with geo-scale deployments in which many replicas spread across a geographically large area participate in consensus.

    To address this, we present the Geo-Scale Byzantine Fault-Tolerant consensus protocol (GeoBFT). GeoBFT is designed for excellent scalability by using a topological-aware grouping of replicas in local clusters, by introducing parallelization of consensus at the local level, and by minimizing communication between clusters. To validate our vision of high-performance geo-scale resilient distributed systems, we implement GeoBFT in our efficient ResilientDB permissioned blockchain fabric. We show that GeoBFT is not only sound and provides great scalability, but also outperforms state-of-the-art consensus protocols by a factor of six in geo-scale deployments.

  5. IS 2020.
    Comparing the expressiveness of downward fragments of the relation algebra with transitive closure on trees

    Jelle Hellings, Marc Gyssens, Yuqing Wu, Dirk Van Gucht, Jan Van den Bussche, Stijn Vansummeren, and George H. L. Fletcher. (2020). In: Information Systems, 89. Elsevier. DOI: 10.1016/j.is.2019.101467. See also DBPL 2015, PhD thesis.

    author copy, technical report, project page.
    Abstract

    Motivated by the continuing interest in the tree data model, we study the expressive power of downward navigational query languages on trees and chains. Basic navigational queries are built from the identity relation and edge relations using composition and union. We study the effects on relative expressiveness when we add transitive closure, projections, coprojections, intersection, and difference; this for boolean queries and path queries on labeled and unlabeled structures. In all cases, we present the complete Hasse diagram. In particular, we establish, for each query language fragment that we study on trees, whether it is closed under difference and intersection.

  6. AMAI 2019.
    First-order definable counting-only queries

    Jelle Hellings, Marc Gyssens, Dirk Van Gucht, and Yuqing Wu. (2019). In: Annals of Mathematics and Artificial Intelligence, 87, 109-136. Springer. DOI: 10.1007/s10472-019-09652-8. See also FoIKS 2018b.

    author copy.
    Abstract

    Many data sources can be represented easily by collections of sets of objects. For several practical queries on such collections of sets of objects, the answer does not depend on the precise composition of these sets, but only on the number of sets to which each object belongs. This is the case k = 1 for the more general situation where the query answer only depends on the number of sets to which each collection of at most k objects belongs. We call such queries k-counting-only. Here, we focus on k-SyCALC, i.e., k-counting-only queries that are first-order definable. As k-SyCALC is semantically defined, however, it is not surprising that it is already undecidable whether a first-order query is in 1-SyCALC. Therefore, we introduce SimpleCALC-k, a syntactically defined (strict) fragment of k-SyCALC. It turns out that many practical queries in k-SyCALC can already be expressed in SimpleCALC-k. We also define the query language GCount-k, which expresses counting-only queries directly by using generalized counting terms, and show that this language is equivalent to SimpleCALC-k. We prove that the k-counting-only queries form a non-collapsing hierarchy: for every k, there exist (k+1)-counting-only queries that are not k-counting-only. This result specializes to both SimpleCALC-k and k-SyCALC. Finally, we establish a strong dichotomy between 1-SyCALC and SimpleCALC-k on the one hand and 2-SyCALC on the other hand by showing that satisfiability, validity, query containment, and query equivalence are decidable for the former two languages, but not for the latter one.

  7. JCSS 2019.
    Calculi for symmetric queries

    Marc Gyssens, Jelle Hellings, Jan Paredaens, Dirk Van Gucht, Jef Wijsen, Yuqing Wu. (2019). In: Journal of Computer and System Sciences, 105, 54-86. Elsevier. DOI: 10.1016/j.jcss.2019.04.003.

    author copy.
    Abstract

    Symmetric queries are introduced as queries on a sequence of sets of objects the outcome of which does not depend on the order of the sets. An appropriate data model is proposed, and two query languages are introduced, QuineCALC and SyCALC. They are correlated with the symmetric Boolean functions of Quine, respectively symmetric relational functions. The former correlation yields an incidence-based normal form for QuineCALC queries. More generally, we propose counting-only queries as those SyCALC queries the result of which only depends on incidence information, and characterize them as quantified Boolean combinations of QuineCALC queries. A normal form is proposed for them too. It is shown that, while it is undecidable whether a SyCALC query is counting-only, it is decidable whether a counting-only query is a QuineCALC query. Finally, some classical decidability problems are considered which are shown to be undecidable for SyCALC, but decidable for QuineCALC and counting-only queries.

  8. AMAI 2016.
    Implication and axiomatization of functional and constant constraints

    Jelle Hellings, Marc Gyssens, Jan Paredaens, Yuqing Wu. (2016). In: Annals of Mathematics and Artificial Intelligence, 76(3), 251-279. Springer. DOI: 10.1007/s10472-015-9473-7. See also FoIKS 2014.

    author copy.
    Abstract

    Akhtar et al. introduced equality-generating constraints and functional constraints as a first step towards dependency-like integrity constraints for RDF data. Here, we focus on functional constraints. Since the usefulness of functional constraints is not limited to the RDF data model, we study the functional constraints in the more general setting of relations with arbitrary arity. We further introduce constant constraints and study the functional and constant constraints combined.

    Our main results are sound and complete axiomatizations for the functional and constant constraints, both separately and combined. These axiomatizations are derived using the chase algorithm for equality-generating constraints. For derivations of constant constraints, we show how every chase step can be simulated by a bounded number of applications of inference rules. For derivations of functional constraints, we show that the chase algorithm can be normalized to a more specialized symmetry-preserving chase algorithm performing so-called symmetry-preserving steps. We then show how each symmetry-preserving step can be simulated by a bounded number of applications of inference rules. The axiomatization for functional constraints is in particular applicable to the RDF data model, solving a major open problem of Akhtar et al.

Conference Proceedings (peer-reviewed)

  1. EDBT 2021.
    Proof-of-Execution: Reaching consensus through fault-tolerant speculation

    Suyash Gupta, Sajjad Rahnama, Jelle Hellings, and Mohammad Sadoghi. (2021). In: Proceedings of the 24th International Conference on Extending Database Technology (EDBT), 301-312. OpenProceedings.org. DOI: 10.5441/002/edbt.2021.27.

    author copy, video by Suyash Gupta.
    Abstract

    Multi-party data management and blockchain systems require data sharing among participants. To provide resilient and consistent data sharing, transactions engines rely on Byzantine Fault-Tolerant consensus (BFT), which enables operations during failures and malicious behavior. Unfortunately, existing BFT protocols are unsuitable for high-throughput applications due to their high computational costs, high communication costs, high client latencies, and/or reliance on twin-paths and non-faulty clients.

    In this paper, we present the Proof-of-Execution consensus protocol (PoE) that alleviates these challenges. At the core of PoE are out-of-order processing and speculative execution, which allow PoE to execute transactions before consensus is reached among the replicas. With these techniques, PoE manages to reduce the costs of BFT in normal cases, while guaranteeing reliable consensus for clients in all cases. We envision the use of PoE in high-throughput multi-party data-management and blockchain systems. To validate this vision, we implement PoE in our efficient ResilientDB fabric and extensively evaluate PoE against several state-of-the-art BFT protocols. Our evaluation showcases that PoE achieves up-to-80% higher throughputs than existing BFT protocols in the presence of failures.

  2. ICDE 2021.
    RCC: Resilient concurrent consensus for high-throughput secure transaction processing

    Suyash Gupta, Jelle Hellings, and Mohammad Sadoghi. (2021). In: 2021 IEEE 37th International Conference on Data Engineering (ICDE), 1392-1403. IEEE. DOI: 10.1109/ICDE51399.2021.00124.

    author copy, video by Suyash Gupta.
    Abstract

    Recently, we saw the emergence of consensus-based database systems that promise resilience against failures, strong data provenance, and federated data management. Typically, these fully-replicated systems are operated on top of a primary-backup consensus protocol, which limits the throughput of these systems to the capabilities of a single replica (the primary).

    To push throughput beyond this single-replica limit, we propose concurrent consensus. In concurrent consensus, replicas independently propose transactions, thereby reducing the influence of any single replica on performance. To put this idea in practice, we propose our RCC paradigm that can turn any primary-backup consensus protocol into a concurrent consensus protocol by running many consensus instances concurrently. RCC is designed with performance in mind and requires minimal coordination between instances. Furthermore, RCC also promises increased resilience against failures. We put the design of RCC to the test by implementing it in ResilientDB, our high-performance resilient blockchain fabric, and comparing it with state-of-the-art primary-backup consensus protocols. Our experiments show that RCC achieves up to 2.75 times higher throughput than other consensus protocols and can be scaled to 91 replicas.

  3. TIME 2020.
    Stab-Forests: Dynamic Data Structures for Efficient Temporal Query Processing

    Jelle Hellings and Yuqing Wu. (2020). In: 27th International Symposium on Temporal Representation and Reasoning (TIME 2020), 18:1-18:19. Schloss Dagstuhl. DOI: 10.4230/LIPIcs.TIME.2020.18.

    author copy, slides, project page.
    Abstract

    Many sources of data have temporal start and stop attributes or are created in a time-ordered fashion. Hence, it is only natural to consider joining datasets based on these temporal attributes. To do so efficiently, several internal-memory temporal join algorithms have recently been proposed. Unfortunately, these join algorithms are designed to join entire datasets and cannot efficiently join skewed datasets in which only few events participate in the join result.

    To support high-performance internal-memory temporal joins of skewed datasets, we propose the skip-join algorithm, which operates on stab-forests. The stab-forest is a novel dynamic data structure for indexing temporal data that allows efficient updates when events are appended in a time-based order. Our stab-forests efficiently support not only traditional temporal stab-queries, but also more general multi-stab-queries. We conducted an experimental evaluation to compare the skip-join algorithm with state-of-the-art techniques using real-world datasets. We observed that the skip-join algorithm outperforms other techniques by an order of magnitude when joining skewed datasets and delivers comparable performance to other techniques on non-skewed datasets.

  4. LSGDA 2020.
    Explaining results of path queries on graphs

    Jelle Hellings. (2020). In: Software Foundations for Data Interoperability and Large Scale Graph Data Analytics, 84-98. Springer. DOI: 10.1007/978-3-030-61133-0_7.

    author copy, technical report, slides, project page.
    Abstract

    Many graph query languages employ, at their core, path queries that yield node pairs that are connected by a path of interest. For the end-user, such node pairs only give limited insight as to why this query result is obtained, as the pair does not directly identify the underlying path of interest. To address this limitation of path queries, we propose the single-path semantics, which evaluates path queries to, for each node pair (m,n), a single path from m to n satisfying the conditions of the query. To put our proposal in practice, we provide an efficient algorithm for evaluating context-free path queries, a particularly powerful type of path queries, using the single-path semantics. Additionally, we perform a short evaluation of our techniques that shows that the single-path semantics is practically feasible, even when query results grow large.

  5. ICDT 2020.
    Coordination-free Byzantine replication with minimal communication costs

    Jelle Hellings and Mohammad Sadoghi. (2020). In: 23rd International Conference on Database Theory (ICDT 2020), 17:1-17:20. Schloss Dagstuhl. DOI: 10.4230/LIPIcs.ICDT.2020.17.

    author copy, slides, video.
    Abstract

    State-of-the-art fault-tolerant and federated data management systems rely on fully-replicated designs in which all participants have equivalent roles. Consequently, these systems have only limited scalability and are ill-suited for high-performance data management. As an alternative, we propose a hierarchical design in which a Byzantine cluster manages data, while an arbitrary number of learners can reliably learn these updates and use the corresponding data.

    To realize our design, we propose the delayed-replication algorithm, an efficient solution to the Byzantine learner problem that is central to our design. The delayed-replication algorithm is coordination-free, scalable, and has minimal communication cost for all participants involved. In doing so, the delayed-replication algorithm opens the door to new high-performance fault-tolerant and federated data management systems. To illustrate this, we show that the delayed-replication algorithm is not only useful to support specialized learners, but can also be used to reduce the overall communication cost of permissioned blockchains and to improve their storage scalability.

  6. DISC 2019a.
    Brief announcement: The fault-tolerant cluster-sending problem

    Jelle Hellings and Mohammad Sadoghi. (2019). In: 33rd International Symposium on Distributed Computing (DISC 2019), 45:1-45:3. Schloss Dagstuhl. DOI: 10.4230/LIPIcs.DISC.2019.45.

    author copy, technical report, slides.
    Abstract

    The development of fault-tolerant distributed systems that can tolerate Byzantine behavior has traditionally been focused on consensus protocols, which support fully-replicated designs. For the development of more sophisticated high-performance Byzantine distributed systems, more specialized fault-tolerant communication primitives are necessary, however.

    In this brief announcement, we identify the cluster-sending problem--the problem of sending a message from one Byzantine cluster to another Byzantine cluster in a reliable way--as such an essential communication primitive. We not only formalize this fundamental problem, but also establish lower bounds on the complexity of this problem under crash failures and Byzantine failures. Furthermore, we develop practical cluster-sending protocols that meet these lower bounds and, hence, have optimal complexity. As such, our work provides a strong foundation for the further exploration of novel designs that address challenges encountered in fault-tolerant distributed systems.

  7. DISC 2019b.
    Brief announcement: Revisiting consensus protocols through wait-free parallelization

    Suyash Gupta, Jelle Hellings, and Mohammad Sadoghi. (2019). In: 33rd International Symposium on Distributed Computing (DISC 2019), 44:1-44:3. Schloss Dagstuhl. DOI: 10.4230/LIPIcs.DISC.2019.44. See also ICDE 2021.

    author copy, technical report, slides.
    Abstract

    In this brief announcement, we propose a protocol-agnostic approach to improve the design of primary-backup consensus protocols. At the core of our approach is a novel wait-free design of running several instances of the underlying consensus protocol in parallel. To yield a high-performance parallelized design, we present coordination-free techniques to order operations across parallel instances, deal with instance failures, and assign clients to specific instances. Consequently, the design we present is able to reduce the load on individual instances and primaries, while also reducing the adverse effects of any malicious replicas. Our design is fine-tuned such that the instances coordinated by non-faulty replicas are wait-free: they can continuously make consensus decisions, independent of the behavior of any other instances.

  8. FoIKS 2018a.
    The power of Tarski's relation algebra on trees

    Jelle Hellings, Yuqing Wu, Marc Gyssens, and Dirk Van Gucht. (2018). In: Foundations of Information and Knowledge Systems, 244-264. Springer. DOI: 10.1007/978-3-319-90050-6_14. See also JLAMP 2022, PhD thesis.

    author copy, slides, project page.
    Abstract

    Fragments of Tarski's relation algebra form the basis of many versatile graph and tree query languages including the regular path queries, XPath, and SPARQL. Surprisingly, however, a systematic study of the relative expressive power of relation algebra fragments on trees has not yet been undertaken. Our approach is to start from a basic fragment which only allows composition and union. We then study how the expressive power of the query language changes if we add diversity, converse, projections, coprojections, intersections, and/or difference, both for path queries and Boolean queries. For path queries, we found that adding intersection and difference yields more expressive power for some fragments, while adding one of the other operators always yields more expressive power. For Boolean queries, we obtain a similar picture for the relative expressive power, except for a few fragments where adding converse or projection yields no more expressive power. One challenging problem remains open, however, for both path and Boolean queries: does adding difference yield more expressive power to fragments containing at least diversity, coprojections, and intersection?

  9. FoIKS 2018b.
    First-order definable counting-only queries

    Jelle Hellings, Marc Gyssens, Dirk Van Gucht, and Yuqing Wu. (2018). In: Foundations of Information and Knowledge Systems, 225-243. Springer. DOI: 10.1007/978-3-319-90050-6_13. See also AMAI 2019.

    author copy, slides.
    Abstract

    For several practical queries on bags of sets of objects, the answer does not depend on the precise composition of these sets, but only on the number of sets to which each object belongs. This is the case k=1 for the more general situation where the query answer only depends on the number of sets to which each group of at most k objects belongs. We call such queries k-counting-only. Here, we focus on k-SyCALC, k-counting-only queries that are first-order definable. As k-SyCALC is semantically defined, however, it is not surprising that it is already undecidable whether a first-order query is in 1-SyCALC. Therefore, we introduce SimpleCALC-k, a syntactically defined (strict) fragment of k-SyCALC. It turns out that many practical queries in k-SyCALC can already be expressed in SimpleCALC-k. We prove that the k-counting-only queries form a non-collapsing hierarchy: for every k, there exist (k+1)-counting-only queries that are not k-counting-only. This result specializes to both SimpleCALC-k and k-SyCALC. Finally, we establish a strong dichotomy between 1-SyCALC and SimpleCALC-k on the one hand and 2-SyCALC on the other hand by showing that satisfiability, validity, query containment, and query equivalence are decidable for the former two languages, but not for the latter one.

  10. DBPL 2017.
    From relation algebra to semi-join algebra: An approach for graph query optimization

    Jelle Hellings, Catherine L. Pilachowski, Dirk Van Gucht, Marc Gyssens, and Yuqing Wu. (2017). In: Proceedings of The 16th International Symposium on Database Programming Languages, 5:1-5:10. ACM. DOI: 10.1145/3122831.3122833. See also CJ 2020, PhD thesis.

    author copy, slides.
    Abstract

    Many graph query languages rely on the composition operator to navigate graphs and select nodes of interest, even though evaluating compositions of relations can be costly. Often, this need for composition can be reduced by rewriting towards queries that use semi-joins instead. In this way, the cost of evaluating queries can be significantly reduced.

    We study techniques to recognize and use such rewritings. Concretely, we study the relationship between the expressive power of the relation algebras, that heavily rely on composition, and the semi-join algebras, that replace the composition operator in favor of the semi-join operators.

    As our main result, we show that each fragment of the relation algebras where intersection and/or difference is only used on edges (and not on complex compositions) is expressively equivalent to a fragment of the semi-join algebras. This expressive equivalence holds for node queries that evaluate to sets of nodes. For practical relevance, we exhibit constructive steps for rewriting relation algebra queries to semi-join algebra queries, and prove that these steps lead to only a well-bounded increase in the number of steps needed to evaluate the rewritten queries.

    In addition, on node-labeled graphs that are sibling-ordered trees, we establish new relationships among the expressive power of Regular XPath, Conditional XPath, FO-logic, and the semi-join algebra augmented with restricted fixpoint operators.

  11. DBPL 2015.
    Relative expressive power of downward fragments of navigational query languages on trees and chains

    . Jelle Hellings, Marc Gyssens, Yuqing Wu, Dirk Van Gucht, Jan Van den Bussche, Stijn Vansummeren, and George H. L. Fletcher. (2015). In: Proceedings of the 15th Symposium on Database Programming Languages, 59-68. ACM. DOI: 10.1145/2815072.2815081. See also IS 2020, PhD thesis.

    author copy, slides, project page.
    Abstract

    Motivated by the continuing interest in the tree data model, we study the expressive power of downward fragments of navigational query languages on trees. The basic navigational query language we consider expresses queries by building binary relations from the edge relations and the identity relation, using composition and union. We study the effects on the expressive power when we add transitive closure, projections, coprojections, intersection, and difference. We study expressiveness at the level of boolean queries and path queries, on labeled and unlabeled trees, and on labeled and unlabeled chains. In all these cases, we are able to present the complete Hasse diagram of relative expressiveness. In particular, we were able to decide, for each fragment of the navigational query languages that we study, whether it is closed under difference and intersection when applied on trees.

  12. ICDT 2014.
    Conjunctive context-free path queries

    . Jelle Hellings. (2014). In: Proceedings of the 17th International Conference on Database Theory (ICDT), 119-130. OpenProceedings.org. DOI: 10.5441/002/icdt.2014.15.

    author copy, slides.
    Abstract

    In graph query languages, regular expressions are commonly used to specify the labeling of paths. A natural step in increasing the expressive power of these query languages is replacing regular expressions by context-free grammars. With the Conjunctive Context-Free Path Queries (CCFPQ) we introduce such a language based on the well-known Conjunctive Regular Path Queries (CRPQ).

    First, we show that query evaluation of CCFPQ has polynomial time data complexity. Secondly, we look at the generalization of regular expressions, as used in CRPQ, to regular relations and show how similar generalizations can be applied to context-free grammars, as used in CCFPQ. Thirdly, we investigate the relations between the expressive power of CRPQ, CCFPQ, and their generalizations. In several cases we show that replacing regular expressions by context-free grammars does increase expressive power. Finally, we look at including context-free grammars in more powerful logics than conjunctive queries. We do so by adding negation and provide expressivity relations between the obtained languages.
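
    To give a feeling for why evaluating a single context-free path query takes polynomial time in the size of the graph, the sketch below uses a standard worklist-based reachability computation for grammars in a binary normal form. It is an illustration under assumptions and not necessarily the algorithm of the paper: the function name, the grammar encoding (terminal_rules, binary_rules), and the example graph are all hypothetical. Every derived fact is a triple (nonterminal, source, target), so at most |nonterminals| * |nodes|^2 facts are ever produced.

```python
from collections import deque


def context_free_reachability(edges, terminal_rules, binary_rules):
    """Derive all facts (A, u, v): some path from u to v spells a word derivable from A.

    edges: iterable of (u, label, v)
    terminal_rules: dict mapping an edge label a to the nonterminals A with rule A -> a
    binary_rules: list of (A, B, C) for rules A -> B C
    """
    facts = set()
    worklist = deque()

    def add(fact):
        if fact not in facts:
            facts.add(fact)
            worklist.append(fact)

    for u, label, v in edges:
        for a in terminal_rules.get(label, ()):
            add((a, u, v))

    while worklist:
        x, u, v = worklist.popleft()
        for a, b, c in binary_rules:
            if b == x:  # the popped fact is the left part: look for (c, v, w)
                for y, v2, w in list(facts):
                    if y == c and v2 == v:
                        add((a, u, w))
            if c == x:  # the popped fact is the right part: look for (b, w, u)
                for y, w, u2 in list(facts):
                    if y == b and u2 == u:
                        add((a, w, v))
    return facts


# Hypothetical example: paths labeled a^n b^n (n >= 1) on the chain 0-a->1-a->2-b->3-b->4.
edges = [(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4)]
terminal_rules = {"a": {"A"}, "b": {"B"}}
binary_rules = [("S", "A", "B"), ("S", "A", "S1"), ("S1", "S", "B")]

facts = context_free_reachability(edges, terminal_rules, binary_rules)
s_pairs = {(u, v) for x, u, v in facts if x == "S"}
```

    The scan over all facts in the inner loops keeps the sketch short; an index on the source and target node of each fact would be the obvious first optimization.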

  13. FoIKS 2014.
    Implication and axiomatization of functional constraints on patterns with an application to the RDF data model

    . Jelle Hellings, Marc Gyssens, Jan Paredaens, and Yuqing Wu. (2014). In: Foundations of Information and Knowledge Systems, 250-269. Springer. DOI: 10.1007/978-3-319-04939-7_12. See also AMAI 2016.

    author copy, slides.
    Abstract

    Akhtar et al. introduced equality-generating constraints and functional constraints as an initial step towards dependency-like integrity constraints for RDF data. Here, we focus on functional constraints. The usefulness of functional constraints is not limited to the RDF data model. Therefore, we study the functional constraints in the more general setting of relations with arbitrary arity. We show that a chase algorithm for functional constraints can be normalized to a more specialized symmetry-preserving chase algorithm. This symmetry-preserving chase algorithm is later used to construct a sound and complete axiomatization for the functional constraints. This axiomatization is in particular applicable in the RDF data model, solving a major open problem of Akhtar et al.

  14. ICDT 2013.
    Walk logic as a framework for path query languages on graph databases

    . Jelle Hellings, Bart Kuijpers, Jan Van den Bussche, and Xiaowang Zhang. (2013). In: Proceedings of the 16th International Conference on Database Theory, 117-128. ACM. DOI: 10.1145/2448496.2448512.

    author copy, slides.
    Abstract

    Motivated by the current interest in languages for expressing path queries to graph databases, this paper proposes to investigate Walk Logic (WL): the extension of first-order logic on finite graphs with the possibility to explicitly quantify over walks. WL can serve as a unifying framework for path query languages. To support this claim, WL is compared in expressive power with various established query languages for graphs, such as first-order logic extended with reachability; the monadic second-order logic of graphs; hybrid computation tree logic; and regular path queries. WL also serves as a framework to investigate the following natural questions: Is quantifying over walks more powerful than quantifying over paths (walks without repeating nodes) only? Is quantifying over infinite walks more powerful than quantifying over finite walks only? WL model checking is decidable, but determining the precise complexity remains an open problem.

  15. SIGMOD 2012.
    Efficient external-memory bisimulation on DAGs

    . Jelle Hellings, George H. L. Fletcher, and Herman Haverkort. (2012). In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, 553-564. ACM. DOI: 10.1145/2213836.2213899. See also DBDBD 2011, Master thesis.

    author copy, slides, project page.
    Abstract

    In this paper we introduce the first efficient external-memory algorithm to compute the bisimilarity equivalence classes of a directed acyclic graph (DAG). DAGs are commonly used to model data in a wide variety of practical applications, ranging from XML documents and data provenance models, to web taxonomies and scientific workflows. In the study of efficient reasoning over massive graphs, the notion of node bisimilarity plays a central role. For example, grouping together bisimilar nodes in an XML data set is the first step in many sophisticated approaches to building indexing data structures for efficient XPath query evaluation. To date, however, only internal-memory bisimulation algorithms have been investigated. As the size of real-world DAG data sets often exceeds available main memory, storage in external memory becomes necessary. Hence, there is a practical need for an efficient approach to computing bisimulation in external memory.

    Our general algorithm has a worst-case IO-complexity of O(Sort(|N| + |E|)), where |N| and |E| are the numbers of nodes and edges, resp., in the data graph and Sort(n) is the number of accesses to external memory needed to sort an input of size n. We also study specializations of this algorithm to common variations of bisimulation for tree-structured XML data sets. We empirically verify efficient performance of the algorithms on graphs and XML documents having billions of nodes and edges, and find that the algorithms can process such graphs efficiently even when very limited internal memory is available. The proposed algorithms are simple enough for practical implementation and use, and open the door for further study of external-memory bisimulation algorithms. To this end, the full open-source C++ implementation has been made freely available.
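
    For readers unfamiliar with bisimulation partitioning, the following sketch computes bisimilarity classes of a small node-labeled DAG entirely in main memory. It is explicitly not the external-memory algorithm described above and says nothing about its IO-complexity; it is a simple illustration under assumptions, with hypothetical names (bisimulation_partition, labels, children). It relies on the fact that in a DAG every node can be handled after all its children, so that two nodes are bisimilar exactly when they have the same label and the same set of child classes.

```python
def bisimulation_partition(labels, children):
    """Map every node of a node-labeled DAG to the identifier of its bisimilarity class.

    labels: dict mapping each node to its label
    children: dict mapping a node to an iterable of its child nodes (missing means no children)
    """
    block_of = {}        # node -> class identifier
    signature_ids = {}   # (label, set of child class identifiers) -> class identifier

    def visit(node):
        if node in block_of:
            return block_of[node]
        # Children are resolved first; this terminates because the graph is acyclic.
        signature = (labels[node], frozenset(visit(c) for c in children.get(node, ())))
        block_of[node] = signature_ids.setdefault(signature, len(signature_ids))
        return block_of[node]

    for node in labels:
        visit(node)
    return block_of


# Hypothetical example: node 1 has two "b" leaves, node 4 has a single "b" leaf.
labels = {1: "a", 2: "b", 3: "b", 4: "a", 5: "b"}
children = {1: [2, 3], 4: [5]}
blocks = bisimulation_partition(labels, children)
```

    In this example nodes 1 and 4 end up in the same class even though they have a different number of children, because all their children are bisimilar leaves. The recursive formulation is only suitable for shallow graphs; a very deep DAG would need an explicit stack or a bottom-up pass.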

Tutorials, Demos, and Talks

  1. ConsensusDays 21.
    Efficient fault-tolerant cluster-sending: Reliable and efficient communication between Byzantine fault-tolerant clusters

    . Jelle Hellings and Mohammad Sadoghi. (2021).

    abstract, slides.
    Abstract

    Traditional resilient systems operate on fully-replicated fault-tolerant clusters, which limits their scalability and performance. One way to make the step towards resilient high-performance systems that can deal with huge workloads, is by enabling independent fault-tolerant clusters to efficiently communicate and cooperate with each other, as this also enables the usage of high-performance techniques such as sharding and parallel processing.

    To enable such efficient communication, we identify the cluster-sending problem: the problem of sending a message from one Byzantine cluster to another Byzantine cluster in a reliable fashion, an essential communication primitive. We not only formalize this fundamental problem, but also establish lower bounds on the complexity of this problem under crash failures and Byzantine failures. Furthermore, we develop practical cluster-sending protocols that meet these lower bounds and, hence, have optimal complexity. Finally, we propose probabilistic cluster-sending techniques that only have an expected constant message complexity, independent of the size of the clusters involved. Depending on the robustness of the clusters involved, these probabilistic techniques require only two-to-four message round-trips while supporting worst-case linear communication between clusters, which is optimal. As such, our work provides a strong foundation for the further development of resilient high-performance systems.

  2. VLDB 2020 Tutorial.
    Building High Throughput Permissioned Blockchain Fabrics: Challenges and Opportunities

    . Suyash Gupta, Jelle Hellings, Sajjad Rahnama, and Mohammad Sadoghi. (2020). In: Proceedings of the VLDB Endowment, 13(12), 3441-3444. VLDB. DOI: 10.14778/3415478.3415565.

    author copy, slides, video of the entire tutorial.
    Abstract

    Since the introduction of Bitcoin--the first widespread application driven by blockchains--the interest in the design of blockchain-based applications has increased tremendously. At the core of these applications are consensus protocols that securely replicate client requests among all replicas, even if some replicas are Byzantine faulty. Unfortunately, these consensus protocols typically have low throughput, and this lack of performance is often cited as the reason for the slow wider adoption of blockchain technology. Consequently, many works focus on designing more efficient consensus protocols to increase throughput of consensus.

    We believe that this focus on consensus protocols only explains part of the story. To investigate this belief, we raise a simple question: Can a well-crafted system using a classical consensus protocol outperform systems using modern protocols? In this tutorial, we answer this question by diving deep into the design of blockchain systems. Further, we take an in-depth look at the theory behind consensus, which can help users select the protocol that best-fits their requirements. Finally, we share our vision of high-throughput blockchain systems that operate at large scales.

  3. VLDB 2020 Demo.
    Scalable, resilient, and configurable permissioned blockchain fabric

    . Sajjad Rahnama, Suyash Gupta, Thamir M. Qadah, Jelle Hellings, and Mohammad Sadoghi. (2020). In: Proceedings of the VLDB Endowment, 13(12), 2893-2896. VLDB. DOI: 10.14778/3415478.3415502.

    author copy, video by Sajjad Rahnama.
    Abstract

    With the advent of Bitcoin, the interest of the database community in blockchain systems has steadily grown. Many existing blockchain applications employ blockchains as a platform for monetary transactions, however. We deviate from this philosophy and present ResilientDB, which can serve in a suite of non-monetary data-processing blockchain applications. Our ResilientDB uses state-of-the-art technologies and includes a novel visualization that helps in monitoring the state of the blockchain application.

  4. DEBS 2020 Tutorial.
    Blockchain consensus unraveled: Virtues and limitations

    . Suyash Gupta, Jelle Hellings, Sajjad Rahnama, and Mohammad Sadoghi. (2020). In: Proceedings of the 14th ACM International Conference on Distributed and Event-Based Systems, 218-221. ACM. DOI: 10.1145/3401025.3404099.

    author copy, slides, video.
    Abstract

    Since the introduction of Bitcoin--the first wide-spread application driven by blockchains--the interest of the public and private sector in blockchains has skyrocketed. At the core of this interest are the ways in which blockchains can be used to improve data management, e.g., by enabling federated data management via decentralization, resilience against failure and malicious actors via replication and consensus, and strong data provenance via a secured immutable ledger.

    In practice, high-performance blockchains for data management are usually built in permissioned environments in which the participants are vetted and can be identified. In this setting, blockchains are typically powered by Byzantine fault-tolerant consensus protocols. These consensus protocols are used to provide full replication among all honest blockchain participants by enforcing a unique order of processing incoming requests among the participants.

    In this tutorial, we take an in-depth look at Byzantine fault-tolerant consensus. First, we take a look at the theory behind replicated computing and consensus. Then, we delve into how common consensus protocols operate. Finally, we take a look at current developments and briefly look at our vision moving forward.

  5. Reimagine 2020.
    An in-depth look of BFT consensus in blockchain: Challenges and opportunities

    . Suyash Gupta, Jelle Hellings, Sajjad Rahnama, and Mohammad Sadoghi. (2020).

    slides.
  6. FAB 2020.
    ResilientDB: Global scale resilient blockchain fabric

    . Suyash Gupta, Sajjad Rahnama, Jelle Hellings and Mohammad Sadoghi. (2020). In: The third International Symposium on Foundations and Applications of Blockchain.

    abstract, video by Suyash Gupta.
    Abstract

    Recent developments in blockchain technology have inspired innovative new designs in resilient distributed and database systems. At their core, these blockchain applications typically use Byzantine fault-tolerant consensus protocols to maintain a common state across all replicas, even if some replicas are faulty or malicious. Unfortunately, existing consensus protocols are not designed to deal with geo-scale deployments in which many replicas spread across a geographically large area participate in consensus.

    To address this, we present the Geo-Scale Byzantine Fault-Tolerant consensus protocol (GeoBFT). GeoBFT is designed for excellent scalability by using a topological-aware grouping of replicas in local clusters, by introducing parallelization of consensus at the local level, and by minimizing communication between clusters. To validate our vision of high-performance geo-scale resilient distributed systems, we implement GeoBFT in our efficient ResilientDB permissioned blockchain fabric. We show that GeoBFT is not only sound and provides great scalability, but also outperforms state-of-the-art consensus protocols by a factor of six in geo-scale deployments.

  7. Middleware 2019.
    An in-depth look of BFT consensus in blockchain: Challenges and opportunities

    . Suyash Gupta, Jelle Hellings, Sajjad Rahnama, and Mohammad Sadoghi. (2019). In: Proceedings of the 20th International Middleware Conference--Tutorials, 6-10. ACM. DOI: 10.1145/3366625.3369437.

    author copy, slides.
    Abstract

    Since the introduction of Bitcoin--the first wide-spread application driven by blockchains--the interest of the public and private sector in blockchains has skyrocketed. At the core of this interest are the ways in which blockchains can be used to improve data management, e.g., by enabling federated data management via decentralization, resilience against failure and malicious actors via replication and consensus, and strong data provenance via a secured immutable ledger.

    In practice, high-performance blockchains for data management are usually built in permissioned environments in which the participants are vetted and can be identified. In this setting, blockchains are typically powered by Byzantine fault-tolerant consensus protocols. These consensus protocols are used to provide full replication among all honest blockchain participants by enforcing a unique order of processing incoming requests among the participants.

    In this tutorial, we take an in-depth look at Byzantine fault-tolerant consensus. First, we take a look at the theory behind replicated computing and consensus. Then, we delve into how common consensus protocols operate. Finally, we take a look at current developments and briefly look at our vision moving forward.

  8. HPTS 2019.
    Efficient transaction processing in Byzantine fault tolerant environments

    . Suyash Gupta, Jelle Hellings, Thamir Qadah, Sajjad Rahnama, and Mohammad Sadoghi. (2019). In: 18th International Workshop on High Performance Transaction Systems.

    abstract, video by Suyash Gupta.
  9. DBDBD 2016.
    Graph query optimization using semi-join rewritings

    . Jelle Hellings. (2016). In: The Dutch-Belgian DataBase Day. See also CJ 2020, DBPL 2017, PhD thesis.

    abstract, slides.
  10. WOG 2013.
    Path querying on graph databases

    . Jelle Hellings. (2013). In: WOG (Wetenschappelijke Onderzoeksgemeenschap/Scientific Research Network) Meeting.

    long abstract, short abstract, slides.
  11. DBDBD 2011.
    Efficient external-memory bisimulation on DAGs

    . Jelle Hellings. (2011). In: The Dutch-Belgian DataBase Day. See also SIGMOD 2012, Master thesis.

    abstract, slides, project page.

PhD thesis and Master thesis

  1. PhD thesis.
    On Tarski's relation algebra: Querying trees and chains and the semi-join algebra

    . Jelle Hellings. (2018). Hasselt University and transnational University of Limburg. Adviser: Marc Gyssens. See also CJ 2020, IS 2020, FoIKS 2018a, DBPL 2017, DBDBD 2016, DBPL 2015.

    author copy, thesis, slides, project page.
    Abstract

    Many practical query languages for graph data are based on fragments of Tarski's relation algebra which, optionally, is augmented with the Kleene-star operator. Examples include XPath, SPARQL, the RPQs, and GXPath. Because of this central role of (fragments of) the relation algebra, we study two aspects in more detail. Combined, these two studies give a detailed picture of the expressive power of the fragments of the relation algebra. Moreover, our results provide several opportunities for the development of new techniques for the efficient evaluation of graph queries.

  2. Master thesis.
    Bisimulation partitioning and partition maintenance

    . Jelle Hellings. (2011). Eindhoven University of Technology. Adviser: George H. L. Fletcher. See also SIGMOD 2012, DBDBD 2011.

    author copy, thesis, final slides, mid-term slides, poster, project page.
    Abstract

    The combination of graphs and node bisimulation is widely used within and outside of computer science. One example of this combination is constructing indices for speeding up queries on XML documents. Thereby XML documents can be represented by trees and many index types for indexing XML documents use the notion of bisimulation. Thereby the notion of bisimulation is used to relate nodes that have equivalent behavior with respect to queries performed on the XML documents. By replacing these bisimilar nodes one can reduce the size of the XML document and as such speed up queries. The objective of this thesis is to develop techniques for constructing and maintaining bisimulation partitions. Thereby a bisimulation partition groups nodes based on bisimilarity. In this thesis we primarily focus on very large directed acyclic graphs. The results in this thesis can for instance be used to index very large XML documents.

Source: https://jhellings.nl/
