Publications

2021

Adaptive Indexing for In-situ Visual Exploration and Analytics

Stavros Maroulis, Nikos Bikakis, George Papastefanatos, Panos Vassiliadis, Yannis Vassiliou

Summary

In-situ processing has received a great deal of attention in recent years. In in-situ scenarios, big raw data files that do not fit in main memory must be handled efficiently using commodity hardware, without the overhead of a preprocessing phase or the loading of data into a database. In this work, we present an adaptive indexing scheme that enables efficient visual exploration and analytics over big raw data files. Beyond visual exploration and statistics, the scheme enables categorical-based analytics using group-by and filter operations. The proposed scheme combines a tile-based structure that offers efficient exploratory operations over the 2D space with a tree-based structure that organizes a tile’s objects based on their categorical values, enabling efficient visual analytics and the support of advanced visualization methods. The index resides in main memory and is built progressively as the user explores parts of the raw file, while its structure and level of granularity are adjusted to the user’s exploration areas and type of analysis. We conduct experiments using real and synthetic datasets and demonstrate that the proposed approach is, in most cases, more than 40× faster than existing solutions and performs around 3 orders of magnitude fewer I/O operations.

Publication

Stavros Maroulis, Nikos Bikakis, George Papastefanatos, Panos Vassiliadis, Yannis Vassiliou: Adaptive Indexing for In-situ Visual Exploration and Analytics. 23rd International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data (DOLAP 2021) [Paper]

Presentation

[Slides] [Video]

Code

The source code of the paper is available on GitHub.


RawVis: A System for Efficient In-situ Visual Analytics

Stavros Maroulis, Nikos Bikakis, George Papastefanatos, Panos Vassiliadis, Yannis Vassiliou

Summary

In-situ processing has received a great deal of attention in recent years. In in-situ scenarios, big raw data files that do not fit in main memory must be handled efficiently on the fly using commodity hardware, without the overhead of a preprocessing phase or the loading of data into a database system. This paper presents RawVis, an open-source data visualization system for in-situ visual exploration and analytics over big raw data. RawVis implements novel indexing schemes and adaptive processing techniques, allowing users to perform efficient visual and analytics operations directly over the data files. RawVis provides real-time interaction, with low response times over large data files, using commodity hardware.

Publication

Stavros Maroulis, Nikos Bikakis, George Papastefanatos, Panos Vassiliadis, Yannis Vassiliou: RawVis: A System for Efficient In-situ Visual Analytics. ACM International Conference on Management of Data (ACM SIGMOD/PODS 2021) [Paper]

System

[Online Tool] [Video]

Code

The source code of the paper is available on GitHub.


Relational Schema Optimization for RDF-based Knowledge Graphs

George Papastefanatos, Marios Meimaris, Panos Vassiliadis

Summary

Characteristic sets (CSs) organize RDF triples based on the set of properties associated with their subject nodes. This concept was recently used in indexing techniques, as it can capture the implicit schema of RDF data. While most CS-based approaches yield significant improvements in space and query performance, they fail to perform well when answering complex query workloads in the presence of schema heterogeneity, i.e., when the number of CSs becomes very large, resulting in a highly partitioned data organization. In this paper, we address this problem by introducing a novel technique for merging CSs based on their hierarchical structure. Our method employs a lattice to capture the hierarchical relationships between CSs, identifies dense CSs, and merges them with their ancestors. We have implemented our algorithm on top of a relational backbone, where each merged CS is stored in a relational table; CS merging therefore reduces the number of tables required to host the source triples of a dataset. Moreover, we perform an extensive experimental study to evaluate the performance and impact of merging on the storage and querying of RDF datasets, indicating significant improvements. We also conduct a sensitivity analysis to assess the stability and identify possible weaknesses of our algorithm, and report on our results.
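The notion of a characteristic set described in the summary can be illustrated with a minimal Python sketch. It groups subjects by the set of properties they use and lists the pairs of CSs related by strict subset, which is assumed here to be the relation underlying the lattice. The function names, the toy triples, and the subset interpretation are illustrative assumptions, not the implementation from the paper; density detection and merging are not shown.

```python
from collections import defaultdict

def characteristic_sets(triples):
    """Map each characteristic set (a frozenset of properties) to its subjects."""
    props_by_subject = defaultdict(set)
    for subject, prop, _obj in triples:
        props_by_subject[subject].add(prop)
    subjects_by_cs = defaultdict(set)
    for subject, props in props_by_subject.items():
        subjects_by_cs[frozenset(props)].add(subject)
    return dict(subjects_by_cs)

def subset_pairs(cs_index):
    """Return (smaller, larger) pairs of CSs related by strict subset.

    Assumption: the hierarchy between CSs is taken to be the subset relation.
    """
    return [(a, b) for a in cs_index for b in cs_index if a < b]

# Hypothetical toy data: two subjects share {name, email}, one has only {name}.
triples = [
    ("alice", "name", "Alice"),
    ("alice", "email", "a@example.org"),
    ("bob", "name", "Bob"),
    ("bob", "email", "b@example.org"),
    ("carol", "name", "Carol"),
]
cs_index = characteristic_sets(triples)
hierarchy = subset_pairs(cs_index)
```

In a relational layout such as the one the summary describes, each key of `cs_index` would correspond to one table, so merging related CSs lowers the number of tables.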

Publication

George Papastefanatos, Marios Meimaris, Panos Vassiliadis. Relational Schema Optimization for RDF-based Knowledge Graphs. Information Systems Journal, Elsevier, 2021 [Paper]


2020

Linked Data Visualization: Techniques, Tools and Big Data

Laura Po, Nikos Bikakis, Federico Desimoni and George Papastefanatos

Summary

Linked Data (LD) is a well-established standard for publishing and managing structured information on the Web, gathering and bridging together knowledge from different scientific and commercial domains. The development of Linked Data Visualization techniques and tools has been adopted as the established practice for the analysis of this vast amount of information by data scientists, domain experts, business users, and citizens. This book covers a wide spectrum of visualization topics, providing an overview of the recent advances in this area, focusing on techniques, tools, and use cases of visualization and visual analysis of LD. It presents the core concepts related to data visualization and LD technologies, techniques employed for data visualization based on the characteristics of data, techniques for Big Data visualization, tools and use cases in the LD context, and, finally, a thorough assessment of the usability of these tools under different scenarios. The purpose of this book is to offer a complete guide to the evolution of LD visualization for interested readers from any background and empower them to get started with the visual analysis of such data. This book can serve as a course textbook or as a primer for all those interested in LD and data visualization.

Publication

Laura Po, Nikos Bikakis, Federico Desimoni and George Papastefanatos: Linked Data Visualization: Techniques, Tools and Big Data. Morgan & Claypool Publishers, 2020 [Sample] [Publisher Site] [Homepage]


Hierarchical Property Set Merging for SPARQL Query Optimization [Best Paper]

Marios Meimaris, George Papastefanatos, Panos Vassiliadis

Summary

Characteristic sets (CSs) organize RDF triples based on the set of properties associated with their subject nodes. This concept was recently used in indexing techniques, as it can capture the implicit schema of RDF data. While most CS-based approaches yield significant improvements in space and query performance, they fail to perform well when answering complex query workloads in the presence of schema heterogeneity, i.e., when the number of CSs becomes very large, resulting in a highly partitioned data organization. In this paper, we address this problem by introducing a novel technique for merging CSs based on their hierarchical structure. Our method employs a lattice to capture the hierarchical relationships between CSs, identifies dense CSs, and merges them with their ancestors, thus reducing the size of the CSs as well as the links between them. We implemented our algorithm on top of a relational backbone, where each merged CS is stored in a relational table, and we performed an extensive experimental study to evaluate the performance and impact of merging on the storage and querying of RDF datasets, indicating significant improvements.

Publication

Marios Meimaris, George Papastefanatos, Panos Vassiliadis: Hierarchical Property Set Merging for SPARQL Query Optimization. 22nd International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data (DOLAP 2020) [Best Paper] [Paper]


LinkZoo: A collaborative resource management tool based on Linked Data

Giorgos Alexiou, Marios Meimaris, George Papastefanatos and Ioannis Anagnostopoulos

Summary

This article presents LinkZoo, a web-based, linked data enabled tool that supports collaborative management of information resources. LinkZoo addresses the modern needs of information-intensive collaboration environments to publish, manage, and share heterogeneous resources within user-driven contexts. Users create and manage diverse types of resources, such as files, web documents, people, datasets, and calendar events, in common spaces. They can interlink them, annotate them, and share them with other users, thus enabling collaborative editing, as well as enrich them with links to external linked data resources. Resources are inherently modeled and published as Resource Description Framework (RDF) data and can be explicitly interlinked and dereferenced by external applications. LinkZoo supports the creation of dynamic communities that enable web-based collaboration through resource sharing and annotation, exposing objects on the Linked Data Cloud under controlled vocabularies and permissions. The authors demonstrate the applicability of the tool on a popular collaboration use case scenario for sharing and organizing research resources.

Publication

Giorgos Alexiou, Marios Meimaris, George Papastefanatos and Ioannis Anagnostopoulos. LinkZoo: A collaborative resource management tool based on Linked Data. In International Journal of Web Information Systems, 2020 [Paper]


A Comparative Study of State-of-The-Art Linked Data Visualization Tools

Federico Desimoni, Nikos Bikakis, Laura Po and George Papastefanatos

Summary

Data visualization tools are of great importance for the exploration and analysis of Linked Data (LD) datasets. Such tools allow users to get an overview of a dataset, understand its content, and discover interesting insights. Visualization approaches vary according to the domain, the type of data, the task that the user is trying to perform, and the skills of the user. Thus, studying the capabilities that each approach offers is crucial in helping users select the proper tool or technique based on their needs. In this paper, we present a comparative study of state-of-the-art LD visualization tools over a list of fundamental use cases. First, we define 16 use cases that are representative of LD visual exploration and examine several aspects of the tools, e.g., functionality and feature richness. Then, we evaluate these use cases over 10 LD visualization tools, examining: (1) whether the tools have the required functionality for the tasks; and (2) whether they allow the successful completion of the tasks over the DBpedia dataset. Finally, we discuss the insights derived from the evaluation and point out possible future directions.

Publication

Federico Desimoni, Nikos Bikakis, Laura Po and George Papastefanatos, A Comparative Study of State-of-The-Art Linked Data Visualization Tools, Visualization and Interaction for Ontologies and Linked Data International Workshop (VOILA 2020) [Paper]


Open Science Observatory: Monitoring Open Science in Europe

George Papastefanatos, Elli Papadopoulou, Marios Meimaris, Antonis Lempesis, Stefania Martziou, Paolo Manghi, Natalia Manola

Summary

Monitoring and evaluating Open Science (OS) practices and research output in a principled and continuous way is recognised as one of the necessary steps towards its wider adoption. This paper presents the Open Science Observatory, a prototype online platform that combines data gathered from the OpenAIRE e-Infrastructure and other public data sources, and informs users about different OS indicators in Europe via rich visualizations.

Publication

George Papastefanatos, Elli Papadopoulou, Marios Meimaris, Antonis Lempesis, Stefania Martziou, Paolo Manghi, Natalia Manola: Open Science Observatory: Monitoring Open Science in Europe. ADBIS/TPDL/EDA Workshops 2020 [Paper]


2019

Query driven Entity Resolution in Data Lakes

Giorgos Alexiou and George Papastefanatos

Summary

Entity Resolution (ER) constitutes a core task for data integration, which aims at matching different representations of entities coming from various sources. Due to its quadratic complexity, it typically scales to large datasets through approximate methods, i.e., blocking: similar entities are clustered into blocks and pair-wise comparisons are executed only between co-occurring entities, at the cost of some missed matches. In traditional settings, ER is a part of the data integration process, i.e., a preprocessing step prior to making “clean” data available for analysis. With the increasing demand for real-time analytical applications, recent research has begun to consider new approaches for integrating Entity Resolution with Query Processing. In this work, we explore the problem of query-driven Entity Resolution and propose a method for efficiently applying blocking and meta-blocking techniques during query processing. The aim of our approach is to effectively and efficiently answer SQL-like queries issued on top of dirty data. The experimental evaluation of the proposed solution demonstrates its significant advantages over other techniques for the given problem settings.
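The blocking idea mentioned in the summary can be illustrated with a minimal Python sketch. It uses token blocking, a common blocking scheme chosen here as an assumption: records that share a token fall into the same block, and only records that co-occur in a block are compared. The function names and sample records are hypothetical, and this is not the query-driven method proposed in the paper; meta-blocking is not shown.

```python
from collections import defaultdict
from itertools import combinations

def token_blocking(records):
    """Place each record id into one block per lowercase token it contains."""
    blocks = defaultdict(set)
    for record_id, text in records.items():
        for token in text.lower().split():
            blocks[token].add(record_id)
    return blocks

def candidate_pairs(blocks):
    """Collect the distinct pairs of records that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

# Hypothetical dirty records: 1 and 2 describe the same person.
records = {
    1: "Ada Lovelace London",
    2: "A. Lovelace London",
    3: "Alan Turing Cambridge",
}
pairs = candidate_pairs(token_blocking(records))
all_pairs = set(combinations(sorted(records), 2))
```

In this toy input, blocking keeps a single candidate pair out of the three possible ones. Records that share no token are never compared, which is the source of the missed matches the summary mentions.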

Publication

Giorgos Alexiou and George Papastefanatos: Query driven Entity Resolution in Data Lakes. 13th International Workshop on Information Search, Integration, and Personalization, 2019 [Paper]


Merging RDF Characteristic Sets to Optimize SPARQL Queries

Marios Meimaris and George Papastefanatos

Publication

Marios Meimaris and George Papastefanatos: Merging RDF Characteristic Sets to Optimize SPARQL Queries. 13th International Workshop on Information Search, Integration, and Personalization, 2019 (Extended Abstract) [Presentation]


RawVis: Visual exploration over raw data

Nikos Bikakis, Stavros Maroulis, George Papastefanatos and Panos Vassiliadis

Summary

Data exploration and visual analytics systems are of great importance in Open Science scenarios, where less tech-savvy researchers wish to access and visually explore big raw data files (e.g., JSON, CSV) generated by scientific experiments, using commodity hardware and without being overwhelmed by the tedious processes of data loading, indexing, and query optimization. In this work, we present our approach for enabling efficient in-situ query processing on big raw data files for interactive visual exploration scenarios. We introduce a framework, named RawVis, built on top of a lightweight in-memory tile-based index, VALINOR, which is constructed on the fly given the first user query over a raw file and adapted incrementally based on user interaction. We evaluate the performance of a prototype implementation against three alternatives and show that our method outperforms them in terms of response time, disk accesses, and memory consumption.

Publication

Nikos Bikakis, Stavros Maroulis, George Papastefanatos and Panos Vassiliadis. RawVis: Visual exploration over raw data. 17th Hellenic Data Management Symposium, 2019 (Extended Abstract)


Merging RDF Characteristic Sets to Optimize SPARQL Queries

Marios Meimaris and George Papastefanatos

Publication

Marios Meimaris and George Papastefanatos: Merging RDF Characteristic Sets to Optimize SPARQL Queries. 17th Hellenic Data Management Symposium, 2019 (Extended Abstract)