Dremio is a data virtualization and query acceleration layer that interfaces standard BI tools with collections of relational, NoSQL and cloud data sources, as well as various large-scale file systems. The product leverages technology from Apache Arrow[1] in the creation of so-called Data Reflections, which accelerate queries without replicating the underlying data.
The founders of the company, CEO Tomer Shiran and CTO Jacques Nadeau, both hail from MapR[2]. In addition to holding leadership roles on the Apache Arrow project, both were heavily involved in the Apache Drill[3] project.
Arrow offers a unified format for representing columnar data in memory. Applications that support Arrow can share such data directly, without first converting it from one app's columnar format to a row-store format and then re-encoding it in the other app's columnar representation.
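To make the difference concrete, here is a minimal sketch in plain Python. It does not use Arrow itself; the column names, the `share_columnar` helper and the `via_row_store` helper are all hypothetical, and the standard library's `array` and `memoryview` types merely stand in for a shared columnar buffer. The first helper hands the same memory to a second consumer, while the second performs the columns-to-rows-to-columns round trip that a common format is meant to avoid.

```python
from array import array

# Hypothetical columnar batch: one contiguous typed buffer per column.
columns = {
    "order_id": array("q", [1, 2, 3, 4]),
    "amount": array("d", [9.5, 20.0, 3.25, 7.0]),
}


def share_columnar(batch):
    """Expose the same buffers to another consumer without copying them."""
    return {name: memoryview(col) for name, col in batch.items()}


def via_row_store(batch):
    """Pivot the columns into rows, then re-encode the rows as new columns."""
    names = list(batch)
    rows = list(zip(*(batch[n] for n in names)))  # columns -> rows
    return {
        n: array(batch[n].typecode, (row[i] for row in rows))  # rows -> columns
        for i, n in enumerate(names)
    }


shared = share_columnar(columns)
copied = via_row_store(columns)

# A write through the original buffer is visible in the shared view,
# but not in the copy that went through the row-store round trip.
columns["amount"][0] = 100.0
shared_first = shared["amount"][0]   # 100.0
copied_first = copied["amount"][0]   # 9.5
total = sum(shared["amount"])        # 130.25
```

The shared view costs nothing to create regardless of column length, whereas the round trip touches every value twice. That is the overhead a common in-memory format removes.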
Dremio[6] uses technology from Apache Arrow to build what it calls Data Reflections, which accelerate queries without replicating the underlying data. Essentially, Reflections work a bit like indexes do in a relational database: they provide a columnar summary of the data for aggregation analysis.
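The summary idea can be illustrated, loosely, with an ordinary database. The sketch below uses Python's built-in `sqlite3` module and hypothetical `sales` and `sales_by_region` tables. It is only an analogy for answering an aggregation from a precomputed summary, and it says nothing about how Dremio actually builds, stores or selects Reflections.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical source table.
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('east', 10.0), ('east', 5.0), ('west', 7.5);

    -- Precomputed summary, built once and reused by later queries.
    CREATE TABLE sales_by_region AS
        SELECT region, SUM(amount) AS total, COUNT(*) AS n
        FROM sales
        GROUP BY region;
    """
)

# The same question answered from the raw rows and from the summary.
from_source = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
from_summary = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()

conn.close()
```

Both queries return the same totals, but the second reads one row per region instead of scanning every sale. In this analogy the application has to pick the summary table itself; an acceleration layer aims to make that substitution on the user's behalf.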
Reflect on this
In this new version of the product, Reflections are improved in several ways. First off, they can now recognize and optimize for data stored in star or snowflake schemas in source data systems. In such schemas, metrics are stored in a single "fact table" and drill-down categories are stored in their own, related "dimension tables". This improvement allows Dremio to accelerate queries against a collection of such fact and dimension tables, related through joins, rather than only optimizing against individual tables. This