Galileo: Scalable Storage of Time-Series Multidimensional Data



Time-series data arises in settings such as observations from radars and satellites, checkpointing data that captures the state of a system at regular intervals, and analytics that track the evolution of extracted knowledge over time. Galileo is a demonstrably scalable distributed file system for managing such time-series data.

Key capabilities in the storage framework include:

- The ability to manage trillions of small files with quadrillions of observations.
- Support for over 20 scientific data formats, including netCDF, HDF, XML, CSV, GRIB, BUFR, DMSP, NEXRAD, and SIGMET.
- Approximate queries, fuzzy queries, and probabilistic queries.
- Hypothesis testing, significance evaluations, and kernel density estimations.
- A scale-out architecture that enables the incremental assimilation of nodes into the system.
- Accounting for spatiotemporal data characteristics.
- Support for real-time, analytic queries over petascale datasets.
- Range- and geometry-constrained queries and proximity-based relevance ranking over spatiotemporal datasets. Supported queries can be point or continuous.
- Support for a tunable replication framework.
- Support for journaling.

Project News

- Paper on scalable spatiotemporal analytics to appear in the IEEE Transactions on Big Data.
- Paper on Spatiotemporal Sketches appears in the IEEE Transactions on Knowledge & Data Engineering.
- Paper on Ad Hoc Queries appears in the IEEE Transactions on Cloud Computing.
- Paper on Anomaly Detection appears in Concurrency and Computation: Practice & Experience.
- Paper on Approximate Queries appears in the IEEE Transactions on Knowledge & Data Engineering.