

Time-series data arise in settings such as observations generated by radars and satellites, checkpointing data that capture the state of a system at regular intervals, and analytics that track the evolution of extracted knowledge over time. Galileo is a demonstrably scalable distributed file system for managing such time-series data. Key capabilities of the storage framework include:
  • The ability to manage trillions of small files with a quadrillion observations.
  • Support for over 20 scientific data formats, including netCDF, HDF, XML, CSV, GRIB, BUFR, DMSP, NEXRAD, and SIGMET.
  • Approximate queries, fuzzy queries, and probabilistic queries.
  • Hypothesis testing, significance evaluations, and kernel density estimations.
  • A scale-out architecture that enables the incremental assimilation of nodes in the system.
  • Accounting for spatiotemporal data characteristics.
  • Support for real-time, analytic queries over Petascale datasets.
  • Range and geometry constrained queries, and proximity-based relevance ranking, over spatiotemporal datasets.
  • Supported queries can be point or continuous.
  • Support for a tunable replication framework.
  • Support for journaling.

     
Project News


Paper on Scalable Spatiotemporal Analytics to appear in the IEEE Transactions on Big Data.

Paper on Spatiotemporal Sketches appears in the IEEE Transactions on Knowledge & Data Engineering.

Paper on Ad Hoc Queries appears in the IEEE Transactions on Cloud Computing.

Paper on Anomaly Detection appears in Concurrency and Computation: Practice & Experience.

Paper on Approximate Queries appears in the IEEE Transactions on Knowledge & Data Engineering.


         

 


© The Galileo Project
Department of Computer Science
Colorado State University