

Time-series data arises in settings such as observations generated by radars and satellites, checkpointing data that captures the state of a system at regular intervals, and analytics that track how extracted knowledge evolves over time. Galileo is a demonstrably scalable storage framework for managing such time-series data. Key capabilities of the storage framework include:
  • The ability to manage billions of small files with trillions of observations.
  • Support for multiple scientific data formats such as netCDF, HDF, and the Defense Meteorological Satellite Program format.
  • Approximate queries, fuzzy queries, and probabilistic queries.
  • Hypothesis testing, significance evaluation, and kernel density estimation.
  • A scale-out architecture that allows nodes to be added to the system incrementally.
  • Accounting for the spatiotemporal characteristics of the data.
  • Support for real-time analytic queries over petascale datasets.
  • Range queries, geometry-constrained queries, and proximity-based relevance ranking over spatiotemporal datasets.
  • Supported queries can be either point or continuous.
  • A tunable replication framework.
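
To make the idea of a range-constrained query with proximity-based relevance ranking concrete, here is a minimal in-memory sketch. It is not Galileo's API or implementation: the names `Observation`, `haversine_km`, and `range_query_ranked` are hypothetical, and a linear scan over a Python list stands in for the distributed storage and indexing that a system at Galileo's scale would require. The query keeps observations inside a latitude/longitude bounding box and a time window, then orders them by great-circle distance to a focal point.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One hypothetical spatiotemporal observation."""
    lat: float    # latitude, degrees
    lon: float    # longitude, degrees
    t: float      # timestamp, e.g. seconds since the epoch
    value: float  # measured quantity


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def range_query_ranked(observations, bbox, t_range, focus):
    """Filter by a lat/lon bounding box and a time window, then rank the
    matches by distance to a focal point (nearest first).

    bbox = (min_lat, min_lon, max_lat, max_lon); assumes the box does not
    cross the antimeridian. t_range = (t_start, t_end), inclusive.
    """
    min_lat, min_lon, max_lat, max_lon = bbox
    t_start, t_end = t_range
    hits = [
        o for o in observations
        if min_lat <= o.lat <= max_lat
        and min_lon <= o.lon <= max_lon
        and t_start <= o.t <= t_end
    ]
    # Proximity-based relevance: smaller great-circle distance ranks higher.
    return sorted(hits, key=lambda o: haversine_km(focus[0], focus[1], o.lat, o.lon))


# Usage: three observations, one of which falls outside the time window.
data = [
    Observation(40.57, -105.08, 100.0, 1.0),  # near Fort Collins, CO
    Observation(39.74, -104.99, 150.0, 2.0),  # near Denver, CO
    Observation(40.02, -105.27, 500.0, 3.0),  # outside the time window
]
result = range_query_ranked(data, (39.0, -106.0, 41.0, -104.0), (0.0, 200.0), (39.75, -105.0))
print([o.value for o in result])  # [2.0, 1.0]
```

Filtering before ranking keeps the distance computation confined to the candidates that satisfy the range constraints; a real deployment would push that filtering down into a spatiotemporal index rather than scanning every record.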

     
Project News


Paper on Superspreader Network Analysis over Epidemiology Data at IEEE BigData; acceptance rate 18.68% [10/12/2016]


The group's Lattice cluster is adding another petabyte of data for testing and assessing algorithms [10/1/2016]


Recent testing and deployments on a new 48-node Raspberry Pi cluster, including experiments with 8 sensor types and 6 different flash memory drives [9/4/2016]


Paper on Ad Hoc Queries to appear in IEEE Transactions on Cloud Computing.


Paper on Approximate Queries appears in IEEE Transactions on Knowledge & Data Engineering.



© The Galileo Project
Department of Computer Science
Colorado State University