Trident: Distributed Analytics Platform




Trident is a distributed analytics platform that addresses I/O and retrieval concerns for building machine learning models and performing analytics. Trident supports three key aspects of handling data in the context of analytic modeling:

- Distribution and storage
- Feature space management
- Support for ad hoc retrieval and exploration of model training data

Incoming feature vectors are partitioned to facilitate targeted analysis over specific subsets of the feature space. Transformations supported by Trident include normalization, binning, and dimensionality reduction based on correlation analysis. Exploration and retrieval of model training data are enabled by expressive queries that can prune the feature space, sample across feature vectors, or combine portions of the data.

Exposing this functionality at the storage level (rather than in a computation engine) allows many steps in the feature engineering process to be performed before analysis begins. By leveraging this functionality, researchers and practitioners can explore and inspect their datasets interactively to help guide the creation of machine learning models or visualizations, without needing to write ad hoc applications or wait for heavyweight distributed computations to execute.

Project News

Initial (beta) release posted!
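
For readers unfamiliar with the transformations named in the overview (normalization, binning, and correlation-based dimensionality reduction), the following is a minimal, generic sketch of what each step typically does. It is not Trident's API or implementation: every function name, parameter, and the 0.95 threshold are hypothetical, and dropping one feature of each highly correlated pair is only one possible reading of "dimensionality reduction based on correlation analysis".

```python
from statistics import mean, pstdev


def normalize(values):
    """Z-score normalization: rescale to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def bin_index(value, low, high, n_bins):
    """Equal-width binning: map a value to a bin index in [0, n_bins - 1]."""
    if high <= low or n_bins < 1:
        raise ValueError("need high > low and n_bins >= 1")
    width = (high - low) / n_bins
    idx = int((value - low) / width)
    # Clamp out-of-range values (and value == high) into the edge bins.
    return max(0, min(n_bins - 1, idx))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def drop_correlated(features, threshold=0.95):
    """Keep a feature only if it is not strongly correlated with one already kept.

    `features` maps a (hypothetical) feature name to its list of values.
    """
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


if __name__ == "__main__":
    # Illustrative data only: temp_f is a linear function of temp_c.
    features = {
        "temp_c": [10, 20, 30, 40],
        "temp_f": [50, 68, 86, 104],
        "humidity": [30, 80, 20, 60],
    }
    print(drop_correlated(features))
    print(normalize(features["temp_c"]))
    print([bin_index(v, 0, 100, 5) for v in features["humidity"]])
```

In this sketch, `temp_f` is pruned because it duplicates the information in `temp_c`, which is the kind of feature-space reduction the overview describes performing before analysis begins.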