The Synthetic Data Vault (SDV) is a Python library that allows users to statistically model an entire multi-table, relational dataset and then generate synthetic versions of the dataset with similar statistical properties. It is developed by the DAI-Lab at LIDS, MIT, under the MIT License.
SDV - Synthetic Data Vault
- License: MIT
- Documentation: https://sdv-dev.github.io/SDV
- Homepage: https://github.com/sdv-dev/SDV
Overview
The Synthetic Data Vault (SDV) is a tool that allows users to statistically model an entire multi-table, relational dataset. Users can then use the statistical model to generate a synthetic dataset. Synthetic data can be used to supplement, augment and in some cases replace real data when training machine learning models. Additionally, it enables the testing of machine learning or other data dependent software systems without the risk of exposure that comes with data disclosure. Underneath the hood it uses a unique hierarchical generative modeling and recursive sampling techniques.