Questions tagged [sdv]

The Synthetic Data Vault (SDV) is a Python library that allows users to statistically model an entire multi-table, relational dataset and then generate synthetic versions of the dataset with similar statistical properties. It is developed by the DAI-Lab at LIDS, MIT, and released under the MIT License.

SDV - Synthetic Data Vault

License: MIT
Documentation: https://sdv-dev.github.io/SDV
Homepage: https://github.com/sdv-dev/SDV

Overview

The Synthetic Data Vault (SDV) is a tool that allows users to statistically model an entire multi-table, relational dataset. Users can then use the statistical model to generate a synthetic dataset. Synthetic data can be used to supplement, augment and, in some cases, replace real data when training machine learning models. It also makes it possible to test machine learning or other data-dependent software systems without the risk of exposure that comes with data disclosure. Under the hood, it uses hierarchical generative modeling and recursive sampling techniques.
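The "model, then sample" workflow described above can be illustrated with a toy sketch. This is not SDV's API or algorithm: the functions fit and sample below are hypothetical helpers written with the Python standard library only, and they model each numeric column independently with a normal distribution, ignoring the correlations and table relationships that SDV is designed to capture.

```python
import random
import statistics


def fit(rows):
    """Learn a (mean, standard deviation) pair for every numeric column."""
    columns = list(rows[0].keys())
    model = {}
    for col in columns:
        values = [row[col] for row in rows]
        model[col] = (statistics.mean(values), statistics.stdev(values))
    return model


def sample(model, num_rows, seed=None):
    """Draw synthetic rows from the per-column normal distributions."""
    rng = random.Random(seed)
    return [
        {col: rng.gauss(mu, sigma) for col, (mu, sigma) in model.items()}
        for _ in range(num_rows)
    ]


# Made-up "real" data, used only for illustration.
real_rows = [
    {"age": 34.0, "income": 52000.0},
    {"age": 45.0, "income": 61000.0},
    {"age": 29.0, "income": 48000.0},
    {"age": 52.0, "income": 75000.0},
]

model = fit(real_rows)
synthetic_rows = sample(model, num_rows=3, seed=0)
for row in synthetic_rows:
    print(row)
```

The synthetic rows share the column names and rough scale of the input but contain none of the original records, which is the basic idea behind using synthetic data for testing without disclosing the real data.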

15 questions

2 answers

How to prepare data in the input format table and metadata for the Synthetic Data Vault (SDV) library

I want to use the synthetic data generation method of the Synthetic Data Vault (SDV) library (reference https://sdv.dev/SDV/index.html), but I can't. I think my problem is how to prepare data in the input format required for the method ".fit()". The…

asked Oct 11 '22 at 10:39

Davide Mariani

0 answers

Problem with SDV library in Python: NameError: name 'load_tabular_demo' is not defined

This very short piece of code does not work for me. It seems to be working for many people since I found it posted in many places. from sdv.demo import get_available_demos demos = get_available_demos() data =…

python python-3.x sdv

asked Mar 04 '23 at 09:51

Vincent Granville

0 answers

Is there a way of speeding up the .fit() method in the Python library SDV?

I am trying to synthesise a relational database made of the following tables: tables_info As you can see there are several tables with multiple columns and rows. When I fit the model on the whole dataset, it takes days to finish. I was wondering if…

python sdv

asked Oct 07 '22 at 14:28

ASE_tiger

0 answers

Generating data via SDV GaussianCopula throws "numpy.linalg.LinAlgError: SVD did not converge" in Python

I am currently using SDV and GaussianCopula (https://sdv.dev/SDV/user_guides/single_table/gaussian_copula.html) to train my models. I have a given data set which is loaded for training. However, I get the following error message when creating the…

python numpy gaussian sdv

asked Sep 30 '21 at 05:16

41 72 6c

1 answer

Trying to do SDV (Synthetic Data Vault) demo and getting error: TypeError: cannot astype a datetimelike from [datetime64[ns]] to [int32]

I'll start by saying I am NOT a Python developer. But I have a need for synthetic data and was trying to use the Synthetic Data Vault (https://github.com/sdv-dev/SDV). I have Python 3.7 installed (on Windows, I'm doing this right on my laptop for…

python pandas sdv

asked Feb 04 '20 at 17:19

Tom C

0 answers

SDV CTGANSynthesizer generates AttributeError: 'NoneType' object has no attribute 'split' when trying to fit data

I am receiving an attribute error when trying to fit mock data using the CTGANSynthesizer from the SDV module. I am using SDV version 1.2.0 and Anaconda. from sdv.datasets.demo import download_demo from sdv.single_table import…

attributeerror sdv

asked Jul 10 '23 at 02:17

Data 803

0 answers

How to force values in a column to be unique in SDV multi table HMASynthesizer?

I got this…

python sdv

asked Jun 22 '23 at 08:55

John Doe

1 answer

How to add custom logic for relationship in SDV HMASynthesizer model?

I am trying to generate synthetic data with the SDV HMASynthesizer, but it fails because I need to add custom logic for the relationships mentor_id - user_id and mentee_id - user_id. What I need: if a "user_id" in table "users" has the role "mentor", it should be in…

python sdv

asked Jun 21 '23 at 19:00

John Doe

1 answer

How to set sizes of synthetic_data dataframes for sdv multi_table (HMASynthesizer)?

I have a simple (or not) question. How can I set num_rows for the synthetic_data generated by HMASynthesizer? Tables: region_id address 0 r_0 Cohenville 1 r_1 Lake Martha 2 r_2 West Josephfurt 3 r_3 East…

python sdv

asked Jun 12 '23 at 15:14

John Doe

0 answers

With SDV, I want to generate vertically oriented data (1 record is spread over multiple rows), is this possible?

I want to generate synthetic data with SDV where each row contains only one variable name (and variable value). Something like: PersonId, ValueName, Value, Index. I get this data from a supplier. The dataset per PersonId can have more than 300 fields…

row synthetic sdv

asked Jun 12 '23 at 14:06

user3820102

2 answers

How to add a Faker data type to an SDV model (update metadata)

I'm trying to add a Faker data type to an SDV model. Imports: from sdv.metadata import SingleTableMetadata from sdv.single_table import GaussianCopulaSynthesizer import faker Code: fake = faker.Faker() metadata =…

python faker sdv

asked Jun 09 '23 at 17:32

John Doe

0 answers

sdv.tabular module not loading

Trying to use sdv to create synthetic data. But the sdv.tabular module is not loaded into the notebook. Ran from sdv.tabular import GaussianCopula trying to import one of the models from the library but I am getting a 'No module named sdv.tabular'…

python sdv

asked May 09 '23 at 00:53

ohpzson

0 answers

graphviz installed on conda but doesn't work

I installed graphviz on conda (Windows) because it is requested by the Metadata package of sdv to show a schema through the method visualize(), but it doesn't show anything (no errors). Also installed conda directly on Windows but with no…

python anaconda conda graphviz sdv

asked Nov 16 '22 at 10:37

Luigi Montaleone

1 answer

Can't install SDV in Python

I am trying to install the SDV package using pip install sdv, but without success. Final part of the error log: error: legacy-install-failure × Encountered error while trying to install package. ╰─> scipy note: This is an issue with the package…

python python-3.x sdv

asked Sep 01 '22 at 21:46

DJRahim

0 answers

Synthetic Data Vault doesn't generate data (takes an extremely long time)

I am new to ML and not really familiar with Python. I want to extend a CSV file using the script below. But it takes an extremely long time (even after half an hour) to generate 5000 rows of sample data. I can't find my mistake. …

python csv jupyter-notebook sample-data sdv

asked Dec 21 '21 at 13:04

anonym