Questions tagged [sdv]

The Synthetic Data Vault (SDV) is a Python library that allows users to statistically model an entire multi-table, relational dataset and then generate synthetic versions of the dataset with similar statistical properties. It is developed by the DAI-Lab at LIDS, MIT, under the MIT License.

SDV - Synthetic Data Vault

Overview

The Synthetic Data Vault (SDV) is a tool that allows users to statistically model an entire multi-table, relational dataset. Users can then use the statistical model to generate a synthetic dataset. Synthetic data can be used to supplement, augment, and in some cases replace real data when training machine learning models. It also makes it possible to test machine learning or other data-dependent software systems without the exposure risk that comes with disclosing real data. Under the hood, it uses hierarchical generative modeling and recursive sampling techniques.
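As a rough illustration of the "model, then sample" idea, here is a toy Gaussian copula in NumPy. Each column's marginal distribution is kept as its empirical quantiles, and the dependence between columns is captured as a correlation matrix of normal scores. This is not SDV's implementation (SDV's own GaussianCopula model is far more complete); `fit_gaussian_copula` and `sample_gaussian_copula` are hypothetical names, and the sketch assumes purely numeric data with no missing values.

```python
import numpy as np
from statistics import NormalDist

# Standard-normal CDF and inverse CDF, vectorized over NumPy arrays.
_STD_NORMAL = NormalDist()
_norm_cdf = np.vectorize(_STD_NORMAL.cdf)
_norm_ppf = np.vectorize(_STD_NORMAL.inv_cdf)


def fit_gaussian_copula(X):
    """Toy model (hypothetical helper, not SDV's API): keep each column's empirical
    distribution plus a correlation matrix estimated on normal scores."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Rank-transform each column into (0, 1), then map ranks to standard-normal scores.
    ranks = X.argsort(axis=0).argsort(axis=0) + 1
    z = _norm_ppf(ranks / (n + 1))
    corr = np.atleast_2d(np.corrcoef(z, rowvar=False))
    return {"data": X, "corr": corr}


def sample_gaussian_copula(model, num_rows, seed=0):
    """Draw correlated normals, push them through the normal CDF, then map each
    column back through the empirical quantiles of the training data."""
    rng = np.random.default_rng(seed)
    d = model["corr"].shape[0]
    z = rng.multivariate_normal(np.zeros(d), model["corr"], size=num_rows)
    u = _norm_cdf(z)
    out = np.empty_like(u)
    for j in range(d):
        out[:, j] = np.quantile(model["data"][:, j], u[:, j])
    return out


# Usage: two strongly correlated columns in, synthetic rows with a similar correlation out.
rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = 2 * a + rng.normal(scale=0.5, size=500)
model = fit_gaussian_copula(np.column_stack([a, b]))
synthetic = sample_gaussian_copula(model, num_rows=1000)
print(np.corrcoef(synthetic, rowvar=False)[0, 1])
```

Because the marginals are empirical quantiles, sampled values never leave the range seen during fitting; a real synthesizer would usually fit parametric marginals instead.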

15 questions
3 votes, 2 answers

How to prepare data in the input format table and metadata for the Synthetic Data Vault (SDV) library

I want to use the synthetic data generation method of the Synthetic Data Vault (SDV) library (reference https://sdv.dev/SDV/index.html), but I can't. I think my problem is how to prepare data in the input format required for the method ".fit()". The…
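Assuming SDV 1.x (the sdv.dev/SDV pages linked in the question appear to be the older pre-1.0 documentation), the input is a pandas DataFrame plus a metadata description of its columns. A minimal sketch of that pairing, with a made-up table and the metadata written as a plain dictionary; all column names and values are hypothetical, and the dictionary layout is an assumption based on the 1.x single-table format.

```python
import pandas as pd

# Hypothetical example table: one row per guest, one column per field.
data = pd.DataFrame({
    "guest_id": [1, 2, 3, 4],
    "room_type": ["BASIC", "DELUXE", "BASIC", "SUITE"],
    "amenities_fee": [0.0, 12.5, 5.0, 30.0],
    "checkin_date": ["2020-01-03", "2020-02-14", "2020-03-01", "2020-03-20"],
})

# Column descriptions in the (assumed) SDV 1.x single-table dictionary layout:
# every column gets an sdtype, and the primary key is named explicitly.
metadata_dict = {
    "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
    "primary_key": "guest_id",
    "columns": {
        "guest_id": {"sdtype": "id"},
        "room_type": {"sdtype": "categorical"},
        "amenities_fee": {"sdtype": "numerical"},
        # Assumed: a format string tells SDV how to parse dates stored as text.
        "checkin_date": {"sdtype": "datetime", "datetime_format": "%Y-%m-%d"},
    },
}

# Sanity check before fitting: columns in the data and in the metadata should match exactly.
mismatch = set(data.columns) ^ set(metadata_dict["columns"])
print("mismatched columns:", sorted(mismatch) if mismatch else "none")
```

With SDV 1.x installed, `SingleTableMetadata.load_from_dict(metadata_dict)` should turn the dictionary into a metadata object for `GaussianCopulaSynthesizer(metadata)`, followed by `.fit(data)` and `.sample(num_rows=...)`. `SingleTableMetadata().detect_from_dataframe(data)` can produce a first draft to edit instead; newer releases may prefer the unified `Metadata` class, so check the docs for the installed version.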
1 vote, 0 answers

Problem with SDV library in Python: NameError: name 'load_tabular_demo' is not defined

This very short piece of code does not work for me. It seems to work for many people, since I found it posted in many places. from sdv.demo import get_available_demos demos = get_available_demos() data =…
1 vote, 0 answers

Is there a way of speeding up the .fit() method in Python's SDV library?

I am trying to synthesise a relational database made of the following tables (image: tables_info). As you can see, there are several tables with multiple columns and rows. When I fit the model on the whole dataset, it takes days to finish. I was wondering if…
ASE_tiger
1 vote, 0 answers

Generating data via SDV GaussianCopula throws "numpy.linalg.LinAlgError: SVD did not converge" in Python

I am currently using SDV and GaussianCopula (https://sdv.dev/SDV/user_guides/single_table/gaussian_copula.html) to train my models. I have a given data set which is loaded for training. However, I get the following error message when creating the…
41 72 6c
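One frequent trigger for NumPy's "SVD did not converge" is non-finite input, and a constant column is a common way to get there, because its correlation with other columns comes out as NaN. Whether that is the cause in this question is an assumption, since the data isn't shown. A small pandas check, run before `.fit()`, can rule those cases out; `find_suspect_columns` is a hypothetical helper, not part of SDV.

```python
import numpy as np
import pandas as pd


def find_suspect_columns(df: pd.DataFrame) -> dict:
    """Hypothetical pre-fit check: return {column: reason} for columns that commonly
    produce NaN/inf in covariance or correlation estimates."""
    suspects = {}
    for col in df.columns:
        series = df[col]
        if pd.api.types.is_numeric_dtype(series):
            # Convert to float so +/-inf can be detected; missing values become NaN.
            values = series.to_numpy(dtype=float, na_value=np.nan)
            if np.isinf(values).any():
                suspects[col] = "contains +/-inf"
                continue
        # A single distinct value (or no values at all) has zero variance.
        if series.nunique(dropna=True) <= 1:
            suspects[col] = "constant or entirely missing"
    return suspects


# Usage: run on your own DataFrame before calling .fit().
example = pd.DataFrame({"a": [1.0, np.inf, 2.0], "b": [5, 5, 5], "c": [1, 2, 3]})
print(find_suspect_columns(example))
# {'a': 'contains +/-inf', 'b': 'constant or entirely missing'}
```

Flagged columns can then be cleaned, dropped, or excluded from the model before retrying the fit.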
1 vote, 1 answer

Trying to do SDV (Synthetic Data Vault) demo and getting error: TypeError: cannot astype a datetimelike from [datetime64[ns]] to [int32]

I'll start by saying I am NOT a Python developer. But I have a need for synthetic data and was trying to use the Synthetic Data Vault (https://github.com/sdv-dev/SDV). I have Python 3.7 installed (on Windows, I'm doing this right on my laptop for…
Tom C
0 votes, 0 answers

SDV CTGANSynthesizer generates AttributeError: 'NoneType' object has no attribute 'split' when trying to fit data

I am receiving an attribute error when trying to fit mock data using the CTGANSynthesizer from the SDV module. I am using SDV version 1.2.0 and Anaconda. from sdv.datasets.demo import download_demo from sdv.single_table import…
Data 803
0 votes, 1 answer

How to add custom logic for relationship in SDV HMASynthesizer model?

I'm trying to make synthetic data with SDV HMASynthesizer, but it fails because I need to add custom logic for the relationships mentor_id - user_id and mentee_id - user_id. If "user_id" in table "users" has role "mentor", it should be in…
John Doe
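SDV offers constraints for some rules, but this sketch does not assume any of them can express a cross-table role rule like this one. A blunt alternative is to post-process the sampled tables: wherever a synthetic mentor_id points at a user whose role is not "mentor", re-point it at a randomly chosen mentor, and likewise for mentee_id. The `users` table with `user_id`/`role` columns, the link table with `mentor_id`/`mentee_id` columns, and the `enforce_role` helper are all hypothetical.

```python
import pandas as pd


def enforce_role(links, users, fk_col, role, seed=0):
    """Hypothetical post-processing step: wherever links[fk_col] points at a user
    whose role differs from `role`, replace it with a random user that has that role."""
    allowed = users.loc[users["role"] == role, "user_id"]
    out = links.copy()
    bad = ~out[fk_col].isin(allowed)
    if bad.any():
        if allowed.empty:
            raise ValueError(f"no users with role {role!r} to draw from")
        replacements = allowed.sample(n=int(bad.sum()), replace=True, random_state=seed)
        out.loc[bad, fk_col] = replacements.to_numpy()
    return out


# Usage with made-up tables standing in for HMASynthesizer output.
users = pd.DataFrame({"user_id": [1, 2, 3, 4], "role": ["mentor", "mentee", "mentor", "mentee"]})
mentorships = pd.DataFrame({"mentor_id": [1, 2, 4], "mentee_id": [2, 4, 1]})
fixed = enforce_role(mentorships, users, "mentor_id", "mentor")
fixed = enforce_role(fixed, users, "mentee_id", "mentee")
print(fixed)
```

The trade-off is that re-pointed keys no longer follow the distribution the synthesizer learned, so this works best when only a small share of rows needs fixing.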
0 votes, 1 answer

How to set sizes of synthetic_data dataframes for sdv multi_table (HMASynthesizer)?

I have a simple (or maybe not) question. How can I set num_rows for the synthetic_data generated by HMASynthesizer? Tables: region_id address 0 r_0 Cohenville 1 r_1 Lake Martha 2 r_2 West Josephfurt 3 r_3 East…
John Doe
0 votes, 0 answers

With SDV, I want to generate vertical oriented data (1 record is spread over multiple rows), is this possible?

I want to generate synthetic data with SDV where each row contains only one variable name (and variable value). Something like: PersonId, ValueName, Value, Index. I get this data from a supplier. The dataset per PersonId can have more than 300 fields…
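One possible approach, assuming each (PersonId, ValueName) pair appears at most once and the Index column can be ignored: pivot the long data to one row per PersonId, fit an ordinary single-table synthesizer on that wide table, then melt the synthetic output back to long form. Only the pandas reshaping is sketched here; `to_wide` and `to_long` are hypothetical helpers, and the SDV fitting step in between is left out.

```python
import pandas as pd


def to_wide(long_df):
    """One row per PersonId, one column per ValueName (hypothetical helper).
    pivot raises ValueError if a PersonId/ValueName pair occurs twice."""
    wide = long_df.pivot(index="PersonId", columns="ValueName", values="Value").reset_index()
    wide.columns.name = None
    return wide


def to_long(wide_df):
    """Back to one (PersonId, ValueName, Value) row per field; NaN fields are dropped."""
    long_df = wide_df.melt(id_vars="PersonId", var_name="ValueName", value_name="Value")
    long_df = long_df.dropna(subset=["Value"])
    return long_df.sort_values(["PersonId", "ValueName"]).reset_index(drop=True)


# Usage with a tiny made-up supplier extract; person 1 has no "score" field.
long_df = pd.DataFrame({
    "PersonId": [1, 1, 2, 2, 2],
    "ValueName": ["age", "city", "age", "city", "score"],
    "Value": [34, "Delft", 51, "Leeds", 7],
})
wide = to_wide(long_df)  # a single-table synthesizer would be fitted on a table shaped like this
print(wide)
print(to_long(wide))     # the same reshaping would be applied to the synthetic output
```

Because `to_long` drops NaN values, a field a person never had and a field whose value was genuinely missing both disappear after the round trip.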
0 votes, 2 answers

How to add Faker data type to SDV model (update metadata)

I'm trying to add Faker data type to SDV model. Imports: from sdv.metadata import SingleTableMetadata from sdv.single_table import GaussianCopulaSynthesizer import faker Code: fake = faker.Faker() metadata =…
John Doe
0 votes, 0 answers

sdv.tabular module not loading

I'm trying to use sdv to create synthetic data, but the sdv.tabular module does not load in the notebook. I ran from sdv.tabular import GaussianCopula to import one of the models from the library, but I am getting a 'No module named sdv.tabular'…
0 votes, 0 answers

graphviz installed on conda but doesn't work

I installed graphviz on conda (Windows) because the Metadata package of sdv requires it to display a schema with the visualize() method, but nothing is shown (and there are no errors). I also installed conda directly on Windows, but with no…
Luigi Montaleone
0 votes, 1 answer

Can't install SDV in Python

I am trying to install the SDV package using pip install sdv, but without success. Final part of the error log: error: legacy-install-failure × Encountered error while trying to install package. ╰─> scipy note: This is an issue with the package…
DJRahim
0 votes, 0 answers

Synthetic Data Vault doesn't generate data (takes an extremely long time)

I am new to ML and not really familiar with Python. I want to extend a CSV file using the script below, but it takes an extremely long time to generate 5000 rows of sample data (it still hasn't finished after half an hour). I can't find my mistake. …
anonym