I'm trying to prepare a dataset for use as training data for a deep neural network. It consists of 13 .txt files, each between 500 MB and 2 GB in size. However, when I run data_prepare.py on it, I get the ValueError in this post's title.
Following answers to previous posts, I loaded my data into R and checked for both NaN and infinite values, but the commands I used suggest there is nothing wrong with my data. I have done the following:
- I load my data as a single data frame using the magrittr, data.table and purrr packages (about 300 million rows, each with 7 variables; a Python equivalent is sketched just after this list):
# read every .txt file in the working directory and row-bind them
# into one data frame (fread() is from data.table, map_df() from purrr)
txt_fread <-
  list.files(pattern = "*.txt") %>%
  map_df(~fread(.))
- I have used sapply to check for finite and NaN values:
> any(sapply(txt_fread, is.finite))
[1] TRUE
> any(sapply(txt_fread, is.nan))
[1] FALSE
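For reference, here is a rough Python equivalent of that single-frame load (this is a sketch; it assumes the working directory contains only the 13 whitespace-separated .txt files, as in the R version):

import glob
import pandas as pd

# mirror the R pipeline above: read every .txt file in the working
# directory and row-bind them into one data frame
files = sorted(glob.glob("*.txt"))
df_all = pd.concat(
    (pd.read_csv(f, sep=" ", header=None) for f in files),
    ignore_index=True,
)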
I have also tried loading each file into a pandas data frame in a Jupyter notebook and checking for those values individually, using the following commands:
import numpy as np
import pandas as pd

file1 = pd.read_csv("File_name_xyz_intensity_rgb.txt", sep=" ", header=None)

print(np.any(np.isnan(file1)))     # prints False
print(np.all(np.isfinite(file1)))  # prints True
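To make that check more granular, this is the kind of per-column summary I have in mind (a sketch reusing the same load; the column names are just the default 0-6 that pandas assigns with header=None):

import numpy as np
import pandas as pd

# same load as above
file1 = pd.read_csv("File_name_xyz_intensity_rgb.txt", sep=" ", header=None)

# per-column NaN/inf counts and value ranges, to pinpoint
# exactly which column (if any) holds a problematic value
summary = pd.DataFrame({
    "nan_count": file1.isna().sum(),
    "inf_count": np.isinf(file1).sum(),
    "min": file1.min(),
    "max": file1.max(),
})
print(summary)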
And when I run print(file1.info()), this is the output I get:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22525176 entries, 0 to 22525175
Data columns (total 7 columns):
 #   Column  Dtype
---  ------  -----
 0   0       float64
 1   1       float64
 2   2       float64
 3   3       int64
 4   4       int64
 5   5       int64
 6   6       int64
dtypes: float64(3), int64(4)
memory usage: 1.2 GB
None
I know the script itself (data_prepare.py) works, because it runs properly on a similar dataset, so the problem must lie in the new data described here; I just don't know what I have missed or done wrong while checking for NaNs and infinities. I have also read and checked the .txt files individually (roughly as sketched below), but that hasn't helped much either.
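For what it's worth, the per-file check looked roughly like this sketch (the chunk size of 1,000,000 rows is an arbitrary choice to keep memory manageable given the file sizes):

import glob
import numpy as np
import pandas as pd

# scan each file in chunks and count non-finite entries (NaN or +/-inf)
for path in sorted(glob.glob("*.txt")):
    n_bad = 0
    for chunk in pd.read_csv(path, sep=" ", header=None, chunksize=1_000_000):
        values = chunk.to_numpy(dtype=np.float64)
        n_bad += np.count_nonzero(~np.isfinite(values))
    print(path, "non-finite values:", n_bad)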
Any help is really appreciated!!
Btw: the R code with map_df comes from leerssej's answer to How to import multiple .csv files at once?