There are multiple issues with your code.
1. Using str in place of the actual DataFrame variable
You are trying to use .iloc on a string such as dataframe1. This won't work since str has no attribute .iloc, as the error tells you.
Since you want to work with DataFrame variable names, you may need to use eval() to interpret the string as a variable name. NOTE: BE EXTRA CAREFUL while using eval(). Please read about the dangers of using eval() carefully.
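If you control how the DataFrames are created, one way to sidestep eval() entirely is to keep them in a dictionary keyed by name. This is only a sketch of an alternative, not what the sample code further down does; the names frames and dataset0..dataset3 are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical setup: four small DataFrames stored in a dict instead of
# separate variables named dataset0, dataset1, ...
frames = {f"dataset{i}": pd.DataFrame(np.random.random((100, 3))) for i in range(4)}

name = "dataset1"            # a plain string, as in the question
df = frames[name]            # dictionary lookup replaces eval(name)
rows = df.iloc[[0, 5, 10]]   # .iloc works because df is a DataFrame, not a str
print(rows.shape)
```

The string is only ever used as a dictionary key here, so nothing in it gets executed as code.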
2. Sampling 20 rows from each DataFrame.
If you are trying to get 20 rows by using for j1 in range(0, 20): along with random.randint(100), there is a better way that avoids this iteration. Instead, use np.random.randint(0,100,(n,)) to get n random indexes at once. In this case, np.random.randint(0,100,(20,)).
An even better way is to simply use df.sample(20) to sample 20 rows from a given dataframe.
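To make the difference between the two options concrete, here is a small sketch on a made-up 100-row DataFrame. One thing worth knowing: np.random.randint can return the same position more than once, while df.sample draws without replacement by default.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration: 100 rows, 3 columns
df = pd.DataFrame(np.random.random((100, 3)))

# Option 1: 20 random positions from NumPy; positions may repeat
idx = np.random.randint(0, 100, (20,))
picked_randint = df.iloc[idx]

# Option 2: let pandas do the sampling; rows are distinct by default
picked_sample = df.sample(20)

print(picked_randint.shape, picked_sample.shape)
print(picked_sample.index.is_unique)
```

If you do want repeated rows with df.sample, you can pass replace=True.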
3. Forcing update over views of the dataframe
It's better to use a different approach than forcing an update over a view of the dataframe with Tdata[k:k+1,:] == ... Since you want to combine dataframes, it's better to just collect them in a list and pass them to pd.concat, which is much more useful.
Here is sample code with a simple setting which should help guide you to what you are looking for.
import pandas as pd
import numpy as np
dataset0 = pd.DataFrame(np.random.random((100,3)))
dataset1 = pd.DataFrame(np.random.random((100,3)))
dataset2 = pd.DataFrame(np.random.random((100,3)))
dataset3 = pd.DataFrame(np.random.random((100,3)))
##Using random.randint
##samples = [eval('dataset'+str(i)).iloc[np.random.randint(0,100,(3,))] for i in range(4)]
##Using df.sample()
samples = [eval('dataset'+str(i)).sample(3) for i in range(4)]
##Change -
##1. The 3 to 20 for 20 samples per dataframe
##2. range(4) to range(200) to work with 200 dataframes
output = pd.concat(samples)
print(output)
0 1 2
42 0.372626 0.445972 0.030467
20 0.376201 0.445504 0.835735
56 0.214806 0.083550 0.582863
85 0.691495 0.346022 0.619638
24 0.290397 0.202795 0.704082
16 0.112986 0.013269 0.903917
51 0.521951 0.115386 0.632143
73 0.946870 0.531085 0.437418
98 0.745897 0.718701 0.280326
56 0.679253 0.010143 0.124667
4 0.028559 0.769682 0.737377
84 0.857553 0.866464 0.827472
4. Storing 200 dataframes??
Last but not least, you should ask yourself why you are storing 200 dataframes as individual variables, only to sample some rows from each.
Why not try to -
- Read each of the files iteratively
- Sample rows from each
- Store them in a list of dataframes
- pd.concat once you are done iterating over the 200 files
... instead of saving 200 dataframes and then doing the same.
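The steps above could look roughly like this. It is a sketch under assumptions: I don't know your file format or names, so it writes a few throwaway CSV files called dataset0.csv, dataset1.csv, ... into a temp folder just to be runnable. Swap in your own paths and reader.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical setup: create a few small CSV files so the sketch runs.
# In practice these would be your 200 existing files.
tmp_dir = Path(tempfile.mkdtemp())
for i in range(4):
    pd.DataFrame(np.random.random((100, 3))).to_csv(tmp_dir / f"dataset{i}.csv", index=False)

samples = []
for path in sorted(tmp_dir.glob("dataset*.csv")):
    df = pd.read_csv(path)          # read one file at a time
    samples.append(df.sample(3))    # keep only the sampled rows

output = pd.concat(samples)         # combine once, after the loop
print(output.shape)
```

Only one full DataFrame is in memory at any time; the list holds just the sampled rows. Change the 3 to 20 for 20 samples per file.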