Appending data to a dataframe but changing rows after certain # of columns
The above is my previous post, in which I attempted to convert an 1800-row x 1-column dataframe into a 300-row x 6-column dataframe with:
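import numpy as np
import pandas as pd
from scipy.stats import ttest_ind_from_stats

i = 0   # row offset of the current 3-row block
k = 2   # column index of the first group being compared
j = 2   # column index of the second group being compared
result = []
df = pd.DataFrame()
print(data.shape)
while j < data.shape[1]:
    # ttest_ind_from_stats(mean1, std1, nobs1, mean2, std2, nobs2):
    # rows i, i+1, i+2 are read as mean, std and n for columns k and j
    tstat, data_stat = ttest_ind_from_stats(data.loc[i][k], data.loc[i + 1][k], data.loc[i + 2][k],
                                            data.loc[i][j], data.loc[i + 1][j], data.loc[i + 2][j])
    result.append([data_stat])  # keep only the p-value
    #print(i, k, i, j)
    #print(i + 1, k, i + 1, j)
    #print(i + 2, k, i + 2, j)
    j += 1
    if j == data.shape[1]:          # past the last column: move to the next 3-row block
        j = 2
        i = i + 3
        if i == data.shape[0]:      # past the last block: advance k and start over
            k = k + 1
            i = 0
            if k > data.shape[1] - 1:
                break
data_result = pd.DataFrame(result)
a = np.array(data_result)
b = a.reshape(int(data.shape[0] * 2), 6)
data_result_new = pd.DataFrame(b)
data_result_new.columns = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6']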
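For reference, reshape fills the new array row by row (NumPy's default C order), so the first six p-values in result become row 0, the next six row 1, and so on. The minimal sketch below uses a hypothetical 1800x1 column of consecutive numbers to make the ordering visible, and assumes the loop produces exactly 1800 values; passing -1 as the row count lets NumPy infer the 300 rows instead of computing int(data.shape[0]*2).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data_result: 1800 values in one column,
# numbered 0..1799 so it is easy to see where each value ends up.
data_result = pd.DataFrame(np.arange(1800, dtype=float))

a = data_result.to_numpy()   # shape (1800, 1)
b = a.reshape(-1, 6)         # -1 lets NumPy infer the row count -> (300, 6)

# reshape fills row by row (C order): values 0-5 form row 0, 6-11 form row 1, ...
print(b.shape)               # (300, 6)
print(b[0].tolist())         # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(b[1].tolist())         # [6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
```

If the number of values is not a multiple of 6, reshape raises a ValueError, which is a quick way to catch a loop that produced the wrong count.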
I would then like to further split the dataframe into six chunks. I was thinking of using np.array_split, like this:
c = np.array_split(b,6)
This line would go right after b = a.reshape(int(data.shape[0]*2),6). (I know the data_result_new lines won't work once the split is applied.)
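As a sketch of what that line produces: np.array_split returns a plain Python list of arrays and does not modify its input, so each piece can be wrapped in its own DataFrame. Here b is a hypothetical 300x6 array of random values standing in for the real one, and the column names are taken from the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for b: a 300x6 array of random "p-values".
rng = np.random.default_rng(0)
b = rng.random((300, 6))

cols = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6']

# np.array_split returns a list of six (50, 6) arrays; b itself is unchanged.
c = np.array_split(b, 6)

# Wrap each piece in its own DataFrame with the same column names.
chunks = [pd.DataFrame(piece, columns=cols) for piece in c]

for n, chunk in enumerate(chunks, start=1):
    print(f"df{n}", chunk.shape)   # each chunk is (50, 6)
```

np.split(b, 6) would give the same result here, but it raises an error when the row count is not divisible by 6, whereas np.array_split allows uneven pieces.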
For example, the starting data table would look like this:
col1    col2    col3    col4    col5    col6
1       0.658   0.1067  0.777   0.459   0.3307
1       0.622   0.4178  0.3158  0.7674  0.7426
1       0.622   0.4178  0.3158  0.7674  0.7426
1       0.622   0.4178  0.3158  0.7674  0.7426
1       0.622   0.4178  0.3158  0.7674  0.7426
...
0.123   1       0.1222  0.111   0.123   0.1234
0.123   1       0.1222  0.111   0.123   0.1234
0.123   1       0.1222  0.111   0.123   0.1234
0.123   1       0.1222  0.111   0.123   0.1234
0.123   1       0.1222  0.111   0.123   0.1234
...
and so on. (The numbers here are made up for this post; the real values are p-values, so any floats will do for testing.) The rows come in groups of 50, which is why I would like to split the 300x6 dataframe into six 50x6 dataframes. The full data is too large to include here, so I abbreviated the table as above; for testing, a 300x6 dataframe of random values (not counting the header) should work.
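Since the rows come in fixed blocks of 50, one alternative to np.array_split (a different technique, shown only as a sketch) is to label each row with its block number and group by that label. This uses a random 300x6 test frame, as suggested above; the block size of 50 comes from the description, and the seed is arbitrary.

```python
import numpy as np
import pandas as pd

# Random 300x6 test frame; the values stand in for p-values.
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.random((300, 6)),
                  columns=['col1', 'col2', 'col3', 'col4', 'col5', 'col6'])

# Block number for each row: rows 0-49 -> 0, 50-99 -> 1, ..., 250-299 -> 5.
block = np.arange(len(df)) // 50

# groupby on the block labels yields (label, sub-DataFrame) pairs in label order.
groups = [g for _, g in df.groupby(block)]

print(len(groups))       # 6
print(groups[0].shape)   # (50, 6)
```

Each group keeps its original row labels (the second group starts at index 50); call reset_index(drop=True) on a group if it should start at 0.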
What I want is:
[df1]
col1    col2    col3    col4    col5    col6
1       0.658   0.1067  0.777   0.459   0.3307
1       0.622   0.4178  0.3158  0.7674  0.7426
1       0.622   0.4178  0.3158  0.7674  0.7426
1       0.622   0.4178  0.3158  0.7674  0.7426
1       0.622   0.4178  0.3158  0.7674  0.7426
[df2]
col1    col2    col3    col4    col5    col6
0.123   1       0.1222  0.111   0.123   0.1234
0.123   1       0.1222  0.111   0.123   0.1234
0.123   1       0.1222  0.111   0.123   0.1234
0.123   1       0.1222  0.111   0.123   0.1234
0.123   1       0.1222  0.111   0.123   0.1234
and so on. I am not sure how to iterate over each piece returned by np.array_split and save it as a separate dataframe. Any help or suggestions would be appreciated.
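One common pattern for "separate dataframes" is to keep them in a dictionary keyed 'df1' to 'df6' rather than creating six variables dynamically. A minimal sketch, assuming a 300x6 frame like data_result_new above (replaced here by hypothetical random values) and chunks of 50 rows; rows_per_chunk and dfs are illustrative names, and each chunk's index is reset so it starts at 0.

```python
import numpy as np
import pandas as pd

# Hypothetical 300x6 frame of random values standing in for data_result_new.
rng = np.random.default_rng(1)
data_result_new = pd.DataFrame(rng.random((300, 6)),
                               columns=['col1', 'col2', 'col3', 'col4', 'col5', 'col6'])

rows_per_chunk = 50

# Build {'df1': ..., 'df6': ...}: slice 50 rows at a time with iloc.
dfs = {
    f"df{n}": data_result_new.iloc[start:start + rows_per_chunk].reset_index(drop=True)
    for n, start in enumerate(range(0, len(data_result_new), rows_per_chunk), start=1)
}

print(list(dfs))          # ['df1', 'df2', 'df3', 'df4', 'df5', 'df6']
print(dfs["df2"].shape)   # (50, 6)

# Optionally write each chunk to its own file (file names are hypothetical):
# for name, chunk in dfs.items():
#     chunk.to_csv(f"{name}.csv", index=False)
```

The same dictionary comprehension works on the list from np.array_split, e.g. {f"df{n}": pd.DataFrame(piece, columns=cols) for n, piece in enumerate(c, start=1)}.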