
Here is code I've written that increments three variables (i, k and j), which are used as row and column indexers in a p-value calculation:

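import pandas as pd
from scipy.stats import ttest_ind_from_stats

# `data` is an existing DataFrame (150 rows x 8 columns)
i = 0  # first row of the current 3-row group
k = 2  # column supplying the first sample's statistics
j = 2  # column supplying the second sample's statistics

result = []
df = pd.DataFrame()

while j < data.shape[1]:
    # Rows i, i+1, i+2 are passed positionally as mean, std and nobs
    tstat, data_stat = ttest_ind_from_stats(data.loc[i][k], data.loc[i + 1][k], data.loc[i + 2][k], data.loc[i][j],
                                        data.loc[i + 1][j], data.loc[i + 2][j])
    result.append([data_stat])  # keep only the p-value
    j += 1
    if j == 8:
        # all columns compared: move on to the next 3-row group
        j = 2
        i = i + 3
    if i == data.shape[0]:
        # all row groups done: move on to the next reference column
        k = k + 1
        i = 0
        if k > 7:
            break

data_result = pd.DataFrame(result)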

Here data.shape[0] = 150 and data.shape[1] = 8.

This code produces the correct p-values, but as a single DataFrame of 1800 rows x 1 column. Instead, I would like the code to produce six separate DataFrames, each with data.shape[1] - 2 columns (so 6 columns). For example:

1) The data_result dataframe from my current code:

1
0.658
0.1067
0.777
0.459
0.3307
1
0.622
0.4178
0.3158
0.7674
0.7426

2) What I want:

col1    col2   col3    col4    col5    col6
1       0.658  0.1067  0.777   0.459   0.3307
1       0.622  0.4178  0.3158  0.7674  0.7426

There should be six of the above dataframes from the code.

3) Preferably, I would then add a column to the left of each DataFrame to hold placeholder values for each row (screenshot omitted). This step is optional.

So basically, I want to split the resulting DataFrame every 6 rows, transpose each group from a single column into six columns, then repeat for the next six values, and so on. I thought about building a Series or a new df until j = 8 and then appending it to the overall df as a row, but I wasn't sure whether this would work or is even possible. Thanks!

Edit:

To summarise, I want to create six separate DataFrames, each of shape 50 rows x 6 columns. My current DataFrame has 1800 rows x 1 column.

Bong Kyo Seo

2 Answers


For point 2, you can try it with NumPy:

import numpy as np
import pandas as pd

result_array = np.asarray(result)
# reshape to 6 columns and let NumPy infer the number of rows
# (1800 values / 6 columns = 300 rows); reshape returns a new array,
# so the result has to be assigned
result_array = result_array.reshape(-1, 6)
# equivalent, with the number of rows spelled out:
# result_array = result_array.reshape(300, 6)

data_result = pd.DataFrame(result_array)

For point 3, I'm not sure I understand it, but once you have the DataFrame you can do anything that pandas allows.
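As one possible reading of point 3, a leftmost column can be added with `DataFrame.insert`. This is only a minimal sketch: the two rows are the sample values from the question, and the column name `label` and the strings `row_a` / `row_b` are hypothetical placeholders.

```python
import numpy as np
import pandas as pd

# Sample p-values from the question, already reshaped to 2 rows x 6 columns.
values = np.array([
    [1, 0.658, 0.1067, 0.777, 0.459, 0.3307],
    [1, 0.622, 0.4178, 0.3158, 0.7674, 0.7426],
])
table = pd.DataFrame(values, columns=[f"col{n}" for n in range(1, 7)])

# Insert a new column at position 0 (the far left), modifying `table` in place.
# "label", "row_a" and "row_b" are hypothetical placeholder names.
table.insert(0, "label", ["row_a", "row_b"])
print(table)
```

`insert` needs one value per row, so the list of labels must have the same length as the DataFrame.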

Renaud

This would get you the df you need (credit should go to Renaud)

a = np.array(df)
b = a.reshape(df.shape[0] // 6, 6)
df_new = pd.DataFrame(b)
df_new.columns = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6']
df_new

Output

   col1     col2    col3        col4    col5    col6
0   1.0     0.658   0.106743    0.7770  0.4590  0.3307
1   1.0     0.622   0.417800    0.3158  0.7674  0.7426
moys
  • For the a.reshape(x,y) call, what should x match? Right now I'm getting a ValueError saying it cannot reshape an array of size 1800 into shape (25,6). I'm guessing the shapes are incorrect because my post only shows a portion of the data. I want to create 6 dfs, each with shape (50 rows, 6 columns, not counting the header). – Bong Kyo Seo Jan 20 '20 at 06:03
  • x should be the length of your dataframe divided by six (the number of columns you want). – moys Jan 20 '20 at 06:05
  • Well changing it to df.shape[0]*2 did create a 300 row x 6 column df but it's a single dataframe I'm afraid. Would it be possible to create six separate dfs, each with 50 rows x 6 column shape? – Bong Kyo Seo Jan 20 '20 at 06:05
  • See if this helps with splitting the dataframe into separate chunks https://stackoverflow.com/questions/25290757/split-pandas-dataframe-in-two-if-it-has-more-than-10-rows – moys Jan 20 '20 at 06:09
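For the six separate 50 x 6 DataFrames asked about in the comments, one option is to reshape the flat values into three dimensions and build one DataFrame per block. The sketch below assumes the values are in the order the question's loop produces them (j changes fastest, then i, then k), so each block of 300 consecutive values corresponds to one value of k. `p_values` is a hypothetical stand-in for the real results; with the question's list, `np.asarray(result)` would take its place.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 1800 p-values collected by the loop.
p_values = np.arange(1800, dtype=float)

columns = [f"col{n}" for n in range(1, 7)]

# 1800 values -> 6 blocks x 50 rows x 6 columns.
# The same call also works on an array of shape (1800, 1).
blocks = p_values.reshape(6, 50, 6)

# One DataFrame per block, i.e. one per value of k in the original loop.
frames = [pd.DataFrame(block, columns=columns) for block in blocks]

print(len(frames), frames[0].shape)
```

Keeping the DataFrames in a list avoids six separately named variables; `frames[0]` is the first block, `frames[5]` the last.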