I want to create a DataFrame whose columns may have different numbers of rows. Inside a for loop, after some iterations, I get a value that has to become a column name in my output df, and the value of 'i' at that moment has to be a row value in that column.

Whenever a new column name occurs in the loop, it has to be added to my df. If the column name already exists, the i value needs to be added as a new row in that column.

For illustration, I have created list_val.

Suppose:

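import pandas as pd

data_df = pd.DataFrame()

list_val = [16, 20, 16, 16, 8, 20, 24, 8, 24, 16]

for i in range(len(list_val)):
    # One-row frame: the column name is list_val[i], the value is i
    subset_df = pd.DataFrame([i], columns=[list_val[i]])
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    data_df = pd.concat([data_df, subset_df], sort=False)

print(data_df)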

Output I am getting:

    8    16   20   24
0  NaN  0    NaN  NaN
0  NaN  NaN  1    NaN
0  NaN  2    NaN  NaN
0  NaN  3    NaN  NaN
0  4    NaN  NaN  NaN
0  NaN  NaN  5    NaN
0  NaN  NaN  NaN  6
0  7    NaN  NaN  NaN
0  NaN  NaN  NaN  8
0  NaN  9    NaN  NaN

I don't want NaN values in between.

Expected Output:

   8    16   20   24
0  4    0    1    6  
0  7    2    5    8
0  NaN  3    NaN  NaN
0  NaN  9    NaN  NaN
0  NaN  NaN  NaN  NaN
0  NaN  NaN  NaN  NaN
0  NaN  NaN  NaN  NaN
0  NaN  NaN  NaN  NaN
0  NaN  NaN  NaN  NaN
0  NaN  NaN  NaN  NaN

Is there any way to replace the NaNs at the time of adding subset_df, or do the NaN values need to be replaced outside the loop? Or is there any other way to achieve this? I will get 2 values in the loop: one has to be the column name, and the other (i) has to be its row value.

axay
  • Umm. This may be a silly question, but if you disallow NaNs, then what do you want to have in places (cells) where there are no numbers? – pinegulf May 28 '20 at 09:26
  • If the NaNs need to be deleted in the end, then whatever values are below them need to take the place of the NaN values. All the NaNs need to be at the end. – axay May 28 '20 at 09:30
  • Does this answer your question? [How to move Nan values to end in all columns](https://stackoverflow.com/questions/52621834/how-to-move-nan-values-to-end-in-all-columns) – Riccardo Bucco May 28 '20 at 09:38
  • @RiccardoBucco Yes, it did. ```justify``` worked perfectly. – axay May 28 '20 at 12:28

2 Answers


Use the justify function from the linked question (a user-defined helper, not part of pandas or NumPy) with the DataFrame constructor:

arr = justify(data_df.to_numpy(), invalid_val=np.nan, axis=0)

df = pd.DataFrame(arr, columns=data_df.columns, index=data_df.index)
print(df)
    8    16   20   24
0  4.0  0.0  1.0  6.0
0  7.0  2.0  5.0  8.0
0  NaN  3.0  NaN  NaN
0  NaN  9.0  NaN  NaN
0  NaN  NaN  NaN  NaN
0  NaN  NaN  NaN  NaN
0  NaN  NaN  NaN  NaN
0  NaN  NaN  NaN  NaN
0  NaN  NaN  NaN  NaN
0  NaN  NaN  NaN  NaN
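
Since `justify` is not defined in this post, here is a minimal sketch of such a helper. It assumes a 2-D float array with NaN as the only invalid value and only pushes values toward the top of each column. The name `justify_up` is illustrative, and this is not the linked answer's exact code.

```python
import numpy as np
import pandas as pd


def justify_up(a):
    """Push the non-NaN entries of each column of a 2-D float array to the top.

    Illustrative helper (hypothetical name), not the linked answer's exact code.
    """
    mask = ~np.isnan(a)  # True where a value is present
    # Sorting booleans puts False first; flipping moves the Trues to the top.
    target = np.flip(np.sort(mask, axis=0), axis=0)
    out = np.full(a.shape, np.nan)
    # Work on transposes so values are read and written column by column.
    out.T[target.T] = a.T[mask.T]
    return out


# Rebuild the frame from the question (pd.concat instead of the removed append).
list_val = [16, 20, 16, 16, 8, 20, 24, 8, 24, 16]
data_df = pd.concat(
    [pd.DataFrame([i], columns=[col]) for i, col in enumerate(list_val)],
    sort=False,
)

df = pd.DataFrame(
    justify_up(data_df.to_numpy(dtype=float)),
    columns=data_df.columns,
    index=data_df.index,
)
print(df)
```

The order of values within each column is preserved because boolean indexing on the transposed arrays reads and writes one column at a time, top to bottom.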
jezrael
  • You already answered this question here: https://stackoverflow.com/questions/52621834/how-to-move-nan-values-to-end-in-all-columns. It should be marked as a duplicate. – Riccardo Bucco May 28 '20 at 09:38
  • can we use sort_values() on dataframe to solve the problem? – The Guy May 28 '20 at 09:46
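This is not that pretty, but with the help of NumPy you can fairly easily get a NumPy array with your desired result.

import numpy as np

def shifted_column(values):
    # Keep only the non-NaN entries, in their original order
    none_nan_values = values[~np.isnan(values)]
    # Start from an all-NaN column of the same shape
    nan_row = np.full(values.shape, np.nan)
    # Write the non-NaN entries at the top of the column
    nan_row[:none_nan_values.size] = none_nan_values

    return nan_row

# Apply the function to each column (axis 0) of the underlying array
np.apply_along_axis(shifted_column, 0, data_df.values)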

You can convert the result back to a pandas DataFrame if you wish.
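
One possible way to do that conversion is sketched below. It assumes the original column labels should be reused. Dropping the rows that end up entirely NaN is an optional extra step that is not part of the answer, and the names `shifted` and `result` are illustrative.

```python
import numpy as np
import pandas as pd


def shifted_column(values):
    # Same idea as the answer's function: non-NaN values first, NaN padding after.
    kept = values[~np.isnan(values)]
    out = np.full(values.shape, np.nan)
    out[:kept.size] = kept
    return out


# Rebuild the frame from the question (pd.concat instead of the removed append).
list_val = [16, 20, 16, 16, 8, 20, 24, 8, 24, 16]
data_df = pd.concat(
    [pd.DataFrame([i], columns=[col]) for i, col in enumerate(list_val)],
    sort=False,
)

shifted = np.apply_along_axis(shifted_column, 0, data_df.to_numpy(dtype=float))

# Reuse the original column labels; dropna(how="all") removes all-NaN rows.
result = pd.DataFrame(shifted, columns=data_df.columns).dropna(how="all")
print(result)
```

Leaving out `.dropna(how="all")` keeps the frame at its original ten rows, which matches the expected output shown in the question.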

Willem Hendriks