I want to create a DataFrame whose columns may each hold a different number of values. Inside a for loop, after some iterations, I get a value that has to become a column name in my output df, and the value of i at that instant has to be stored as a row value under that column.
Whenever a new column name occurs in the loop, it has to be added to my df. If the column name already exists, the i value needs to be added as the next row of that column.
For illustration I have created list_val.
Suppose:
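import pandas as pd

data_df = pd.DataFrame()
list_val = [16, 20, 16, 16, 8, 20, 24, 8, 24, 16]
for i in range(len(list_val)):
    # one-row frame: the value i under the column named list_val[i]
    subset_df = pd.DataFrame([i], columns=[list_val[i]])
    # note: DataFrame.append exists only in pandas < 2.0
    data_df = data_df.append(subset_df, sort=False)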
print(data_df)
Output I am getting:
     8   16   20   24
0  NaN    0  NaN  NaN
0  NaN  NaN    1  NaN
0  NaN    2  NaN  NaN
0  NaN    3  NaN  NaN
0    4  NaN  NaN  NaN
0  NaN  NaN    5  NaN
0  NaN  NaN  NaN    6
0    7  NaN  NaN  NaN
0  NaN  NaN  NaN    8
0  NaN    9  NaN  NaN
I don't want NaN values in between.
Expected Output:
     8   16   20   24
0    4    0    1    6
0    7    2    5    8
0  NaN    3  NaN  NaN
0  NaN    9  NaN  NaN
0  NaN  NaN  NaN  NaN
0  NaN  NaN  NaN  NaN
0  NaN  NaN  NaN  NaN
0  NaN  NaN  NaN  NaN
0  NaN  NaN  NaN  NaN
0  NaN  NaN  NaN  NaN
Is there any way to avoid the NaNs at the time of adding subset_df, or else to remove them outside the loop? Or is there any other way to achieve this?
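For the "outside the loop" option, one possible sketch is to compact each column on its own once the sparse frame exists. It assumes a pandas version where pd.concat is available in place of DataFrame.append; the names sparse_df and compact_df are made up for illustration and are not from the code above.

```python
import pandas as pd

list_val = [16, 20, 16, 16, 8, 20, 24, 8, 24, 16]

# Rebuild the sparse frame from the question: one row per iteration,
# with the loop index i stored under the column named list_val[i].
# (sparse_df is a hypothetical name used only in this sketch.)
sparse_df = pd.concat(
    [pd.DataFrame([i], columns=[v]) for i, v in enumerate(list_val)],
    sort=False,
)

# Compact each column independently: drop its NaNs, then renumber the
# remaining values from 0 so they line up at the top of the frame.
compact_df = sparse_df.apply(lambda col: col.dropna().reset_index(drop=True))
print(compact_df)
```

The result has only as many rows as the longest column (four here), not ten, and the values come out as floats because the intermediate frame contained NaN.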
In the loop I will get 2 values: one has to be the column name, and the other (i) has to be its row value.
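Since each iteration yields a (column name, value) pair, another possible approach is to skip the per-iteration DataFrame entirely: collect the values per column name in a plain dict inside the loop and build the frame once at the end. This is only a sketch under that assumption; the names columns and col_name are hypothetical.

```python
from collections import defaultdict

import pandas as pd

list_val = [16, 20, 16, 16, 8, 20, 24, 8, 24, 16]

# Map each column name to the list of i values seen for it so far.
# A new name creates a new list; an existing name gets the value appended.
columns = defaultdict(list)
for i, col_name in enumerate(list_val):
    columns[col_name].append(i)

# Wrapping each list in a Series lets pandas pad the shorter columns
# with NaN at the bottom instead of raising on unequal lengths.
data_df = pd.DataFrame({name: pd.Series(vals) for name, vals in columns.items()})

# Optional: order the columns by name (8, 16, 20, 24).
data_df = data_df.sort_index(axis=1)
print(data_df)
```

Building the frame once also avoids growing a DataFrame row by row, which copies the data on every iteration.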