0

I am using python and I have a dataframe named "noshow" with 5 columns such as ,

  1. Phone - float64
  2. Name - object
  3. Total Order Counts - int64
  4. Total Bill Value - float64
  5. List_of_Order_Id - object

5th column "List_of_Order_Id" has the list of order ids per row(customer) like below,

In [11]: noshow
Out[11]: 
           Mobile  ...                  List_of_Order_Id
0    9.163820e+08  ...                             21810
1    9.179049e+08  ...                             23387
2    9.183748e+08  ...                             21767
3    9.186110e+08  ...                             23457
4    9.187790e+08  ...                       23117,23163
..            ...  ...                               ...
353  9.970647e+09  ...                             21549
354  9.971940e+09  ...                             22753
355  9.994742e+09  ...     21505,21836,22291,22539,22734
356  9.994964e+09  ...                             22348
357  9.994997e+09  ...                       21100,21550

[358 rows x 5 columns]

Now, I want to automate the splitting of "List_of_Order_Id" column at each comma and create a new column for each of the value using loop or whatever solution available.

For example, if in one of the row of List_of_Order_Id column out of 358 rows has "n" order id values and that is the maximum number of order ids per customer then I want to split it into "n" columns and I want to name each of the columns as "Order_Id_1", "Order_Id_2", ......., "Order_Id_n" with the numbers as suffix like mentioned.

Kindly help! Thanks in advance.

Naveen V
  • 3
  • 2
  • Does this answer your question? [Pandas Dataframe: split column into multiple columns, right-align inconsistent cell entries](https://stackoverflow.com/questions/23317342/pandas-dataframe-split-column-into-multiple-columns-right-align-inconsistent-c) – Albin Paul Dec 19 '21 at 17:16
  • No @AlbinPaul In my case, I want to split the 5th column and I don't know how many columns I will be having once I split it and I also don't want to manually rename the new column names after splitting. Is there anyway that I could automate it using "Loop" or something to split it and name the newly split columns as "Order_Id_1", "Order_Id_2", ..............., "Order_Id_n" – Naveen V Dec 19 '21 at 17:23

1 Answers1

0

You can use str.split to split the strings in the column and then attach the resulting DataFrame to the original DataFrame, assigning column names using its width.

temp = df['List_of_Order_Id'].str.split(',', expand=True).applymap(lambda x: np.nan if x is None else x)
df[['Order_Id_'+str(i) for i in range(1,temp.shape[1] + 1)]] = temp

           Mobile  ...               List_of_Order_Id Order_Id_1 Order_Id_2  \
0    9.163820e+08  ...                          21810      21810        NaN   
1    9.179049e+08  ...                          23387      23387        NaN   
2    9.183748e+08  ...                          21767      21767        NaN   
3    9.186110e+08  ...                          23457      23457        NaN   
4    9.187790e+08  ...                    23117,23163      23117      23163   
..            ...  ...                            ...        ...        NaN   
353  9.970647e+09  ...                          21549      21549        NaN   
354  9.971940e+09  ...                          22753      22753        NaN   
355  9.994742e+09  ...  21505,21836,22291,22539,22734      21505      21836   
356  9.994964e+09  ...                          22348      22348        NaN   
357  9.994997e+09  ...                    21100,21550      21100      21550   

    Order_Id_3 Order_Id_4 Order_Id_5  
0          NaN        NaN        NaN  
1          NaN        NaN        NaN  
2          NaN        NaN        NaN  
3          NaN        NaN        NaN  
4          NaN        NaN        NaN  
..         NaN        NaN        NaN  
353        NaN        NaN        NaN  
354        NaN        NaN        NaN  
355      22291      22539      22734  
356        NaN        NaN        NaN  
357        NaN        NaN        NaN  
  • Thank you so much! Let's say I have a 6th column named "List_of_Bill_Value" which is in the same format as "List_of_Order_Id" and I am using the same code which you gave to split this one also and rename them as "Bill_Value_1", "Bill_Value_2", ..... "Bill_Value_n". Now, Is there there anyway like above that I could rearrange them automatically like "Order_Id_1", "Bill_Value_1", Order_Id_2", "Bill_Value_2", .......... "Order_Id_n", "Bill_Value_n". It would be really of a great help, please suggest me a solution. – Naveen V Dec 19 '21 at 18:01
  • @NaveenV You can rearrange them as `df = df[['Mobile']+[col for i in range(1, temp.shape[1]+1) for col in df.columns if str(i) in col]]` –  Dec 19 '21 at 18:09
  • After splitting, now my "df" has **m** columns such as "Mobile", 'Name', 'Total Order Counts', 'Total Bill Value', 'List_of_Order_Id', 'List_of_Bill_Value', **n** consecutive "Order_Id" columns and **n** consecutive "Bill_Value" columns. I used a variable "temp" for splitting "List_of_Order_Id" and I used a variable "temp1" for splitting "List_of_Bill_Value". As per your above code, at the place of "temp" which one (temp or temp1) should I put? And the above code is producing duplicate columns as well as the rearrangements are not proper. Please kindly suggest – Naveen V Dec 19 '21 at 18:55
  • You create `temp` and `temp1`, assign `df[['Order_Id_'+str(i) for i in range(1,temp.shape[1] + 1)]] = temp` and `df[['Bill_Value_'+str(i) for i in range(1,temp1.shape[1] + 1)]] = temp1`. Then rearrange columns as `df = df[["Mobile", 'Name', 'Total Order Counts', 'Total Bill Value']+[col for i in range(1, temp.shape[1]+1) for col in df.columns if str(i) in col]]`. –  Dec 19 '21 at 18:59
  • ` Column Non-Null Count Dtype --- ------ -------------- ----- 0 Mobile 1 Name 2 Total Order Counts 3 Total Bill Value 4 Order_Id_1 5 Order_Id_10 6 Order_Id_11 7 Order_Id_12 8 Order_Id_13 9 Order_Id_14 ... upto Order_id_19 15 Bill_Value_1 16 Bill_Value_10 17 Bill_Value_11 18 Bill_Value_12 19 Bill_Value_13 20 Bill_Value_14 ... upto Bill_Value_19 26 Order_Id_2 27 Order_Id_12 28 Bill_Value_2 29 Bill_Value_12 30 Order_Id_3 31 Order_Id_13 32 Bill_Value_3 33 Bill_Value_13 upto 19 – Naveen V Dec 19 '21 at 20:02
  • 58 Order_Id_10 59 Bill_Value_10 60 Order_Id_11 61 Bill_Value_11 62 Order_Id_12 63 Bill_Value_12 and goes fine – Naveen V Dec 19 '21 at 20:02
  • This is how I am getting the rearrange output, Please help! – Naveen V Dec 19 '21 at 20:03