4

I have two dataframes (df1 and df2, shown below), whose columns differ both in order and count. I need to append these two dataframes to an Excel file where the column order must be as specified in Col_list below.

df1 is:

 durable_medical_equipment    pcp  specialist  diagnostic  imaging  generic  formulary_brand  non_preferred_generic  emergency_room  inpatient_facility  medical_deductible_single  medical_deductible_family  maximum_out_of_pocket_limit_single  maximum_out_of_pocket_limit_family plan_name      pdf_name
0                      False  False       False       False    False    False            False                  False           False               False                      False                      False                               False                               False   ABCBCBC  adjnajdn.pdf

... and df2 is:

   pcp  specialist  generic  formulary_brand  emergency_room  urgent_care  inpatient_facility  durable_medical_equipment  medical_deductible_single  medical_deductible_family  maximum_out_of_pocket_limit_single  maximum_out_of_pocket_limit_family plan_name      pdf_name
0  True        True    False            False            True         True                True                       True                       True                       True                                True                                True   ABCBCBC  adjnajdn.pdf

I am creating a column list which is same as the order of columns in the excel.

Col_list = ['durable_medical_equipment', 'pcp', 'specialist', 'diagnostic',
            'imaging', 'generic', 'formulary_brand', 'non_preferred_generic',
            'emergency_room', 'inpatient_facility', 'medical_deductible_single',
            'medical_deductible_family', 'maximum_out_of_pocket_limit_single', 'maximum_out_of_pocket_limit_family',
            'urgent_care', 'plan_name', 'pdf_name']

I am trying to reorder my dataframe according to the Col_list using concat(). For the column values which are not present in the dataframe the value can be NaN.

result = pd.concat([df, pd.DataFrame(columns=list(Col_list))])

This is not working properly. How can I achieve this reordering?

I tried the following:

 result = pd.concat([df_repo, pd.DataFrame(columns=list(Col_list))], sort=False, ignore_index=True)
        print(result.to_string())

The output I am getting is:

 durable_medical_equipment    pcp specialist diagnostic imaging generic formulary_brand non_preferred_generic emergency_room inpatient_facility medical_deductible_single medical_deductible_family maximum_out_of_pocket_limit_single maximum_out_of_pocket_limit_family plan_name      pdf_name urgent_care
0                     False  False      False      False   False   False           False                 False          False              False                     False                     False                              False                              False   ABCBCBC  adjnajdn.pdf         NaN
    pcp specialist generic formulary_brand emergency_room urgent_care inpatient_facility durable_medical_equipment medical_deductible_single medical_deductible_family maximum_out_of_pocket_limit_single maximum_out_of_pocket_limit_family plan_name      pdf_name diagnostic imaging non_preferred_generic
0  True       True   False           False           True        True               True                      True                      True                      True                               True                               True   ABCBCBC  adjnajdn.pdf        NaN     NaN                   NaN
smci
  • 32,567
  • 20
  • 113
  • 146
user1896796
  • 731
  • 3
  • 9
  • 25
  • It seems a mistake to use `concat` rather than `merge` for a join, since your dataframes share a lot of common columns (`pcp, specialist, generic`). Do you really want those columns to show up twice in the output? – smci Aug 23 '19 at 11:40
  • With concat, it is not giving me duplicates – user1896796 Aug 23 '19 at 11:43
  • **When you want to combine 2+ dataframes having shared columns, use `merge` not `concat`**: [Difference(s) between merge() and concat() in pandas](https://stackoverflow.com/questions/38256104/differences-between-merge-and-concat-in-pandas) – smci Aug 23 '19 at 11:46

1 Answers1

0

Use if need change order by values in list add DataFrame.reindex and pass to concat:

df = pd.concat([df1.reindex(Col_list, axis=1), 
                df2.reindex(Col_list, axis=1)], sort=False, ignore_index=True)
print (df)
   durable_medical_equipment    pcp  specialist  diagnostic  imaging  generic  \
0                      False  False       False         0.0      0.0    False   
1                       True   True        True         NaN      NaN    False   

   formulary_brand  non_preferred_generic  emergency_room  inpatient_facility  \
0            False                    0.0           False               False   
1            False                    NaN            True                True   

   medical_deductible_single  medical_deductible_family  \
0                      False                      False   
1                       True                       True   

   maximum_out_of_pocket_limit_single  maximum_out_of_pocket_limit_family  \
0                               False                               False   
1                                True                                True   

   urgent_care plan_name      pdf_name  
0          NaN   ABCBCBC  adjnajdn.pdf  
1          1.0   ABCBCBC  adjnajdn.pdf  
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • I tried `result = pd.concat([df_repo, pd.DataFrame(columns=list(Col_list))], sort=False, ignore_index=True)`. It is not giving me correct output. – user1896796 Aug 23 '19 at 11:29
  • I have updated my question with output I am getting, the order is still not the same. – user1896796 Aug 23 '19 at 11:33
  • Actually I need to change the order of the dataframe as per my defined list, because after that I am appending that df into my excel in a loop – user1896796 Aug 23 '19 at 11:35
  • @user1896796 - it was my second solution, first was removed now. – jezrael Aug 23 '19 at 11:37
  • Yeah, I am able to do this using reindex. I did something like below- `result = pd.concat([df_repo, pd.DataFrame(columns=list(Col_list))], sort=False, ignore_index=True) result = result.reindex(Col_list, axis=1)` – user1896796 Aug 23 '19 at 11:43
  • Surely **`merge` not `concat`**: [Difference(s) between merge() and concat() in pandas](https://stackoverflow.com/questions/38256104/differences-between-merge-and-concat-in-pandas) – smci Aug 23 '19 at 11:52
  • @smci - Hmmm, i think no reason for use `merge` here in need join 2 one row dataframes. – jezrael Aug 23 '19 at 11:54
  • @jezrael: the general case is not one-row dataframes, neither for the OP nor the general case. – smci Aug 23 '19 at 11:57
  • @smci - hmmm, if need many dataframes, `concat` here is much better like `merge`, maybe something missing – jezrael Aug 23 '19 at 11:58