1

Unpredictably formatted df:

  First Name  number last_name
0    Cthulhu     666     Smith
    
df = pd.DataFrame({'First Name': ['Cthulhu'], 'number': [666], 'last_name': ['Smith']})

This needs to be put into column names and order: TemplateColumns = ['First Name', 'other', 'number']. If columns don't exist they can be created:

for col in TemplateColumns:
    if col not in df:
        df[col] = np.nan

Which gives:

  First Name  number last_name  other
0    Cthulhu     666     Smith    NaN

And initial columns need to be ordered the same as TemplateColumns, leaving the remaining columns at the end, to get desired_df:

  First Name  other   number last_name
0    Cthulhu    NaN      666     Smith

desired_df = pd.DataFrame({'First Name': ['Cthulhu'], 'other': [np.nan], 'number': [666], 'last_name': ['Smith']})

Reordering columns is well explained in other posts, but I don't know how to order the first n columns and keep the rest at the end. How can I do this?

3 Answers3

1

You can write your own function to achieve this. Essentially you can use .reindex() to reorder the dataframe while including empty columns if they don't exist. The only remaining part to figure out would be how to add the remaining columns not in TemplateColumns to your dataframe. You can do this by obtaining the set difference of the column index from the TemplateColumns then updating the order before your call to .reindex

Set up data & function

def reordered(df, new_order, include_remaining=True):
    cols_to_end = []
    if include_remaining:
        # gets the items in `df.columns` that are NOT in `new_order` 
        cols_to_end = df.columns.difference(new_order, sort=False)
    
    # Ensures that the new_order items are first
    final_order = new_order + list(cols_to_end)
    return df.reindex(columns=final_order)

df = pd.DataFrame({'First Name': ['Cthulhu'], 'number': [666], 'last_name': ['Smith']})
new_order = ['First Name', 'other', 'number']

with include_remaining:

out = reordered(df, new_order, include_remaining=True)

print(out)
  First Name  other  number last_name
0    Cthulhu    NaN     666     Smith

without include_remaining:

out = reordered(df, new_order, include_remaining=False)

print(out)
  First Name  other  number
0    Cthulhu    NaN     666
Cameron Riddell
  • 10,942
  • 9
  • 19
1

Try this

cols = TemplateColumns + df.columns.difference(TemplateColumns, sort=False).tolist()
df_final =  df.reindex(cols, axis=1)

Out[714]:
  First Name  other  number last_name
0    Cthulhu    NaN     666     Smith
Andy L.
  • 24,909
  • 4
  • 17
  • 29
0

Use insert like this:

for col in TemplateColumns:
    if col not in df:
        df.insert(1, col, np.nan)
U13-Forward
  • 69,221
  • 14
  • 89
  • 114