2

I have 2 dataframes:

df_a ["user", "name", "zip", "city"]
df_b ["user", "gender", "country"]

I'm joining these 2 dataframes on user column-

final_df = pd.merge(df_a, df_b, on='user', how='left')
# column order --> ["user", "name", "zip", "city", "gender", "country"]

However, I want the columns to be in the following order-

["user", "name", "gender", "country", "zip", "city"]

Aside from the usual re-ordering like

my_ordering = ["user", "name", "gender", "country", "zip", "city"]
final_df = final_df[my_ordering]

..what's the best way to get this ordering I want (considering there could be 1000 such columns in a dataframe)?

kev
  • 2,741
  • 5
  • 22
  • 48
  • Have you tried using a reoder after the merge? https://stackoverflow.com/a/29922207/9101283 – Aryan Jain Dec 12 '19 at 23:51
  • @AryanJain not the best answer if you have a 1000 columns.. – kev Dec 13 '19 at 21:06
  • one use case I encountered for this involved the need to keep two groups of columns separate (one containing the target, timestamp, and ID, and the other - legitimate predictors). But never found a use case that called for preserving exact column indices... It's almost like expecting db records after a JOIN operation to preserve their sorting order... – mirekphd Apr 08 '20 at 19:36

4 Answers4

0

You could do:

column_list = list(final_df.columns)

#Now rearrange the list the way you want the columns to be
#Then do

final_df = final_df[column_list]
NotAName
  • 3,821
  • 2
  • 29
  • 44
  • Is there any rule that you use to arrange columns? If the way you want to arrange them doesn't follow any particular rule then the only way I see is to manually rearrange them. – NotAName Dec 13 '19 at 00:06
0

This is how I'm currently handling it for this case-

all_columns = df_a.values.tolist()

# Joining the dataframes
final_df = pd.merge(df_a, df_b, on='user', how='left')
# column order --> ["user", "name", "zip", "city", "gender", "country"]

# Re-arranging columns
for column in all_columns:
    if columns == "name":
        all_columns[all_columns.index(column) + 1:all_columns.index(column) + 1] = ["gender", "country"]
        break

# Assigning re-arranged columns
final_df = final_df[all_columns]
# column order --> ["user", "name", "gender", "country", "zip", "city"]

Would be nice to know if people have other efficient or shorter ways to do this column re-ordering.

kev
  • 2,741
  • 5
  • 22
  • 48
0

I will do like this:

If you have a dataframe df with 100 cols after merging.

Store the columns that you want to place beginning in a list.

first_cols =['a','z',b','c','g']

Add up the remaining columns to this list.

req_order = first_cols + [col for col in df.columns if col not in first_cols]

Then use that custom order :

df = df.reindex(columns = req_order)

You can also re-order like this:

df = df[req_order]

  • your answer is already present on here: https://stackoverflow.com/questions/13148429/how-to-change-the-order-of-dataframe-columns/29922207. Also, if you wanted to move 40 columns in the beginning, you'd still have to type them all. – kev Dec 16 '19 at 16:44
0

As a workaround maybe use combination of set_index and reset_index?

final_df.set_index(["user", "name", "gender", "country"], inplace=True)
final_df = final_df.reset_index()

In my case I only have a few index columns on the left (id_foo, id_baa) and then a lot of data/year columns. I replace id_baa with id_qux and do not want it to be appended on the right side after all the year columns.

Stefan
  • 10,010
  • 7
  • 61
  • 117