4

I have a method that creates a list, results, and then appends to it a dict row which has keys (column names in my eventual dataframe) and values (row values in my eventual dataframe). Then I append the row and convert the results collection to a df at the end. For example:

results = []
row = {}
row['Name']="alice"
row['Age']=3
results.append(row)
..reset row, fill in values like above, and append more rows..

df = pd.DataFrame(results)

The issue I have is that the column names are alphbetized so the df looks like this:

df.head()
|Age |  Name  |
|3   | alice  |

Is there a way to specify the column names to be in the order I want ("Name", "Col")? In reality I have many more columns so I'd prefer not to have to do:

cols = df.columns
cols = cols[:1] + cols[0:1] 

and manually rearrange it. However, if I do so, does that just move around the column row or also the rows below? So, will "alice" and 3 in the row below also get moved around while moving the column as one would expect?

cs95
  • 379,657
  • 97
  • 704
  • 746
Anna
  • 379
  • 5
  • 16

3 Answers3

3

Specify a list of your desired ordering to the columns attribute, and DataFrame will reorder the columns when creating the DataFrame.

pd.DataFrame(results, columns=['Name', 'Age'])

    Name  Age
0  alice    3

Or, DataFrame.from_records does this as well.

pd.DataFrame.from_records(results, columns=['Name', 'Age'])

    Name  Age
0  alice    3

If you're working with multiple columns, you could always choose to do columns=['Name', 'Age', *(row.keys() - {'Name', 'Age'})] assuming you don't care about the ordering of the remaining columns.

I've written about constructing DataFrames from records in this post: Convert list of dictionaries to a pandas DataFrame


Another idea, fix the columns if the ordering is incorrect.

if df.columns.get_loc('Age') < df.columns.get_loc('Name'):
    df.insert(df.columns.get_loc('Age'), 'Name', df.pop('Name'))

This will insert Name before Age if it is originally inserted after.

cs95
  • 379,657
  • 97
  • 704
  • 746
1

IIUC:

df = pd.DataFrame(results,columns=list(row.keys()))
#from collections import OrderedDict :alternative
#columns=[i for i in OrderedDict.fromkeys(row.keys())]
print(df)

    Name  Age
0  alice    3
anky
  • 74,114
  • 11
  • 41
  • 70
  • Will the guarantee the desired ordering on python versions <3.6? – cs95 Jun 04 '19 at 16:28
  • 1
    @cs95 good point, not tested for `<3.6` , works for my version. :) how about `[i for i in OrderedDict.fromkeys(row.keys())]` ?? – anky Jun 04 '19 at 16:29
0

According to the docs, as of pandas 0.23 and python 3.6, when constructing from dicts, the order will be maintained. You could simply use an OrderedDict.

import pandas as pd
from collections import OrderedDict


results = []
row = OrderedDict()

row['Name']="alice"
row['Age']=3
results.append(row)
#..reset row, fill in values like above, and append more rows..

df = pd.DataFrame(results)
print(df.head())

    Name  Age
0  alice    3
Josh
  • 151
  • 1
  • 6
  • This is only under the assumption that they are creating the records from scratch (this usually isn't the case, but still a valid suggestion). – cs95 Jun 04 '19 at 16:33