I've got a pandas DataFrame with almost a thousand columns.
The column titles look like:
[smth,smth,smth,smth.....a,b,c,d,e]
How would I rearrange the columns to move a,b,c,d,e
to the start:
[a,b,c,d,e,smth,smth......]
A clean and efficient way is to use reindex:
cols = list(df.columns)
df.reindex(columns=cols[-5:]+cols[:-5])
Example:
import pandas as pd
df = pd.DataFrame([], columns=['X', 'Y', 'Z', 'A', 'B', 'C', 'D', 'E'])
print(df)
cols = list(df.columns)
df = df.reindex(columns=cols[-5:]+cols[:-5])
print(df)
output:
Empty DataFrame
Columns: [X, Y, Z, A, B, C, D, E]
Index: []
Empty DataFrame
Columns: [A, B, C, D, E, X, Y, Z]
Index: []
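If the frame might contain duplicate column labels (the repeated "smth" placeholders in the question could mean that, though it is only an assumption), label-based reindex raises a ValueError because it cannot reindex on an axis with duplicate labels. A minimal sketch of a purely positional alternative, assuming the columns to move are exactly the last five, rolls the column positions with NumPy and selects them with iloc:

```python
import numpy as np
import pandas as pd

# Toy frame with a duplicated label, standing in for repeated "smth" columns
df = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7, 8]],
                  columns=['smth', 'smth', 'Z', 'A', 'B', 'C', 'D', 'E'])

n_move = 5  # number of trailing columns to bring to the front (assumed)

# np.roll shifts positions right by n_move, so the last n_move positions
# wrap around to the front: [3, 4, 5, 6, 7, 0, 1, 2]
order = np.roll(np.arange(df.shape[1]), n_move)

# iloc selects by position, so duplicate labels do not cause an error
df = df.iloc[:, order]
print(df.columns.tolist())
```

Because nothing here looks up columns by name, the same two lines work whatever the labels are.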
If I were you, I would just use the built-in .pop() method, combined with .insert().
So in your case I would do something like this:
first_column = df.pop('A')
df.insert(0, 'A', first_column)
pop removes the column and returns it, and insert puts it back at position 0, so you end up with a DataFrame where that column is now first and all the rest are shifted one place to the right.
You could do the same for each of the other columns, and if there are so many that this becomes cumbersome, you could just run a loop (going through the columns in reverse order so they end up as a, b, c, d, e).
There is also some good info on this in a GeeksforGeeks article:
https://www.geeksforgeeks.org/how-to-move-a-column-to-first-position-in-pandas-dataframe/
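The loop suggested in this answer can be sketched as follows, assuming a small stand-in frame whose target columns a to e sit at the end. Inserting in reverse order means each column lands in front of the ones already moved:

```python
import pandas as pd

# Small stand-in for the real frame: the target columns sit at the end
df = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7, 8]],
                  columns=['foo', 'bar', 'smth', 'a', 'b', 'c', 'd', 'e'])

first = ['a', 'b', 'c', 'd', 'e']

# Walk the list backwards: 'e' goes to position 0 first, then 'd' in front
# of it, and so on, so the final order is a, b, c, d, e, ...
for name in reversed(first):
    col = df.pop(name)       # remove the column and get it back as a Series
    df.insert(0, name, col)  # put it back at position 0 (modifies df in place)

print(df.columns.tolist())
```

Both pop and insert modify df in place, so there is nothing to reassign.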
If you simply want to move the last n columns to the front, you can build the new column order and overwrite the DataFrame with that selection:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(100, 5)), columns=list('ABCDE'))
cols = list(df.columns)
cols_shift = 2
new_cols = []
new_cols.extend(cols[-cols_shift:])
new_cols.extend(cols[:-cols_shift])
print(new_cols)  # ['D', 'E', 'A', 'B', 'C']
df = df[new_cols]
print(df.head())
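The slicing above can be wrapped in a small reusable function. This is a sketch; the name move_last_n_to_front is hypothetical. It also shows that the slicing already handles the edge cases: for n = 0, cols[-0:] is the whole list and cols[:-0] is empty, and an n larger than the number of columns simply keeps the original order.

```python
import numpy as np
import pandas as pd


def move_last_n_to_front(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Return a new DataFrame with the last n columns moved to the front.

    Hypothetical helper that wraps the list slicing from the answer.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    cols = list(df.columns)
    # n == 0 and n >= len(cols) both yield the original order,
    # because cols[-n:] is then the full list and cols[:-n] is empty
    return df[cols[-n:] + cols[:-n]]


df = pd.DataFrame(np.random.randint(0, 100, size=(100, 5)), columns=list('ABCDE'))
print(move_last_n_to_front(df, 2).columns.tolist())  # ['D', 'E', 'A', 'B', 'C']
```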
An explicit way is as follows. Note that it doesn't even matter where the columns we want first actually are.
# example setup
import numpy as np
import pandas as pd

cols = 'foo,bar,smth,smth_else,a,b,c,d,e'.split(',')
df = pd.DataFrame(np.random.randint(0,10, size=(4, len(cols))), columns=cols)
Then, say the columns you want first are ['a', 'b', 'c', 'd', 'e']:
first = ['a', 'b', 'c', 'd', 'e']
out = df[first + [k for k in df.columns if k not in first]]
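# or (note: Index.difference sorts the remaining columns by default,
# so they would come out as bar, foo, smth, smth_else):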
out = df[pd.Index(first).append(df.columns.difference(first))]
>>> out
   a  b  c  d  e  foo  bar  smth  smth_else
0  7  1  0  2  5    7    2     1          9
1  6  7  7  4  3    7    3     1          2
2  8  9  3  6  2    0    1     5          8
3  1  2  0  3  3    2    4     1          4
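If you prefer the Index-based form but want the remaining columns to keep their original order, one option is to swap difference for Index.drop, which removes the given labels without reordering the rest. A minimal sketch reusing the same example setup:

```python
import numpy as np
import pandas as pd

cols = 'foo,bar,smth,smth_else,a,b,c,d,e'.split(',')
df = pd.DataFrame(np.random.randint(0, 10, size=(4, len(cols))), columns=cols)

first = ['a', 'b', 'c', 'd', 'e']

# Index.drop keeps the remaining labels in their original order,
# unlike Index.difference, which sorts them by default
rest = df.columns.drop(first)
out = df[pd.Index(first).append(rest)]
print(out.columns.tolist())
```

Index.drop raises a KeyError if any name in first is missing, which is usually what you want when the column list is supposed to be exact.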