read a csv to df with a different columns order

Question

I have a large CSV file that is created through someone else's script with output columns named 'D', 'B', 'A', 'C'.
Is there a way to read it from CSV directly as 'A', 'B', 'C', 'D'.

Possible duplicate of [How to change the order of DataFrame columns?](https://stackoverflow.com/questions/13148429/how-to-change-the-order-of-dataframe-columns) — It_is_Chris, Jan 17 '19 at 13:38
Possible duplicate of [Keeping columns in the specified order when using UseCols in Pandas Read\_CSV](https://stackoverflow.com/questions/40024406/keeping-columns-in-the-specified-order-when-using-usecols-in-pandas-read-csv) — meW, Jan 17 '19 at 13:39

score 2 · Answer 1 · answered Sep 06 '19 at 18:32

I found this solution that works with pandas 0.25.1

CSV file with output columns named 'D', 'B', 'A', 'C' Read it from CSV directly as 'A', 'B', 'C', 'D'

import pandas as pd
csv = pd.read_csv(data, index_col=['A', 'B', 'C', 'D'], usecols=['A', 'B', 'C', 'D'])

It reads the CSV columns in the given order into a pandas DataFrame named 'csv' Both the index_col=[] and the usecols=[] have to containe the same list order.

score 1 · Answer 2 · answered Jan 17 '19 at 13:37

I think read_csv cannot do it, idea with usecols also not working:

import pandas as pd

temp=u"""D;B;A;C
0;a;4;7;1
1;b;5;8;3
2;c;4;9;5
3;d;5;4;7
4;e;5;2;1
5;f;4;3;0"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", usecols=['A', 'B', 'C', 'D'])
print (df)
   D  B  A  C
0  0  a  4  7
1  1  b  5  8
2  2  c  4  9
3  3  d  5  4
4  4  e  5  2
5  5  f  4  3

Solution is change order by list or sorting index:

df = df[['A', 'B', 'C', 'D']]
print (df)
   A  B  C  D
0  4  a  7  0
1  5  b  8  1
2  4  c  9  2
3  5  d  4  3
4  5  e  2  4
5  4  f  3  5

df = df.sort_index(axis=1)
print (df)
   A  B  C  D
0  4  a  7  0
1  5  b  8  1
2  4  c  9  2
3  5  d  4  3
4  5  e  2  4
5  4  f  3  5

Josh Friedlander · Answer 3 · 2019-01-17T14:11:42.557

I assume you're not asking how to change column order in Pandas, but would like to change it before reading in, possibly to reduce memory use. (If not see @jezrael's answer.)

Best is to do it with csv before reading into Pandas. Example:

import csv

with open('file.csv', 'r') as infile, open('reordered.csv', 'a') as outfile:
    fieldnames = ['A', 'B', 'C', 'D']
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in csv.DictReader(infile):
        writer.writerow(row)

Taken from here

In terms of time, quickest is doing it as CSV (I tested with a very small csv file):

%%time
df = pd.read_csv('r.csv')[['A', 'B', 'C', 'D']]

CPU times: user 3.53 ms, sys: 1.48 ms, total: 5 ms
Wall time: 3.79 ms

%%time
df=pd.read_csv('r.csv')[df.columns[::-1]]

CPU times: user 4.75 ms, sys: 2.01 ms, total: 6.76 ms
Wall time: 5.13 ms

%%time
cols=list('ABCD')
pd.concat([pd.read_csv('r.csv', usecols=[x]) for x in cols], axis=1)

CPU times: user 8.98 ms, sys: 2.25 ms, total: 11.2 ms
Wall time: 9.88 ms

%%time
df = pd.read_csv('r.csv').sort_index(axis=1)

CPU times: user 4.21 ms, sys: 1.88 ms, total: 6.09 ms
Wall time: 4.65 ms

%%time
with open('r.csv', 'r') as infile, open('reordered.csv', 'a') as outfile:

    fieldnames = ['A', 'B', 'C', 'D']
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in csv.DictReader(infile):
        writer.writerow(row)
df = pd.read_csv('reordered.csv')

CPU times: user 3.29 ms, sys: 1.91 ms, total: 5.2 ms
Wall time: 3.83 ms

I do not care about that file itself, it will be deleted soon. I just want to read it in the order i need. — Zusman, Jan 17 '19 at 13:56

score 1 · Answer 4 · answered Jan 17 '19 at 13:56

One method would be to read a single column at a time and store in a list comprehension and concat the result:

In[121]:
t="""D,B,A,C
4,2,1,3"""
cols = list('ABCD')
pd.concat([pd.read_csv(io.StringIO(t), usecols=[x]) for x in cols], axis=1)

Out[121]: 
   A  B  C  D
0  1  2  3  4

However, I think that it's easier to just sort it by sub-selecting the columns in the order you want after reading normally:

df = df[['A','B','C','D']]

score 1 · Answer 5 · edited Sep 06 '19 at 20:57

1

Add .reset_index() in order to reset the in the index_col set index.

import pandas as pd
csv = pd.read_csv(data, index_col=['A', 'B', 'C', 'D'], usecols=['A', 'B', 'C', 'D']).reset_index()

edited Sep 06 '19 at 20:57

bagerard

5,681
3
24
48

answered Sep 06 '19 at 19:50

Joe

11
2

score 0 · Answer 6 · answered Jan 17 '19 at 13:40

0

Try pandas:

import pandas as pd
df=pd.read_csv(filename)
df=df[df.columns[::-1]]

answered Jan 17 '19 at 13:40

Vasyl Kushnir

66
5

read a csv to df with a different columns order

6 Answers6