1

I have a large CSV file that is created through someone else's script with output columns named 'D', 'B', 'A', 'C'.
Is there a way to read it from CSV directly as 'A', 'B', 'C', 'D'.

Zusman
  • 606
  • 1
  • 7
  • 31
  • Possible duplicate of [How to change the order of DataFrame columns?](https://stackoverflow.com/questions/13148429/how-to-change-the-order-of-dataframe-columns) – It_is_Chris Jan 17 '19 at 13:38
  • Possible duplicate of [Keeping columns in the specified order when using UseCols in Pandas Read\_CSV](https://stackoverflow.com/questions/40024406/keeping-columns-in-the-specified-order-when-using-usecols-in-pandas-read-csv) – meW Jan 17 '19 at 13:39

6 Answers6

2

I found this solution that works with pandas 0.25.1

CSV file with output columns named 'D', 'B', 'A', 'C' Read it from CSV directly as 'A', 'B', 'C', 'D'

import pandas as pd
csv = pd.read_csv(data, index_col=['A', 'B', 'C', 'D'], usecols=['A', 'B', 'C', 'D'])

It reads the CSV columns in the given order into a pandas DataFrame named 'csv' Both the index_col=[] and the usecols=[] have to containe the same list order.

Joe
  • 51
  • 4
1

I think read_csv cannot do it, idea with usecols also not working:

import pandas as pd

temp=u"""D;B;A;C
0;a;4;7;1
1;b;5;8;3
2;c;4;9;5
3;d;5;4;7
4;e;5;2;1
5;f;4;3;0"""
#after testing replace 'pd.compat.StringIO(temp)' to 'filename.csv'
df = pd.read_csv(pd.compat.StringIO(temp), sep=";", usecols=['A', 'B', 'C', 'D'])
print (df)
   D  B  A  C
0  0  a  4  7
1  1  b  5  8
2  2  c  4  9
3  3  d  5  4
4  4  e  5  2
5  5  f  4  3

Solution is change order by list or sorting index:

df = df[['A', 'B', 'C', 'D']]
print (df)
   A  B  C  D
0  4  a  7  0
1  5  b  8  1
2  4  c  9  2
3  5  d  4  3
4  5  e  2  4
5  4  f  3  5

df = df.sort_index(axis=1)
print (df)
   A  B  C  D
0  4  a  7  0
1  5  b  8  1
2  4  c  9  2
3  5  d  4  3
4  5  e  2  4
5  4  f  3  5
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

I assume you're not asking how to change column order in Pandas, but would like to change it before reading in, possibly to reduce memory use. (If not see @jezrael's answer.)

Best is to do it with csv before reading into Pandas. Example:

import csv

with open('file.csv', 'r') as infile, open('reordered.csv', 'a') as outfile:
    fieldnames = ['A', 'B', 'C', 'D']
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in csv.DictReader(infile):
        writer.writerow(row)

Taken from here

In terms of time, quickest is doing it as CSV (I tested with a very small csv file):

%%time
df = pd.read_csv('r.csv')[['A', 'B', 'C', 'D']]

CPU times: user 3.53 ms, sys: 1.48 ms, total: 5 ms
Wall time: 3.79 ms

%%time
df=pd.read_csv('r.csv')[df.columns[::-1]]

CPU times: user 4.75 ms, sys: 2.01 ms, total: 6.76 ms
Wall time: 5.13 ms

%%time
cols=list('ABCD')
pd.concat([pd.read_csv('r.csv', usecols=[x]) for x in cols], axis=1)

CPU times: user 8.98 ms, sys: 2.25 ms, total: 11.2 ms
Wall time: 9.88 ms

%%time
df = pd.read_csv('r.csv').sort_index(axis=1)

CPU times: user 4.21 ms, sys: 1.88 ms, total: 6.09 ms
Wall time: 4.65 ms

%%time
with open('r.csv', 'r') as infile, open('reordered.csv', 'a') as outfile:

    fieldnames = ['A', 'B', 'C', 'D']
    writer = csv.DictWriter(outfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in csv.DictReader(infile):
        writer.writerow(row)
df = pd.read_csv('reordered.csv')

CPU times: user 3.29 ms, sys: 1.91 ms, total: 5.2 ms
Wall time: 3.83 ms
Josh Friedlander
  • 10,870
  • 5
  • 35
  • 75
  • I do not care about that file itself, it will be deleted soon. I just want to read it in the order i need. – Zusman Jan 17 '19 at 13:56
1

One method would be to read a single column at a time and store in a list comprehension and concat the result:

In[121]:
t="""D,B,A,C
4,2,1,3"""
cols = list('ABCD')
pd.concat([pd.read_csv(io.StringIO(t), usecols=[x]) for x in cols], axis=1)

Out[121]: 
   A  B  C  D
0  1  2  3  4

However, I think that it's easier to just sort it by sub-selecting the columns in the order you want after reading normally:

df = df[['A','B','C','D']]
EdChum
  • 376,765
  • 198
  • 813
  • 562
1

Add .reset_index() in order to reset the in the index_col set index.

import pandas as pd
csv = pd.read_csv(data, index_col=['A', 'B', 'C', 'D'], usecols=['A', 'B', 'C', 'D']).reset_index() 
bagerard
  • 5,681
  • 3
  • 24
  • 48
Joe
  • 11
  • 2
0

Try pandas:

import pandas as pd
df=pd.read_csv(filename)
df=df[df.columns[::-1]]