-1

How can one re-order a CSV without headers. Please note that the CSV file has around 100 thousands records.

Example of the CSV file:

1,$1,AA,GG,DD
2,$2,A1,FD,HH
3,$3,1A,PP,LL
4,$4,1G,LL,SS

Expected output:

DD,GG,1,$1,AA
HH,FD,2,$2,A1
LL,PP,3,$3,1A
SS,LL,4,$4,1G

Thank you all. PLEASE NOTE; Im using Windows

NorthAfrican
  • 135
  • 2
  • 10
  • 2
    I think [this](https://stackoverflow.com/questions/13148429/how-to-change-the-order-of-dataframe-columns) might answer your question. Yeah one more thing, you can read a csv as a dataframe using `pd.read_csv()` and then rewrite back using `df.to_csv('filename')`. the documentation is [here](https://pandas.pydata.org/pandas-docs/version/0.23.1/generated/pandas.read_csv.html#pandas.read_csv) – Anurag Reddy Sep 21 '20 at 14:00
  • you can do that with `awk` – Walter Tross Sep 21 '20 at 14:01
  • @AnuragReddy im not sure if thats the right thing to do with so much records. Walter Tross, how is that done with awk? – NorthAfrican Sep 21 '20 at 14:01
  • 2
    `awk -F, '{ printf "%s,%s,%s,%s,%s\n", $5, $4, $1, $2, $3 }' filename.csv` – Walter Tross Sep 21 '20 at 14:05
  • @NorthAfrican, oh sorry I missed the statement showing 100k records. The solution by Walter Tross is very efficient then. – Anurag Reddy Sep 21 '20 at 14:08
  • @WalterTross You need to tell me what you did there lol.... – NorthAfrican Sep 21 '20 at 14:08
  • just basic usage, see `man awk`.`-F` is the field separator (the default is whitespace). Of course, if you had commas within quotes in your csv this wouldn't work. – Walter Tross Sep 21 '20 at 14:10
  • @WalterTross what if my file is tab seperated? – NorthAfrican Sep 21 '20 at 14:12
  • `awk -F'\t' '{ printf "%s\t%s\t%s\t%s\t%s\n", $5, $4, $1, $2, $3 }' filename.csv`, but you don't need the `-F` if you don't have spaces within fields. Furthermore I have tried it on my Mac with `zsh`, not sure about other envs. It shouls work, though, since the separator probably should not be interpreted by the shell, and be interpreted as a regex by `awk` – Walter Tross Sep 21 '20 at 14:19
  • @WalterTross if I copy paste your line I get `invalid syntax` – NorthAfrican Sep 21 '20 at 14:22
  • funny, can't help any further, today, sorry – Walter Tross Sep 21 '20 at 14:25
  • @WalterTross awk is not usable for Windows lol.... – NorthAfrican Sep 21 '20 at 14:29

2 Answers2

1

Can you try below program. This reads through the input file, split it with comma, reorder it and write into output file delimited with comma.

with open("input.csv","r") as f, open("output.csv","w") as f1:
    for line in f:
        cols=line.strip().split(",")
        f1.write(",".join([cols[4],cols[3]]+cols[:3])+"\n")

Output

DD,GG,1,$1,AA
HH,FD,2,$2,A1
LL,PP,3,$3,1A
SS,LL,4,$4,1G

Liju
  • 2,273
  • 3
  • 6
  • 21
1

I've played a bit with your question and here's what I came up with:

import csv
import time


traget_file = "1000000_Sales_Records.csv"
output_file = "swapped.csv"
new_order = [5, 4, 1, 2, 3, 6, 9, 7, 10, 8, 13, 11, 12]
delim = ","


def read_data(filename: str, delim: str):
    with open(filename) as csvfile:
        datareader = csv.reader(csvfile, delimiter=delim)
        next(datareader, None)  # skip header
        for row in datareader:
            yield row


def swapper(file, order):
    for row in read_data(file, delim):
        yield [row[index - 1] for index in order]


with open(output_file, "a") as nf:
    w = csv.writer(nf)
    s = time.time()
    for new_row in swapper(traget_file, new_order):
        w.writerow(new_row)
    e = time.time()
    print(f"Done in {round((e - s) / 60, 2)}")

I've tested this with a sample .csv file with 1000000 rows and it swapped the order in about 13 seconds. I've taken the file from here.

This works both with comma and tab delimited .csv files. Just change the "," to "\t" if you want to work with a tab delimited file.

baduker
  • 19,152
  • 9
  • 33
  • 56