How to delete columns in a CSV file?

Question

I have been able to create a csv with python using the input from several users on this site and I wish to express my gratitude for your posts. I am now stumped and will post my first question.

My input.csv looks like this:

day,month,year,lat,long
01,04,2001,45.00,120.00
02,04,2003,44.00,118.00

I am trying to delete the "year" column and all of its entries. In total there are 40+ entries with a range of years from 1960-2010.

This is the type of problem where `awk` shines: `$ awk -F, 'BEGIN {OFS=","} {print $1,$2,$4,$5}' ex.csv` — Eric Wilson, Sep 28 '11 at 20:20
@Eric Wilson: Luckily, this CSV file has no quotes, allowing AWK to work. — S.Lott, Sep 29 '11 at 09:55
@S.Lott I agree, when the CSV format gets more complicated, Python's `csv` is the way to go. I only use `awk` when it clearly works, and is only one line. — Eric Wilson, Sep 29 '11 at 12:44

score 63 · Accepted Answer · edited Jan 02 '19 at 02:30

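import csv
# Python 3: open CSV files in text mode with newline="" (the "rb"/"wb" modes only work in Python 2)
with open("source", newline="") as source:
    rdr= csv.reader( source )
    with open("result", "w", newline="") as result:
        wtr= csv.writer( result )
        for r in rdr:
            # keep every column except index 2 ("year")
            wtr.writerow( (r[0], r[1], r[3], r[4]) )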

BTW, the for loop can be removed, but not really simplified.

        in_iter= ( (r[0], r[1], r[3], r[4]) for r in rdr )
        wtr.writerows( in_iter )

You can also follow the requirement literally and delete the column in place. I find this a bad policy in general because it doesn't extend cleanly to removing more than one column: when you try to remove a second column, the positions have all shifted and it is no longer obvious which index to delete. But for one column only, this works.

            del r[2]
            wtr.writerow( r )

edited Jan 02 '19 at 02:30

Ryan R

answered Sep 28 '11 at 21:08

S.Lott

This one worked nearly flawlessly, an error came up regarding the syntax. The colon should be deleted from wtr=csv.writer(result) Thanks for your input on this it has helped, it is also handy because it works on any number of columns I may need to delete. – Jeff Sep 29 '11 at 01:36
You can easily use your second method for multiple columns by deleting the highest column first, e.g. 'del r[8] del r[6] del r[2] wtr.writerow(r)' – Satvik Beri May 06 '13 at 23:50
You can save some writing for bigger CSV's by replacing `(r[0], r[1], r[3], r[4])` with something like `tuple(r[ii] for ii in range(len(r)) if ii != 2)` – srcerer Nov 06 '19 at 14:37
To delete more than 1 column in your last point, can't you just use the classic delete 'em backwards workaround? – bobobobo Jan 30 '21 at 07:56
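
The "delete the highest index first" idea from these comments can be sketched as a small helper. This is only an illustration, assuming Python 3; the function name `drop_columns`, its parameters, and the in-memory sample are hypothetical and not part of the answer.

```python
import csv
import io

def drop_columns(source, result, indexes):
    """Copy CSV rows from source to result, omitting the given column indexes."""
    # Delete from the highest index down so that earlier deletions
    # do not shift the positions of columns still to be removed.
    ordered = sorted(set(indexes), reverse=True)
    wtr = csv.writer(result)
    for r in csv.reader(source):
        for i in ordered:
            del r[i]
        wtr.writerow(r)

# In-memory demo; real code would pass files opened with newline="".
src = io.StringIO("day,month,year,lat,long\n01,04,2001,45.00,120.00\n")
out = io.StringIO()
drop_columns(src, out, [2, 4])  # removes "year" and "long"
print(out.getvalue())
```

Sorting the indexes in descending order means the caller can list them in any order.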

score 51 · Answer 2 · answered Dec 24 '15 at 16:49

Using the pandas module makes this much easier.

import pandas as pd
f=pd.read_csv("test.csv")
keep_col = ['day','month','lat','long']
new_f = f[keep_col]
new_f.to_csv("newFile.csv", index=False)

And here is a short explanation:

>>> f=pd.read_csv("test.csv")
>>> f
   day  month  year  lat  long
0    1      4  2001   45   120
1    2      4  2003   44   118
>>> keep_col = ['day','month','lat','long'] 
>>> f[keep_col]
    day  month  lat  long
0    1      4   45   120
1    2      4   44   118
>>>

answered Dec 24 '15 at 16:49

SunilThorat

This works even if your csv has line breaks inside a string field in a row - many other linux commands like `cut` fail to remove columns and maintain the data integrity when a row's field has a line break as part of the content of the csv – technogeek1995 Dec 06 '18 at 14:45
In my case, the integers get converted to floats. – Gunarathinam Feb 01 '19 at 07:10
@Gunarathinam you can prevent this in newer pandas versions by passing `dtype=str` to `read_csv` – ntjess Jun 14 '21 at 23:02
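
A minimal sketch of the `dtype=str` suggestion, assuming pandas is installed. The helper name `drop_column_as_text` and the in-memory sample are hypothetical. Reading every column as a string also keeps values such as `01` and `45.00` exactly as they appear in the input.

```python
import io
import pandas as pd

def drop_column_as_text(source, column):
    """Return CSV text from source without `column`, keeping values as strings."""
    # dtype=str stops pandas from inferring numeric types, so "01" stays "01"
    # and integers are not rewritten as floats.
    df = pd.read_csv(source, dtype=str)
    return df.drop(columns=[column]).to_csv(index=False)

sample = io.StringIO("day,month,year,lat,long\n01,04,2001,45.00,120.00\n")
print(drop_column_as_text(sample, "year"))
```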

score 6 · Answer 3 · answered Nov 16 '12 at 05:50

Using a dict to grab headings then looping through gets you what you need cleanly.

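import csv
ct = 0
# maps each wanted heading to its column index (filled in from the header row)
cols_i_want = {'cost' : -1, 'date' : -1}
# Python 3: text mode with newline="" (the "rb"/"wb" modes only work in Python 2)
with open("file1.csv", newline="") as source:
    rdr = csv.reader( source )
    with open("result", "w", newline="") as result:
        wtr = csv.writer( result )
        for row in rdr:
            if ct == 0:
                # first row is the header: record where each wanted column sits
                cc = 0
                for col in row:
                    for ciw in cols_i_want:
                        if col == ciw:
                            cols_i_want[ciw] = cc
                    cc += 1
            # the header row is written too, so the output keeps its headings
            wtr.writerow( (row[cols_i_want['cost']], row[cols_i_want['date']]) )
            ct += 1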
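
The same select-by-heading idea can also be sketched with `csv.DictReader` and `csv.DictWriter`, which do the header lookup themselves. This is an alternative illustration, not the answer's own code; the function name `keep_columns` and the sample data (including the `item` column) are made up.

```python
import csv
import io

def keep_columns(source, result, wanted):
    """Write only the columns named in `wanted`, in that order."""
    rdr = csv.DictReader(source)  # uses the first row as field names
    # extrasaction="ignore" makes DictWriter skip keys not listed in fieldnames
    wtr = csv.DictWriter(result, fieldnames=wanted, extrasaction="ignore")
    wtr.writeheader()
    for row in rdr:
        wtr.writerow(row)

# In-memory demo; real code would pass files opened with newline="".
src = io.StringIO("date,item,cost\n2011-09-28,widget,9.50\n")
out = io.StringIO()
keep_columns(src, out, ["cost", "date"])
print(out.getvalue())
```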

score 6 · Answer 4 · edited Aug 03 '23 at 07:20

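I would use pandas with column numbers:

import pandas as pd

# usecols keeps only the listed column positions (index 2, "year", is skipped)
f = pd.read_csv("test.csv", usecols=[0,1,3,4])
# note: this overwrites the input file
f.to_csv("test.csv", index=False)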

edited Aug 03 '23 at 07:20

Tom Solid

answered Apr 21 '20 at 16:03

dario

score 3 · Answer 5 · edited Mar 28 '16 at 13:17

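You can directly delete the column by name, provided `variable_name` is a pandas DataFrame or a dict keyed by column name (a plain list row needs an integer index instead):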

del variable_name['year']

edited Mar 28 '16 at 13:17

Tunaki

answered Mar 28 '16 at 13:16

ankur

Doesn't work for me. It says it requires an integer since it expects an index – ZekeC Jul 29 '22 at 16:47

score 2 · Answer 6 · answered Oct 04 '21 at 20:16

I will add yet another answer to this question. Since the OP did not say they needed to do it with Python, the fastest way to delete the column (especially when the input file has hundreds of thousands of lines) is to use awk.

This is the type of problem where awk shines:

$ awk -F, 'BEGIN {OFS=","} {print $1,$2,$4,$5}' input.csv

(feel free to append > output.csv to the command above if you need the output to be saved to a file)

Credit goes 100% to @eric-wilson who provided this awesome answer, as a comment on the original question, 10 years ago, almost without any credit.

score 2 · Answer 7 · edited Sep 28 '11 at 20:11

You can use the csv package to iterate over your csv file and output the columns that you want to another csv file.

The example below is not tested and should illustrate a solution:

import csv

# raw strings, so that "\n" in "\new_file.csv" is not read as a newline
file_name = r'C:\Temp\my_file.csv'
output_file = r'C:\Temp\new_file.csv'
## note that the index of the year column is excluded
column_indices = [0,1,3,4]
with open(file_name, 'r', newline='') as csv_file, open(output_file, 'w', newline='') as fh:
    reader = csv.reader(csv_file, delimiter=',')
    # csv.writer adds the line endings and handles quoting
    writer = csv.writer(fh)
    for row in reader:
       writer.writerow([row[col_inx] for col_inx in column_indices])

edited Sep 28 '11 at 20:11

answered Sep 28 '11 at 20:06

aweis

Dispense with the `tmp_row` and the `join` and use `csv.writer` and a generator expression: `for row in reader: wtr.writerow(row[i] for i in column_indices)`. It's safer (handles quoting automatically), more concise, and faster. – Steven Rumbalski Sep 28 '11 at 20:52
Why not use `csv` for writing, also? – S.Lott Sep 28 '11 at 21:08

score 2 · Answer 8 · answered Sep 28 '11 at 20:13

Off the top of my head, this will do it without any sort of error checking nor ability to configure anything. That is "left to the reader".

outFile = open( 'newFile', 'w' )
for line in open( 'oldFile' ):
   items = line.split( ',' )
   outFile.write( ','.join( items[:2] + items[ 3: ] ) )
outFile.close()
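
One thing a plain `split(',')` cannot handle is a quoted field that itself contains a comma, which is the caveat raised in the comments on the question. A small sketch of the difference, using a made-up sample line (the place-name field is not in the OP's data):

```python
import csv
import io

# Hypothetical row with a quoted field containing a comma.
line = '01,04,2001,"Portland, OR",45.00\n'

# Naive split: the quoted comma is treated as a separator,
# so the quoted field is broken into two pieces.
naive = line.rstrip("\n").split(",")

# csv.reader understands the quoting and keeps the field whole.
parsed = next(csv.reader(io.StringIO(line)))

print(len(naive), len(parsed))
print(parsed[3])
```

For input like the OP's, with no quoting at all, both approaches give the same fields.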

score 1 · Answer 9 · edited Apr 30 '19 at 03:17

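Try (assuming `data` is a pandas DataFrame):

# pass axis by keyword; recent pandas versions reject the positional form drop('year', 1)
result= data.drop('year', axis=1)
result.head(5)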

edited Apr 30 '19 at 03:17

Achraf Almouloudi

answered Apr 30 '19 at 01:02

omega_mi

score 0 · Answer 10 · answered Mar 08 '22 at 20:29

Try Python with pandas and exclude the column you don't want:

import pandas as pd

# the ',' is the default separator, but if your file has another one, you have to define it with sep= parameter
df = pd.read_csv("input.csv", sep=',')
exclude_column = "year"
new_df = df.loc[:, df.columns != exclude_column]
# you can even save the result to the same file
new_df.to_csv("input.csv", index=False, sep=',')

score 0 · Answer 11 · answered Feb 10 '23 at 03:46

My take using pandas's drop in python:

import pandas as pd

df = pd.read_csv("old.csv")
new_df = df.drop("year", axis=1)
new_df.to_csv("new.csv", index=False)

answered Feb 10 '23 at 03:46

mhd

score 0 · Answer 12 · answered Sep 28 '11 at 20:10

It depends on how you store the parsed CSV, but generally you want the del operator.

If you have a list of dicts:

rows = [ {'day': '01', 'month': '04', 'year': '2001', ...}, ... ]
for E in rows: del E['year']

If you have a list of lists:

rows = [ ['01', '04', '2001', ...],
         [...],
         ...
       ]
for E in rows: del E[2]
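
Both shapes can be produced with the standard `csv` module, which makes the difference between deleting by key and deleting by index concrete. This is a sketch assuming Python 3 and the OP's sample data held in memory; the variable names are illustrative.

```python
import csv
import io

text = (
    "day,month,year,lat,long\n"
    "01,04,2001,45.00,120.00\n"
    "02,04,2003,44.00,118.00\n"
)

# List of dicts: csv.DictReader yields one mapping per data row,
# keyed by header name, so the column is deleted by name.
dict_rows = list(csv.DictReader(io.StringIO(text)))
for row in dict_rows:
    del row["year"]

# List of lists: csv.reader yields plain lists (header row included),
# so the column is deleted by position.
list_rows = list(csv.reader(io.StringIO(text)))
for row in list_rows:
    del row[2]

print(dict_rows[0])
print(list_rows[0])
```

Using a string key on a list row raises a `TypeError`, because lists accept only integer indexes or slices.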
