How to delete columns containing all zeros from a csv file in python?

Question

I want to delete the columns from a csv file that contain all zeros for example like the column f, g, h, k, l. The csv file in question is populated with the script so it is not possible to hardcode the columns. I would be really grateful if you could help with it.

File.csv
a,b,c,d,e,f,g,h,i,j,k,l
1,5,4,4,5,0,0,0,6,3,0,0
2,5,3,4,1,0,0,0,7,1,0,0
1,2,6,4,1,0,0,0,9,2,0,0
5,7,3,4,2,0,0,0,2,2,0,0
7,2,9,4,3,0,0,0,1,1,0,0

Resultant expected

File.csv
a,b,c,d,e,i,j
1,5,4,4,5,6,3
2,5,3,4,1,7,1
1,2,6,4,1,9,2
5,7,3,4,2,2,2
7,2,9,4,3,1,1

This will probably help you: [How do I delete a column that contains only zeros in Pandas?](https://stackoverflow.com/questions/21164910/how-do-i-delete-a-column-that-contains-only-zeros-in-pandas) — Kurt Kline, Feb 09 '21 at 08:43
Does this answer your question? [How to drop columns which have same values in all rows via pandas or spark dataframe?](https://stackoverflow.com/questions/39658574/how-to-drop-columns-which-have-same-values-in-all-rows-via-pandas-or-spark-dataf) — techytushar, Feb 09 '21 at 08:43
Does this answer your question? [How do I delete a column that contains only zeros in Pandas?](https://stackoverflow.com/questions/21164910/how-do-i-delete-a-column-that-contains-only-zeros-in-pandas) — buran, Feb 09 '21 at 08:44

score 0 · Accepted Answer · answered Feb 09 '21 at 12:44

The following approach could be used with the csv library:

Read the header in
Read the rows in
Transpose the list of rows into a list of columns (using zip)
Use a set to drop all columns that only contain 0
Write out the new header
Write out the transposed list of columns as a list of rows.

For example:

import csv
    
with open('file.csv', newline='') as f_input:
    csv_input = csv.reader(f_input)
    header = next(csv_input)   # read header
    columns = zip(*list(csv_input))   # read rows and transpose to columns
    data = [(h, c) for h, c in zip(header, columns) if set(c) != set('0')]
    
with open('file2.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(h for h, c in data)   # write the new header
    csv_output.writerows(zip(*[c for h, c in data]))

It does not work when there are four columns (together) containing all zeros. Column number (starting from 1) 19, 20, 21 and 22 contain all zeros and the script does not work for that. Can you please help with it. — Zumizu, Feb 11 '21 at 03:54
Are you able to post link to a file with the problem? e.g. use something like pastebin — Martin Evans, Feb 11 '21 at 06:37

How to delete columns containing all zeros from a csv file in python?

1 Answers1