0

I want to delete the columns from a csv file that contain all zeros for example like the column f, g, h, k, l. The csv file in question is populated with the script so it is not possible to hardcode the columns. I would be really grateful if you could help with it.

File.csv
a,b,c,d,e,f,g,h,i,j,k,l
1,5,4,4,5,0,0,0,6,3,0,0
2,5,3,4,1,0,0,0,7,1,0,0
1,2,6,4,1,0,0,0,9,2,0,0
5,7,3,4,2,0,0,0,2,2,0,0
7,2,9,4,3,0,0,0,1,1,0,0

Resultant expected

File.csv
a,b,c,d,e,i,j
1,5,4,4,5,6,3
2,5,3,4,1,7,1
1,2,6,4,1,9,2
5,7,3,4,2,2,2
7,2,9,4,3,1,1
Zumizu
  • 13
  • 2
  • This will probably help you: [How do I delete a column that contains only zeros in Pandas?](https://stackoverflow.com/questions/21164910/how-do-i-delete-a-column-that-contains-only-zeros-in-pandas) – Kurt Kline Feb 09 '21 at 08:43
  • Does this answer your question? [How to drop columns which have same values in all rows via pandas or spark dataframe?](https://stackoverflow.com/questions/39658574/how-to-drop-columns-which-have-same-values-in-all-rows-via-pandas-or-spark-dataf) – techytushar Feb 09 '21 at 08:43
  • 3
    Does this answer your question? [How do I delete a column that contains only zeros in Pandas?](https://stackoverflow.com/questions/21164910/how-do-i-delete-a-column-that-contains-only-zeros-in-pandas) – buran Feb 09 '21 at 08:44
  • There is no [tag:python] in this question. – Peter Wood Feb 09 '21 at 08:45
  • 1
    There is no question in this question. – Peter Wood Feb 09 '21 at 08:45

1 Answers1

0

The following approach could be used with the csv library:

  1. Read the header in
  2. Read the rows in
  3. Transpose the list of rows into a list of columns (using zip)
  4. Use a set to drop all columns that only contain 0
  5. Write out the new header
  6. Write out the transposed list of columns as a list of rows.

For example:

import csv
    
with open('file.csv', newline='') as f_input:
    csv_input = csv.reader(f_input)
    header = next(csv_input)   # read header
    columns = zip(*list(csv_input))   # read rows and transpose to columns
    data = [(h, c) for h, c in zip(header, columns) if set(c) != set('0')]
    
with open('file2.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(h for h, c in data)   # write the new header
    csv_output.writerows(zip(*[c for h, c in data]))
Martin Evans
  • 45,791
  • 17
  • 81
  • 97
  • It does not work when there are four columns (together) containing all zeros. Column number (starting from 1) 19, 20, 21 and 22 contain all zeros and the script does not work for that. Can you please help with it. – Zumizu Feb 11 '21 at 03:54
  • Are you able to post link to a file with the problem? e.g. use something like pastebin – Martin Evans Feb 11 '21 at 06:37