1

I have a CSV file containing multiple columns(almost 100). How can I filter multiple columns at once using certain criteria in Python? To be more precise, many of the columns are of no use to me. How can the file be filtered?

PS: I am a beginner user.

nibar
  • 35
  • 7
  • 1
    Please always give a sample input and expected output. – Mayank Porwal Apr 11 '20 at 11:50
  • Does this answer your question? [How can I filter lines on load in Pandas read\_csv function?](https://stackoverflow.com/questions/13651117/how-can-i-filter-lines-on-load-in-pandas-read-csv-function) – Mayank Porwal Apr 11 '20 at 11:52
  • @MayankPorwal, the question you've send is about filtering lines and the question posted is concerning filtering columns – Henrique Branco Apr 11 '20 at 12:06

4 Answers4

2

Let's say you have following content in csv file

Col1,Col2, Col3
1,a,0
2,b,0
3,d,1

Read it in pandas dataframe using following script

import pandas as pd  

df=pd.read_csv(file)

To see the columns in dataframe df use

print(df.columns)

This will give you the column names in df in form of list, in this case ['col1', 'col2', 'col3']

To retain only specific columns (for example col1 and col3) you can use

df=df [ [ "col1","col3"] ]

Now if you print (df.columns) it will only have ['col1', 'col3']

Edited in reply to the comment:

If you want to delete the columns that fulfil certain condition you can use the following script

for column in df.columns:

    if 0 in df[column].values: # This will check if 0 is in values of column,  you can add any condition you want here

    print('Deleting column', column) # I assume you want to delete the column that fulfills the condition

    df=df.drop(columns=column) # This statement will delete the column fulfilling the condition
print("df after deleting columns:")
print(df)

It will print

Deleting column col3

df after deleting columns:

col1, col2

1,a

2,b

3,c

user3046442
  • 468
  • 5
  • 12
  • I have to check if col 1,col 2,...........,col X satisfy some conditions. How can I do that? – nibar Apr 11 '20 at 13:05
  • I have edited the answer, you can see the script how to check which columns satisfy some condition. Please vote up if it was helpful. Thank you – user3046442 Apr 14 '20 at 09:58
1

If you want to delete all the zero values from your dataframe column, you should follow following steps, (suppose you dataframe has name df)

  1. Replace all the zero values to nan first
import numpy as np
import pandas as pd

df = df.replace(0, np.nan)

  1. Drop the nan value using dropna method in pandas
df = df.dropna(axis=1, how='all')

The parameter axis=1 is for assigning drop rule for columnwise. And the how=all for checking all the values inside this column.

In this way, Single line answer is below

df = df.replace(0, np.nan).dropna(axis=1, how=all)
Tek Kshetri
  • 2,129
  • 1
  • 17
  • 41
0

You can parse the csv file as pandas dataframe and then play around. Please have a look at the pandas documentation on how to read csv files. You can extract the column you want based on their header names. Also you can apply mathematical operations in a fast way. Though, for large scale calculations, note that python is not suitable since each time you import your libraries.

  • I have multiple columns whose value is zero. I need to remove all the columns having zero as its value. I didn't find it at pandas documentation. – nibar Apr 11 '20 at 12:04
0

For example, if you have a dataframe df with columns col1, col2, col3 and col4 and you want only col1 and col2, you could do -

new_df = df[['col1', 'col2']]
Sajan
  • 1,247
  • 1
  • 5
  • 13