
I cannot delete a column from a CSV file using pandas. I tried many ways, using different axis values and the del statement, but it doesn't work. Does somebody know why?

Here is the output of df.head():

age;"job";"marital";"education";"default";"balance";"housing";"loan";"contact";"day";"month";"duration";"campaign";"pdays";"previous";"poutcome";"y"
0  58;"management";"married";"tertiary";"no";2143...
1 44;"technician";"single";"secondary";"no";29;"...
2 33;"entrepreneur";"married";"secondary";"no";2...
3 47;"blue-collar";"married";"unknown";"no";1506...
4 33;"unknown";"single";"unknown";"no";1;"no";"n...

Here is my code:

import pandas as pd
df = pd.read_csv('bank-full.csv')
print(df.head())
df = df.drop(['day', 'poutcome'], axis=1)

Here is the error:

Traceback (most recent call last):
  File "/home/administrator/PycharmProjects/BankMarketinData/main.py", line 21, in 
    main()
  File "/home/administrator/PycharmProjects/BankMarketinData/main.py", line 19, in main
    df = df.drop(['day', 'poutcome'], axis=1)
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 3697, in drop
    errors=errors)
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 3111, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 3143, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 4404, in drop
    '{} not found in axis'.format(labels[mask]))
KeyError: "['day' 'poutcome'] not found in axis"
desertnaut
inf

3 Answers


So it's a pretty simple problem. First of all, I would advise you to always specify the delimiter (sep) when dealing with tabular data; for this file it is ';'. Now let's focus on your problem. You're reading your dataframe like this:

import pandas as pd  
df = pd.read_csv('bank-full.csv')
df = df.drop(['day', 'poutcome'], axis=1)

Now look at your column names: they contain double quotes. If those quotation marks end up as part of the names (print df.columns to check), then your columns are called "day" and "poutcome", not day and poutcome, so you have to include the quotes when dropping them:

df = df.drop(['"day"', '"poutcome"'], axis=1)
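Before adding or stripping quotes, it helps to check what pandas actually parsed. The sketch below uses a small made-up sample (the sample string is hypothetical and only mirrors the header format from the question) and compares the default separator with sep=';'.

```python
import io

import pandas as pd

# Hypothetical two-line sample mirroring the header format from the question.
sample = 'age;"job";"day";"poutcome";"y"\n58;"management";5;"unknown";"no"\n'

# Default sep=',': the header contains no commas, so the whole line becomes
# ONE column and the embedded quotation marks are kept literally.
df_default = pd.read_csv(io.StringIO(sample))
print(list(df_default.columns))

# sep=';': each field is split out, and the quoted fields are unquoted by the parser.
df_semicolon = pd.read_csv(io.StringIO(sample), sep=';')
print(list(df_semicolon.columns))
```

With the default separator the header collapses into a single column, which matches the Index([...]) output the asker posted in the comments; with sep=';' the names come out as plain age, job, day and so on.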

I hope this helps. If you have any further questions, let me know.

astroluv
import pandas as pd

df = pd.read_csv('bank-full.csv', sep=';')  # the file is semicolon-separated
df.columns = [col.replace('"', '') for col in df.columns]  # strip any stray quotes
df.drop(columns=['day', 'poutcome'], inplace=True)

As the follow-up comments show, your main issue is that you are reading the CSV file with the wrong separator, so pandas puts the whole header into a single column. Then, if any quotation marks remain in your column names, remove them so the names match and you can drop those columns.
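An end-to-end sketch of this fix on a small in-memory sample (the sample string and its values are made up; in practice you would pass 'bank-full.csv' instead). The quote-stripping step is only a safety net: with sep=';' pandas normally removes the quotes itself, so it is usually a no-op.

```python
import io

import pandas as pd

# Hypothetical sample in the same format as bank-full.csv (values are made up).
sample = (
    'age;"job";"day";"poutcome";"y"\n'
    '58;"management";5;"unknown";"no"\n'
    '44;"technician";5;"unknown";"no"\n'
)

# Read with the correct separator; replace io.StringIO(sample) with 'bank-full.csv'.
df = pd.read_csv(io.StringIO(sample), sep=';')

# Safety net: strip any stray double quotes around the column names.
df.columns = df.columns.str.strip('"')

# Drop both columns in a single call.
df = df.drop(columns=['day', 'poutcome'])
print(df)
```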

db702
  • You need to remove the quotation marks from your column names, either in the CSV or in Python; then it will work. – db702 Feb 04 '19 at 22:34
  • Print out the column names and make sure they are what you expect them to be. – db702 Feb 04 '19 at 22:48
  • Index(['age;"job";"marital";"education";"default";"balance";"housing";"loan";"contact";"day";"month";"duration";"campaign";"pdays";"previous";"poutcome";"y"'], dtype='object') Index(['age;job;marital;education;default;balance;housing;loan;contact;day;month;duration;campaign;pdays;previous;poutcome;y'], dtype='object') and the error from my previous comment – inf Feb 04 '19 at 22:52
  • It looks like it is reading all of your columns as one. I will update the code above. – db702 Feb 04 '19 at 22:55
  • Thank you for this code snippet, which might provide some limited, immediate help. A [proper explanation would greatly improve its long-term value](//meta.stackexchange.com/q/114762/206345) by showing _why_ this is a good solution to the problem, and would make it more useful to future readers with other, similar questions. Please edit your answer to add some explanation, including the assumptions you've made. – Blue Feb 05 '19 at 00:14
  • The column names contain quotation marks, so the column names are "day" and "poutcome", not day and poutcome. – astroluv Feb 05 '19 at 04:40

You can drop the columns one by one, or use a loop to drop several of them. Either way, make sure those column names are the ones actually in the dataframe; from your question, it looks like your column names are wrapped in "". Also make sure to set the delimiter correctly when reading in the dataframe: read_csv defaults to ',', but in this case it is ';'.

One by one

import pandas as pd

df = pd.read_csv('bank-full.csv', sep=';')
df = df.drop(['day'], axis=1)
df = df.drop(['poutcome'], axis=1)

Loop

import pandas as pd

df = pd.read_csv('bank-full.csv', sep=';')
Drop_list = ['day','poutcome']
for column in Drop_list:
    df = df.drop([column], axis=1)
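Since DataFrame.drop accepts a list of labels, the loop is not strictly needed; a single call does the same thing. The sketch below also shows a small helper, drop_existing, that reports which requested names are missing instead of raising a KeyError like the one in the question. The helper name and its behaviour are hypothetical, for illustration only, and not part of pandas; the toy dataframe is made up.

```python
import pandas as pd


def drop_existing(df, columns):
    """Hypothetical helper: drop the columns that exist and return the missing names."""
    missing = [col for col in columns if col not in df.columns]
    # errors='ignore' tells pandas to skip labels that are not present.
    return df.drop(columns=columns, errors='ignore'), missing


# Made-up toy data with a few of the bank-full.csv column names.
df = pd.DataFrame({'age': [58, 44], 'day': [5, 5], 'poutcome': ['unknown', 'unknown']})

# One call with a list replaces the loop above.
print(df.drop(columns=['day', 'poutcome']).columns.tolist())

# A misspelled name is reported instead of raising KeyError.
trimmed, missing = drop_existing(df, ['day', 'pOutcome'])
print(trimmed.columns.tolist(), missing)
```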

Test I used for this question:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(50, 4), columns=list('ABCD'))
df.head(5)

          A         B         C         D
0  0.860680 -0.408577  0.727530 -0.119050
1 -1.140042  0.241970 -1.509257 -0.303601
2  0.811929  0.146228  2.102941  0.772328
3 -0.590157  0.753719  0.220592 -0.563953
4  0.031505 -0.521978  0.410718 -0.325865

Drop_list = ['A','B','C']
for column in Drop_list:
    df = df.drop([column], axis=1)
df.head(5)

          D
0 -0.119050
1 -0.303601
2  0.772328
3 -0.563953
4 -0.325865
Edeki Okoh
  • His issue is that he has quotation marks in his column names. – db702 Feb 04 '19 at 22:37
  • The test code you used works for me, but on my CSV file it doesn't work. – inf Feb 04 '19 at 22:46
  • What delimiter do you use with pd.read_csv? When reading the dataframe in, use df = pd.read_csv('bank-full.csv', sep=';'). It also looks like your header row is not being read correctly. – Edeki Okoh Feb 04 '19 at 22:52
  • You're right: after adding sep=';', it works. Thank you for the help :) – inf Feb 04 '19 at 22:58