I'm aware that dropping a dataframe's columns should be as easy as:

df.drop(df.columns[1], axis=1) to drop by index

or df.dropna(axis=1, how='any') to drop based on whether it contains NaNs.

But neither of those works on my dataframe and I'm not sure if that's because of a format issue or data type issue or a misuse or misunderstanding of these commands.

Here is my dataframe:

fish_frame after append new_column:                         0       1       2      3                          4  \
2                 GBE COD     NaN     NaN    600                        NaN   
3                 GBW COD     NaN  11,189    NaN                        NaN   
4                 GOM COD     NaN       0    NaN  Package Deal - $40,753.69   
5                 POLLOCK     NaN     NaN  1,103                        NaN   
6                   WHAKE     NaN     NaN     12                        NaN   
7             GBE HADDOCK     NaN  10,730    NaN                        NaN   
8             GBW HADDOCK     NaN  64,147    NaN                        NaN   
9             GOM HADDOCK     NaN       0    NaN                        NaN   
10                REDFISH     NaN     NaN      0                        NaN   
11         WITCH FLOUNDER     NaN     370    NaN                        NaN   
12                 PLAICE     NaN     NaN    622                        NaN   
13     GB WINTER FLOUNDER  54,315     NaN    NaN                        NaN   
14    GOM WINTER FLOUNDER     653     NaN    NaN                        NaN   
15  SNEMA WINTER FLOUNDER  14,601     NaN    NaN                        NaN   
16          GB YELLOWTAIL     NaN   1,663    NaN                        NaN   
17       SNEMA YELLOWTAIL     NaN   1,370    NaN                        NaN   
18       CCGOM YELLOWTAIL   1,812     NaN    NaN                        NaN   

       6        package_deal_column Package_Price new_column  
2    NaN  Package Deal - $40,753.69          None        600  
3    NaN  Package Deal - $40,753.69          None    11,1890  
4   None  Package Deal - $40,753.69          None          0  
5    NaN  Package Deal - $40,753.69          None      1,103  
6    NaN  Package Deal - $40,753.69          None         12  
7    NaN  Package Deal - $40,753.69          None    10,7300  
8    NaN  Package Deal - $40,753.69          None    64,1470  
9    NaN  Package Deal - $40,753.69          None          0  
10   NaN  Package Deal - $40,753.69          None          0  
11   NaN  Package Deal - $40,753.69          None       3700  
12   NaN  Package Deal - $40,753.69          None        622  
13  None  Package Deal - $40,753.69          None   54,31500  
14  None  Package Deal - $40,753.69          None      65300  
15  None  Package Deal - $40,753.69          None   14,60100  
16   NaN  Package Deal - $40,753.69          None     1,6630  
17   NaN  Package Deal - $40,753.69          None     1,3700  
18  None  Package Deal - $40,753.69          None    1,81200 

And then I have the following lines of code:

fish_frame.drop(fish_frame.columns[1], axis=1)
fish_frame.drop(fish_frame.columns[2], axis=1)
fish_frame.drop(fish_frame.columns[3], axis=1)
fish_frame.drop(fish_frame.columns[4:5], axis=1)
#del fish_frame[4:5]    #doesn't work, "TypeError: slice(4, 5, None) is an invalid key"
del fish_frame['Package_Price']
fish_frame.dropna(axis=1, how='any')

And then I print out the dataframe again and it comes out as:

NEW fish_frame:                         0       1       2      3                          4  \
2                 GBE COD     NaN     NaN    600                        NaN   
3                 GBW COD     NaN  11,189    NaN                        NaN   
4                 GOM COD     NaN       0    NaN  Package Deal - $40,753.69   
5                 POLLOCK     NaN     NaN  1,103                        NaN   
6                   WHAKE     NaN     NaN     12                        NaN   
7             GBE HADDOCK     NaN  10,730    NaN                        NaN   
8             GBW HADDOCK     NaN  64,147    NaN                        NaN   
9             GOM HADDOCK     NaN       0    NaN                        NaN   
10                REDFISH     NaN     NaN      0                        NaN   
11         WITCH FLOUNDER     NaN     370    NaN                        NaN   
12                 PLAICE     NaN     NaN    622                        NaN   
13     GB WINTER FLOUNDER  54,315     NaN    NaN                        NaN   
14    GOM WINTER FLOUNDER     653     NaN    NaN                        NaN   
15  SNEMA WINTER FLOUNDER  14,601     NaN    NaN                        NaN   
16          GB YELLOWTAIL     NaN   1,663    NaN                        NaN   
17       SNEMA YELLOWTAIL     NaN   1,370    NaN                        NaN   
18       CCGOM YELLOWTAIL   1,812     NaN    NaN                        NaN   

       6        package_deal_column new_column  
2    NaN  Package Deal - $40,753.69        600  
3    NaN  Package Deal - $40,753.69    11,1890  
4   None  Package Deal - $40,753.69          0  
5    NaN  Package Deal - $40,753.69      1,103  
6    NaN  Package Deal - $40,753.69         12  
7    NaN  Package Deal - $40,753.69    10,7300  
8    NaN  Package Deal - $40,753.69    64,1470  
9    NaN  Package Deal - $40,753.69          0  
10   NaN  Package Deal - $40,753.69          0  
11   NaN  Package Deal - $40,753.69       3700  
12   NaN  Package Deal - $40,753.69        622  
13  None  Package Deal - $40,753.69   54,31500  
14  None  Package Deal - $40,753.69      65300  
15  None  Package Deal - $40,753.69   14,60100  
16   NaN  Package Deal - $40,753.69     1,6630  
17   NaN  Package Deal - $40,753.69     1,3700  
18  None  Package Deal - $40,753.69    1,81200  

Neither the NaN drop nor the index drop works. Only the deletion by a specific column name (del fish_frame['Package_Price']) works, but I can't do that for every iteration of this script.

I'm very confused and I hope this isn't a very dumb mistake I'm making.

Also, I myself don't fully understand this information but printing fish_frame.info() produces:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17 entries, 2 to 18
Data columns (total 8 columns):
0                      17 non-null object
1                      4 non-null object
2                      8 non-null object
3                      5 non-null object
4                      1 non-null object
6                      0 non-null object
package_deal_column    17 non-null object
new_column             17 non-null object
dtypes: object(8)
memory usage: 586.0+ bytes

Any help solving this would be appreciated, thanks.

theprowler
  • you need to drop in place or re-assign the result to a new df. – MrE Jul 26 '17 at 17:28
  • Does this answer your question? [Delete a column from a Pandas DataFrame](https://stackoverflow.com/questions/13411544/delete-a-column-from-a-pandas-dataframe) – Gonçalo Peres Oct 05 '22 at 11:04

3 Answers


If there is no error, and I don't see one in your output, you've simply forgotten to use the inplace parameter:

df.drop(df.columns[1], axis=1, inplace=True)
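
To illustrate why the original calls appeared to do nothing, here is a minimal sketch. The toy frame and the name `result` are assumptions made up for the example, not taken from the question. By default `drop` returns a new DataFrame and leaves the original alone, so the result has to be kept somewhere:

```python
import pandas as pd

# Hypothetical toy frame with integer column labels, loosely modelled on the question.
df = pd.DataFrame({0: ["GBE COD", "GBW COD"], 1: [None, None], 2: [None, "11,189"]})

# drop() returns a new DataFrame; the original is left untouched.
result = df.drop(df.columns[1], axis=1)
print(list(df.columns))      # the original still has all three columns
print(list(result.columns))  # the returned frame no longer has column 1

# Reassigning the result keeps the change without needing inplace=True.
df = df.drop(df.columns[1], axis=1)
print(list(df.columns))
```

Reassigning (`df = df.drop(...)`) and `inplace=True` should end up with the same columns; which one to use is mostly a matter of style.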
A.Kot
  • Hmm. Ok that worked, but only slightly. Idk if this is on my end but I changed my code up a bit and just did `fish_frame.drop(fish_frame.columns[1], axis=1, inplace=True)`, `fish_frame.drop(fish_frame.columns[2], axis=1, inplace=True)`, and `fish_frame.drop(fish_frame.columns[3], axis=1, inplace=True)` to delete columns 2, 3, and 4. But it deleted columns 2, 4, and 6... – theprowler Jul 26 '17 at 17:24
  • To make sure you are dropping the correct columns, use the actual column name: `fish_frame.drop('name of column 1', axis=1, inplace=True)` – A.Kot Jul 26 '17 at 17:27
  • But when my columns don't have names isn't the next best way to drop them by their index? – theprowler Jul 26 '17 at 17:28
  • How can a column not have a name? `fish_frame.columns[1]` is not passing an index, it's passing the string name of the column. You can check this by doing `type(fish_frame.columns[1])` – A.Kot Jul 26 '17 at 17:30
  • Ohhhhhh. My bad I thought that doing `fish_frame.columns[1]` would select the second column of the dataframe by passing the index. How would I select by index if I wanted to? – theprowler Jul 26 '17 at 17:33
  • A column and an index are two different things. Index means the row identifier. You can't use the drop method with the column number. Which is why you should pass the exact name of the column as a parameter. – A.Kot Jul 26 '17 at 17:37
  • No I know that a column and index are separate things. I just didn't know that you couldn't use an index with `drop`. Ok I'll use the exact name of the columns, which happen to be numbers. – theprowler Jul 26 '17 at 17:43
  • If your column name is a number and not the string representation of a number then you can use the following to drop a column named `1` : `fish_frame.drop([1], axis=1, inplace=True)` – A.Kot Jul 26 '17 at 17:44
  • Yup got it. Now it's working. I need to explore why column `5` is missing but that's definitely a problem on my end. Thanks for the explanation though. I need to use `inplace=True`. Turns out it was a dumb mistake that I was unaware of. – theprowler Jul 26 '17 at 17:46
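
A possible explanation for the "deleted columns 2, 4, and 6" observation in the comments above: `fish_frame.columns[n]` looks up the label currently at position `n`, and every in-place drop shifts the remaining positions. The sketch below assumes a one-row frame whose labels mirror the ones printed by `fish_frame.info()`; the names `shifted` and `by_position` are made up for the example.

```python
import pandas as pd

# Hypothetical frame whose labels mirror the question's (note there is no column 5).
labels = [0, 1, 2, 3, 4, 6, "package_deal_column", "new_column"]
fish_frame = pd.DataFrame([[None] * len(labels)], columns=labels)

# Sequential in-place drops: the positions shift after every call.
shifted = fish_frame.copy()
for pos in (1, 2, 3):
    shifted.drop(shifted.columns[pos], axis=1, inplace=True)
print(list(shifted.columns))  # labels 1, 3 and 6 are gone, not 1, 2 and 3

# Looking up all positions against the original columns in one call avoids the shift.
by_position = fish_frame.drop(fish_frame.columns[[1, 2, 3]], axis=1)
print(list(by_position.columns))  # labels 1, 2 and 3 are gone
```

Passing a list of positions to `fish_frame.columns[...]` resolves them all at once, so dropping by position still works as long as it is done in a single call.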

Here are some alternatives:

Setup:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3,5), columns=list('abcde'))

In [57]: cols_to_drop = ['b', 'd']

In [63]: df
Out[63]:
          a         b         c         d         e
0  0.758670  0.734007  0.027711  0.614674  0.955711
1  0.833110  0.242010  0.922831  0.165401  0.546079
2  0.414916  0.949050  0.608527  0.018036  0.230343

Option 1:

df = df[df.columns.drop(cols_to_drop)]

Option 2:

df = df[df.columns.difference(cols_to_drop)]

Option 3:

df = df.loc[:, ~df.columns.isin(cols_to_drop)]

All return:

          a         c         e
0  0.758670  0.027711  0.955711
1  0.833110  0.922831  0.546079
2  0.414916  0.608527  0.230343
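
One caveat worth checking with the three options above: they only give identical results when the columns are already in sorted order, as they are in the setup. The sketch below uses a deliberately unsorted column order (an assumption for illustration; `opt1`, `opt2` and `opt3` are made-up names) to show that `Index.difference` sorts its result by default, while the other two keep the original order.

```python
import numpy as np
import pandas as pd

# Columns deliberately not in alphabetical order (an assumption for illustration).
df = pd.DataFrame(np.random.rand(3, 5), columns=list("ebcda"))
cols_to_drop = ["b", "d"]

opt1 = df[df.columns.drop(cols_to_drop)]
opt2 = df[df.columns.difference(cols_to_drop)]
opt3 = df.loc[:, ~df.columns.isin(cols_to_drop)]

print(list(opt1.columns))  # original order kept
print(list(opt2.columns))  # sorted, because Index.difference sorts by default
print(list(opt3.columns))  # original order kept
```

If column order matters, Option 1 or Option 3 is probably the safer choice.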
MaxU - stand with Ukraine

If you are trying to drop the columns that contain NaN, the following code will suffice. I tried it myself and it worked.

df = df.dropna(axis = 1)
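
Note that `dropna(axis=1)` defaults to `how='any'`, so it removes every column that has at least one missing value. On a frame like the one in the question that would take out most of the data columns. A rough sketch of the difference between `how='any'` and `how='all'`, using a made-up miniature of the question's frame (the names `any_nan` and `all_nan` are assumptions for the example):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame.
df = pd.DataFrame({
    0: ["GBE COD", "GBW COD", "GOM COD"],
    2: [np.nan, "11,189", "0"],
    6: [np.nan, np.nan, None],
    "new_column": ["600", "11,189", "0"],
})

any_nan = df.dropna(axis=1, how="any")  # drops columns 2 and 6
all_nan = df.dropna(axis=1, how="all")  # drops only the fully empty column 6

print(list(any_nan.columns))
print(list(all_nan.columns))
```

As with `drop`, the result has to be assigned back for the change to stick. Both `NaN` and `None` count as missing here.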
Loochie