
This may have a simple answer, so apologies in advance (I have minimal coding experience).

I am trying to drop any rows that contain a particular string (Economy 7) in ANY column, and have been working from this thread:

How to drop rows from pandas data frame that contains a particular string in a particular column?

I couldn't get that to work, but the code below seemed to work on a previous DataFrame. On my current DataFrame (energy) it now raises an error:

no_eco = energy[~energy.apply(lambda series: series.str.contains('Economy 7')).any(axis=1)]

AttributeError: ('Can only use .str accessor with string values, which use np.object_ dtype in pandas', 'occurred at index existingProductCodeGas')

Any suggestions? P.S. The DataFrame is extremely large.

Thanks

geds133

2 Answers


You can select only the object columns (the ones that hold strings) with select_dtypes:

df = energy.select_dtypes(object)
# regex=False added to improve performance, as suggested by @jpp, thank you
mask = ~df.apply(lambda series: series.str.contains('Economy 7', regex=False)).any(axis=1)
no_eco = energy[mask]

Sample:

energy = pd.DataFrame({
        'A':list('abcdef'),
         'B':[4,5,4,5,5,4],
         'C':[7,8,9,4,2,3],
         'D':[1,3,5,7,1,0],
         'E':[5,3,6,9,2,4],
         'F':list('adabbb')
})

print (energy)
   A  B  C  D  E  F
0  a  4  7  1  5  a
1  b  5  8  3  3  d
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b

df = energy.select_dtypes(object)
mask = ~df.apply(lambda series: series.str.contains('d')).any(axis=1)
no_eco = energy[mask]
print (no_eco)

   A  B  C  D  E  F
0  a  4  7  1  5  a
2  c  4  9  5  6  a
4  e  5  2  1  2  b
5  f  4  3  0  4  b
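
For context, the AttributeError in the question can be reproduced in isolation: the .str accessor refuses to work on a numeric column, which is why the non-string columns are filtered out first. A minimal sketch, assuming existingProductCodeGas holds numbers (the values below are made up):

```python
import pandas as pd

# Hypothetical stand-in for a numeric column such as existingProductCodeGas.
codes = pd.Series([101, 102, 103])

try:
    # Accessing .str on an integer column fails before contains() even runs.
    codes.str.contains("Economy 7", regex=False)
    raised = False
except AttributeError as exc:
    raised = True
    print(exc)
```

Checking energy.dtypes should show which columns are numeric and would trigger the same error.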
jezrael
  • Seems to have the same amount of observations in the new variable no_eco, so it doesn't seem to have dropped anything. Also, if the whole DataFrame is converted to string, would integers and floats be converted too? @jezrael – geds133 Oct 30 '18 at 11:48
  • Still the same amount of observations in new variable @jezrael – geds133 Oct 30 '18 at 11:52
  • @GerardChurch - Added sample data. If the solution is not working with your real data, add a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve). – jezrael Oct 30 '18 at 11:58
  • code still creating variable which has same amount of observation and now no columns. Example code is in the original post? – geds133 Oct 30 '18 at 12:55
  • @GerardChurch - hmmm, my sample data removes all rows with `d`. If the solution is not working with `Economy 7`, it seems there is no match, i.e. no value `Economy 7` in the data. Can you test another string? – jezrael Oct 30 '18 at 12:59
  • yes not sure why it is not working. Just tried it with another string 'Standard' but hasn't taken any rows out either. – geds133 Oct 30 '18 at 13:32
  • @GerardChurch - Without your data it is hard to know. Is the data confidential? – jezrael Oct 30 '18 at 13:34
  • Unfortunately yes. Thanks for trying anyway, much appreciated. – geds133 Oct 30 '18 at 13:37
  • This code seems to have worked this morning and has dropped all 'Economy 7' rows. Must have been having a moment yesterday. Many thanks. – geds133 Oct 31 '18 at 10:52
  • code seems to have had a wobbly and now gives following error: `KeyError: "['nan' 'nan' 'nan' ... 'nan' 'nan' 'nan'] not found in axis"` Do you know why this might be as I haven't changed anything. – geds133 Nov 08 '18 at 12:54
  • @geds133 - What does `mask` return? Maybe you need `mask = ~df.apply(lambda series: series.str.contains('Economy 7', regex=False, na=False)).any(axis=1)` – jezrael Nov 08 '18 at 12:58
  • Yeah returns the last line. Code still doesn't seem to work, looks as if it is searching for nan as column headers? @jezrael – geds133 Nov 08 '18 at 13:08
  • @geds133 - It returns the last line? It has to return a Series of `True` and `False` values. – jezrael Nov 08 '18 at 13:09
  • Apologies it works when I stop running code beneath so must have affected it somewhere below somehow. Sorry about the false alarm. – geds133 Nov 08 '18 at 13:22
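
On the question raised in the comments about converting the whole DataFrame to strings: a rough sketch of that variant is below. Each column is cast to str only for the search, so integers and floats are searched as text while the returned rows keep their original dtypes. The column names, the data, and the helper name drop_rows_containing are all invented for illustration; na=False makes missing values count as "no match".

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are invented for illustration.
energy = pd.DataFrame({
    "tariff": ["Standard", "Economy 7", np.nan, "Standard"],
    "backup_tariff": ["Economy 7 (old)", "Standard", "Standard", np.nan],
    "usage_kwh": [3100, 4200, 2800, 3900],
})

def drop_rows_containing(frame, needle):
    """Return the rows of `frame` in which no cell contains `needle`."""
    # Cast each column to str only for the comparison; `frame` is not modified.
    hits = frame.apply(
        lambda col: col.astype(str).str.contains(needle, regex=False, na=False)
    )
    return frame[~hits.any(axis=1)]

no_eco = drop_rows_containing(energy, "Economy 7")
print(no_eco)
```

One caveat with this variant: because numbers are searched as text too, a short needle such as '7' would also match numeric cells like 2700.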

We can drop rows where any column contains a particular string by using the to_string method:

df.drop(df[df.apply(lambda row: 'Tony' in row.to_string(header=False), axis=1)].index, inplace=True)

A complete example:

import pandas as pd

df = pd.DataFrame(columns = ['Name', 'Location'])
df.loc[len(df)] = ['Mathew', 'Houston']
df.loc[len(df)] = ['Tony', 'New York']
df.loc[len(df)] = ['Jerom', 'Los Angeles']
df.loc[len(df)] = ['Aby', 'Dallas']
df.loc[len(df)] = ['Elma', 'Memphis']
df.loc[len(df)] = ['Zack', 'Chicago']
df.loc[len(df)] = ['Lisa', 'New Orleans']
df.loc[len(df)] = ['Nita', 'Las Vegas']

df.drop(df[df.apply(lambda row: 'Tony' in row.to_string(header=False), axis=1)].index, inplace=True)
print(df)

Output:

     Name     Location
0  Mathew      Houston
2   Jerom  Los Angeles
3     Aby       Dallas
4    Elma      Memphis
5    Zack      Chicago
6    Lisa  New Orleans
7    Nita    Las Vegas
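
One thing to watch with the to_string approach: a row's string form also includes the column names as index labels, so a search term that happens to appear in a column name can match every row. A small sketch of the difference, assuming it is acceptable to pass index=False so that only the cell values are searched (the needle 'Location' is chosen here just to show the effect):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Mathew", "Tony", "Jerom"],
    "Location": ["Houston", "New York", "Los Angeles"],
})

needle = "Location"  # a column name, not a cell value

# Default: the row's index labels (the column names) are part of the string.
with_labels = df.apply(lambda row: needle in row.to_string(header=False), axis=1)

# index=False leaves the labels out, so only the values are searched.
values_only = df.apply(
    lambda row: needle in row.to_string(header=False, index=False), axis=1
)

print(with_labels.tolist())
print(values_only.tolist())
```

Since this builds a string for every row in Python, it is likely to be slow on a very large DataFrame compared with column-wise str.contains.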
Prince Francis