
Total noob here, sorry for the beginner question. I've been racking my brain in pandas trying to filter a Series in a DataFrame to find the rows that contain any one of a list of strings.

import pandas as pd
streets = ['CONGRESS', 'GUADALUPE', 'BEN WHITE', 'LAMAR', 'MANCHACA', 'BURNET', 'ANDERSON', 'BRAKER' ]
# the actual list of street names is much longer than this

strs = pd.read_csv('short_term_rental_locations.csv')

# the following returns no values, or all 'False' values to be more accurate
strs[strs['PROP_ADDRESS'].isin(streets)]

# but if I use .contains, I can find rows that contain part of the
# street names, but .contains has a limit of six positional arguments.
strs[strs['PROP_ADDRESS'].str.contains('CONGRESS')]

I've tried using the wildcard * with .isin, to no avail. I feel so dumb for struggling with this. Any help is much appreciated. Thanks!

jpp
24hourbreakfast

1 Answer


.contains has a limit of six positional arguments.

There's a misunderstanding here. It's not clear what "six positional arguments" refers to. Strictly speaking, pd.Series.str.contains accepts at most 5 arguments (pat, case, flags, na, regex), but only the first one, pat, holds the text you are searching for. The others only control how the match is performed.
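To make the role of each argument concrete, here is a minimal sketch. The Series below is made-up sample data, not the CSV from the question, and the lowercase search term is only there to show what case=False does.

```python
import pandas as pd

# Hypothetical addresses, for illustration only
addresses = pd.Series(['1200 S CONGRESS AVE', '2300 S LAMAR BLVD', None])

mask = addresses.str.contains(
    'lamar',      # pat: the only argument that carries the search text
    case=False,   # ignore upper/lower case
    flags=0,      # extra flags from the re module (0 means none)
    na=False,     # report missing values as non-matches
    regex=True,   # treat pat as a regular expression (the default)
)
print(mask.tolist())
```

However many street names you need, they all have to go into pat, not into additional arguments.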

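.isin returns all False because it only matches when the entire cell is equal to one of the listed values; it does not look for substrings. Instead, you can use a regular expression, which pd.Series.str.contains enables by default (regex=True). Join the street names with | (regex "or") to build a single pattern: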

streets = ['CONGRESS', 'GUADALUPE', 'BEN WHITE', 'LAMAR',
           'MANCHACA', 'BURNET', 'ANDERSON', 'BRAKER' ]

searchstr = '|'.join(streets)
strs[strs['PROP_ADDRESS'].str.contains(searchstr)]
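As a hedged variation on the snippet above: if some street names might contain regex metacharacters (such as . or parentheses), or if the column might contain missing values, something along these lines should be safer. The small DataFrame is a hypothetical stand-in for the CSV, and using re.escape and na=False is my assumption about what the real data may need, not something stated in the question.

```python
import re
import pandas as pd

streets = ['CONGRESS', 'GUADALUPE', 'BEN WHITE', 'LAMAR']

# Hypothetical stand-in for short_term_rental_locations.csv
strs = pd.DataFrame({
    'PROP_ADDRESS': ['1200 S CONGRESS AVE', '710 W 5TH ST',
                     '2300 S LAMAR BLVD', None]
})

# Escape each name so that any regex metacharacters are matched literally
searchstr = '|'.join(re.escape(street) for street in streets)

# na=False makes missing addresses count as non-matches, so the
# resulting boolean mask contains no NaN values
matches = strs[strs['PROP_ADDRESS'].str.contains(searchstr, na=False)]
print(matches)
```

Without na=False, a NaN in the column would end up in the mask, and indexing with a mask that contains NaN fails.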
jpp
That was fast! Thank you so much. Now I see how to translate a list of strings into a single string to use with pd.Series.str.contains. Much appreciated! – 24hourbreakfast Oct 13 '18 at 14:57