
I am analyzing tweets.

I have 10k tweets and want to find the ones in which any word from a list occurs:

```python
lst1 = ['spot', 'mistake']
lst1_tweets = tweets[tweets['tweet_text'].str.contains('|'.join(lst1))].reset_index()
```

To double-check, I took the first matching tweet:

```python
f = lst1_tweets['tweet_text'][0]
# f now holds:
# 'Spot the spelling mistake Welsh and Walsh. You are showing picture of presenter Bradley Walsh who is alive and kick'
type(f)  # <class 'str'>
```

I tried

```python
f.str.contains('|'.join(lst1))
```

which raises:

```
AttributeError: 'str' object has no attribute 'str'
```

I also tried

```python
f.contains('|'.join(lst1))
```

which raises:

```
AttributeError: 'str' object has no attribute 'contains'
```

Any suggestions on how I can check whether a single string contains any word from a list?

Barmar
frank
  • You can only use `.str.contains()` on a Pandas series, not after extracting an individual string. – Barmar Nov 08 '19 at 21:19
  • Does this answer your question? [Does Python have a string 'contains' substring method?](https://stackoverflow.com/questions/3437059/does-python-have-a-string-contains-substring-method) – Celius Stingher Nov 08 '19 at 21:24
  • Here your `f` is referencing a Python string, whose class is named `str`: `type(f) is str`. `pandas.Series.str` is a different class with different attributes, including `contains`. You can check if a class has an attribute by a certain name (without raising an Exception, that is) with the built-in callable `hasattr` – BatWannaBe Nov 08 '19 at 21:35
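Following the `hasattr` suggestion above, a quick check shows why both attempts fail: a plain Python `str` has neither a `str` nor a `contains` attribute, but it does define `__contains__`, which is what the `in` operator calls. The tweet text is shortened here for illustration.

```python
# Shortened version of the tweet from the question
f = 'Spot the spelling mistake Welsh and Walsh.'

print(type(f))                     # <class 'str'>
print(hasattr(f, 'str'))           # False: .str exists only on pandas Series/Index
print(hasattr(f, 'contains'))      # False: Python strings have no .contains method
print(hasattr(f, '__contains__'))  # True: this is what the `in` operator uses
print('mistake' in f)              # True
```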

3 Answers


I think you are looking for the `in` operator:

```python
if 'goat' in 'goat cheese':
    print('beeeeeeh!')
```
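To apply `in` to the question's whole word list rather than a single word, one option (a sketch, not part of the original answer) is to combine it with `any()` and lowercase the tweet first, so that `'Spot'` matches `'spot'`. It assumes the words in `lst1` are already lowercase. Note that `in` tests for substrings, so `'spot'` would also match inside `'spotless'`.

```python
lst1 = ['spot', 'mistake']
f = 'Spot the spelling mistake Welsh and Walsh.'

# Lowercase the tweet once (assumes the words in lst1 are already lowercase)
text = f.lower()

# True if at least one word occurs as a substring of the tweet
has_any = any(word in text for word in lst1)

# Which of the words actually occur
found = [word for word in lst1 if word in text]

print(has_any)  # True
print(found)    # ['spot', 'mistake']
```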
gosuto
  • This could be a problem because the list of strings he's searching for contains `'spot'` and `'mistake'`, but the string he's searching in contains `'Spot'` and `'mistake'`. Upper-case and lower-case characters are encoded differently, so the `in` operator for Python strings is case sensitive, and unlike `pandas.Series.str.contains`, you can't make the search case-insensitive. I don't know this very well, but the `|` appears to be a regex character. `pandas.Series.str.contains` might be using the same syntax as what the Python module `re` does to search strings. – BatWannaBe Nov 08 '19 at 21:48
  • `if 'goat' in 'Goat cheese'.lower():` would do the trick then. – gosuto Nov 08 '19 at 21:52
  • Looking up the pandas documentation, `pandas.Series.str.contains` does in fact use the `re` module. `.lower()` works too, but the `re` module could be more familiar. – BatWannaBe Nov 08 '19 at 21:58
  • @gosuto how can I tell when to use `in` and when to use `str.contains`? – R.A Aug 24 '23 at 12:07
  • @R.A `str.contains` is a pandas thing, not python in general – gosuto Aug 25 '23 at 14:13
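Since `'|'.join(lst1)` builds a regular expression and pandas relies on the `re` module, the same search can be run on a single string with `re.search`, made case-insensitive with `re.IGNORECASE`. This is a minimal sketch; the `re.escape` call is an addition not in the question (it stops regex metacharacters in the words from being interpreted), and the `\b` word-boundary variant is an optional extra assumption for when only whole words should match.

```python
import re

lst1 = ['spot', 'mistake']
f = 'Spot the spelling mistake Welsh and Walsh.'

# Build 'spot|mistake', escaping each word in case it contains regex metacharacters
pattern = '|'.join(map(re.escape, lst1))

# re.search returns a Match object or None; IGNORECASE lets 'spot' match 'Spot'
match = re.search(pattern, f, flags=re.IGNORECASE)
print(bool(match))    # True
print(match.group())  # Spot

# Optional: require whole words, so 'spot' does not match inside 'spotless'
whole_words = re.compile(r'\b(?:' + pattern + r')\b', re.IGNORECASE)
print(bool(whole_words.search('A spotless record')))  # False
```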

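You might be confusing this with `.str.contains()` from pandas, which exists but only applies to a Series. For a plain string you can use the `in` or `not in` operators. Here's a full guide on how to address the issue: [Does Python have a string 'contains' substring method?](https://stackoverflow.com/questions/3437059/does-python-have-a-string-contains-substring-method)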

From the pandas docs:

`Series.str.contains(self, pat, case=True, flags=0, na=nan, regex=True)`

Test if pattern or regex is contained within a string of a Series or Index.
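A minimal sketch of the pandas side, using the `case` parameter from the signature above. The small DataFrame is hypothetical sample data standing in for the question's 10k tweets; with `case=False` the row `'Spot on!'` matches even though the list contains lowercase `'spot'`. A single string can go through the same method by wrapping it in a one-element Series.

```python
import pandas as pd

lst1 = ['spot', 'mistake']

# Hypothetical sample data in place of the real tweets DataFrame
tweets = pd.DataFrame({'tweet_text': [
    'Spot the spelling mistake Welsh and Walsh.',
    'Spot on!',
    'Nothing to see here',
]})

pattern = '|'.join(lst1)  # regex alternation: 'spot|mistake'

# Default case=True: 'Spot on!' does not match lowercase 'spot'
print(tweets['tweet_text'].str.contains(pattern).tolist())  # [True, False, False]

# case=False makes the regex search case-insensitive
mask = tweets['tweet_text'].str.contains(pattern, case=False)
print(mask.tolist())  # [True, True, False]

# A single string can use the same method via a one-element Series
f = tweets['tweet_text'][1]
print(bool(pd.Series([f]).str.contains(pattern, case=False).iloc[0]))  # True
```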

Celius Stingher

Not too sure if you're just checking for certain strings in a string, but I'm pretty sure `.contains` isn't a Python string method. Try this:

```python
for "string" in f:
    # do whatever
```
HostingUdp
  • He's not trying to loop over something, he wants to test for a substring – Barmar Nov 08 '19 at 21:20
  • Also, the for-loop assigns an object from an iterable to a variable on each iteration. You can't assign objects to a string literal. This is likely intended to be an if-statement. – BatWannaBe Nov 08 '19 at 21:52
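The comments' point can be checked directly. `for "string" in f:` does not compile, because a for-loop needs a name to bind each item to, and a valid for-loop over a string yields single characters rather than testing for a substring. This sketch uses `compile()` only so the syntax error can be caught instead of crashing the script; the exact error message varies between Python versions.

```python
f = 'Spot the spelling mistake Welsh and Walsh.'

# The answer's loop header is rejected at compile time: a literal cannot be a loop target
rejected = False
try:
    compile('for "string" in f:\n    pass\n', '<answer>', 'exec')
except SyntaxError as err:
    rejected = True
    print('SyntaxError:', err.msg)

# A valid for-loop binds each *character* of f to a name in turn
chars = [ch for ch in f]
print(chars[:4])  # ['S', 'p', 'o', 't']

# The intended substring test is an if-statement with `in`
if 'mistake' in f:
    print('found "mistake"')
```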