
I have a data frame:

       0        2021-03-19 14:59:49+00:00  ...  I only need uxy to hit 20 eod to make up for a...
       1        2021-03-19 14:59:51+00:00  ...                                 Oh this isn’t good
       2        2021-03-19 14:59:51+00:00  ...  lads why is my account covered in more red ink...
       3        2021-03-19 14:59:51+00:00  ...  I'm tempted to drop my last 800 into some stup...
       4        2021-03-19 14:59:52+00:00  ...  The sell offs will continue until moral improves

And I have a list:

names = ['SRNE', 'CRSR', 'GME', 'AMC', 'TSLA', 'MVIS', 'SPCE']

I want to check each row for these words and, if any of them exist, output the words that were found in that row. Here is what I tried:

pat = '|'.join(r"\b{}\b".format(x) for x in names)
df = bearish.set_index('dt')['text'].str.extractall('(' + pat + ')')[0].reset_index(name='tickers')
df1 = pd.crosstab(df['dt'], df['tickers'])

But it gives me an empty DataFrame (df). Thank you.

YanRemes
  • Welcome to stackoverflow, please read [tour] and [mre] and in this case also: [how-to-make-good-reproducible-pandas-examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) (1) – Andreas Aug 03 '21 at 08:56

1 Answer


You can use Series.str.findall with the same word-boundary pattern to get a list of the matched words for each row:

Sample input

import pandas as pd
d = {'index': {0: 1, 1: 2, 2: 3}, 'txt': {0: 'random text with A', 1: 'random text with B and C', 2: 'random text number A with D and E'}}
df = pd.DataFrame(d)

Code:

lst = ['A', 'B', 'C', 'D', 'E']
pat = '|'.join(r"\b{}\b".format(x) for x in lst)
df['found'] = df['txt'].str.findall(pat)

Output:

0          [A]
1       [B, C]
2    [A, D, E]
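
To connect this back to the question, here is a minimal sketch of the same findall approach applied to the ticker list, followed by the crosstab step. The small bearish frame below is a hypothetical stand-in (its rows and the column names dt and text are assumptions based on the question's code), so treat it as an illustration rather than a diagnosis of why the original df came out empty.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `bearish` frame;
# the `dt` and `text` column names are assumed from the question's code.
bearish = pd.DataFrame({
    "dt": pd.to_datetime([
        "2021-03-19 14:59:49+00:00",
        "2021-03-19 14:59:51+00:00",
        "2021-03-19 14:59:52+00:00",
    ]),
    "text": [
        "GME and AMC are both red today",
        "Oh this isn't good",
        "Holding TSLA, buying more GME",
    ],
})

names = ['SRNE', 'CRSR', 'GME', 'AMC', 'TSLA', 'MVIS', 'SPCE']
pat = '|'.join(r"\b{}\b".format(x) for x in names)

# One list of matched tickers per row (an empty list when nothing matches).
bearish['tickers'] = bearish['text'].str.findall(pat)

# One row per (dt, ticker) pair; rows without a match become NaN and are dropped.
matches = bearish.explode('tickers').dropna(subset=['tickers'])

# Count of each ticker per timestamp.
counts = pd.crosstab(matches['dt'], matches['tickers'])
print(counts)
```

Note that the pattern is case-sensitive by default, so a lowercase mention such as "gme" would not match unless the flags argument of str.findall is set (for example to re.IGNORECASE).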
Andreas