I have a dataframe `all_articles` with the columns ['title', 'abstract']. I have written a function to do a boolean search on one column:

def searchString(string):
    if ((('apple' or 'banana') and ('fork' or 'knife') and ('red' or 'green'))) in string:
        return True
    return False

Now, I want to create a column that contains True if the title and abstract, taken together, fulfill the requirement of my boolean search. As an example, suppose that my dataframe looks as follows:

title                 abstract       
'apple'               'red fork'          
'orange'              'apple red fork'          
'knife banana red'    'green bowl'         

Then I would like my function to return: ['True','True','False'].

My current apply command looks as follows:

all_articles['boolean'] = all_articles['abstract'].astype(str).apply(searchString)

Obviously, I could create a column in which I merge title and abstract and apply the function on that column. However, I am curious if there are ways to do this through an apply function that has 2 columns of input.

Emil
  • You should really test that `searchString` function; it won't do what you want. But if you want to apply something to several columns, `DataFrame.apply` already can operate on an entire row of a `DataFrame`. You'd just need to modify your function to take in a row instead of a single string. – Arya McCarthy Feb 21 '20 at 13:52
  • It would be much better, not least performance-wise, to use `np.where()` or a vectorized solution. – Celius Stingher Feb 21 '20 at 14:09
  • @AryaMcCarthy, the function does work well on one column containing multiple strings. However, I would like to apply the function in a way that it looks whether my criteria are met on the merged string from `title` and `abstract` without having to actually merge them. – Emil Feb 21 '20 at 14:35
  • @CeliusStingher, how could I use the np.where() function in this context? I looked at the documentation but cannot think of a way to successfully implement it. – Emil Feb 21 '20 at 14:36
  • You are aware that `(('apple' or 'banana') and ('fork' or 'knife') and ('red' or 'green'))` is just `red`, aren't you? – Serge Ballesta Feb 21 '20 at 15:40
  • No, I am not. Why is that the case? – Emil Feb 21 '20 at 15:44
  • I just asked another question specifically focused on my function. Can you answer there? https://stackoverflow.com/questions/60342027/boolean-search-function-returns-false-instead-of-true – Emil Feb 21 '20 at 15:48
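
To illustrate the points raised in the comments, here is a minimal sketch, not the asker's or any commenter's actual code. It first shows why the parenthesised expression collapses to `'red'`, then applies a corrected check to each row with `DataFrame.apply(axis=1)`. The names `TERM_GROUPS`, `search_string` and `search_row` are hypothetical, and the sample frame is rebuilt from the table in the question.

```python
import pandas as pd

# `or` returns its first truthy operand; `and` returns its last operand when
# all operands are truthy. The whole expression therefore evaluates to 'red'.
expr = ('apple' or 'banana') and ('fork' or 'knife') and ('red' or 'green')
print(expr)  # red

# Hypothetical helper data: each tuple is an OR group; all groups must match.
TERM_GROUPS = [('apple', 'banana'), ('fork', 'knife'), ('red', 'green')]


def search_string(text):
    """Return True if text contains at least one term from every group."""
    return all(any(term in text for term in group) for group in TERM_GROUPS)


def search_row(row):
    """Check title and abstract together without storing a merged column."""
    return search_string(f"{row['title']} {row['abstract']}")


# Sample data rebuilt from the table in the question.
all_articles = pd.DataFrame({
    'title': ['apple', 'orange', 'knife banana red'],
    'abstract': ['red fork', 'apple red fork', 'green bowl'],
})

# axis=1 passes each row (as a Series) to the function.
all_articles['boolean'] = all_articles.apply(search_row, axis=1)
print(all_articles['boolean'].tolist())  # [True, True, True]
```

With this check all three sample rows come out True, because the third title on its own already contains `banana`, `knife` and `red`. Note that `in` does plain substring matching, so `'red'` would also match inside a word such as `'bored'`.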

1 Answer

Given your clause, and checking each cell on its own, the result should be [False, True, True], and this is how you get it:

df.stack().str.contains(r'(?=.*(?:apple|banana))(?=.*(?:fork|knife))(?=.*(?:red|green))').unstack().any(axis=1)
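
The expression above tests each cell on its own, so a row matches only when a single column satisfies all three groups. If the title and abstract should instead be judged together, as the question asks, one possible vectorized sketch is to test each OR group across both columns and then AND the groups. The names `group_patterns`, `row_match` and `per_cell` are hypothetical, and the frame is rebuilt from the question's table.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    'title': ['apple', 'orange', 'knife banana red'],
    'abstract': ['red fork', 'apple red fork', 'green bowl'],
})

# One alternation pattern per OR group (hypothetical name).
group_patterns = ['apple|banana', 'fork|knife', 'red|green']

# Start from all True and AND in one group at a time.
row_match = pd.Series(True, index=df.index)
for pattern in group_patterns:
    # True where the group matches in the title OR in the abstract.
    in_any_column = df.apply(lambda col: col.str.contains(pattern)).any(axis=1)
    row_match &= in_any_column

print(row_match.tolist())  # [True, True, True]

# For comparison: the per-cell check, where one column must satisfy everything.
full_pattern = r'(?=.*(?:apple|banana))(?=.*(?:fork|knife))(?=.*(?:red|green))'
per_cell = df.apply(lambda col: col.str.contains(full_pattern)).any(axis=1)
print(per_cell.tolist())  # [False, True, True]
```

The first row shows the difference: `apple` is in the title while `red` and `fork` are in the abstract, so it matches only when both columns are considered together.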
zipa