How to use a list as search criteria in a dataframe?

Question

I am trying to get familiar with Python and would like to ask for a little help with the following task.

I have imported two data frames from Excel, dfA and dfB, with pandas. I would like to count how many times each row of dfA matches in dfB. To do this, I converted the column with searchDF = dfA['Title'].tolist() so that I can pass it as a list of values to search for.

My approach is the following:

for i in searchDF:
    result = dfB['COL1'].count(i)

Then I would like to add a new column in dfA which will store the results of each line.

    dfA['FIND_VAL1'] = result

I am sorry if this task seems trivial, but I am completely new to Python and really need some help.

Data example A:

title 
plane 
house 
car

Data example B:

title 
aero plane 
household 
luxury cars 
house decorations

Result example:

title   Results    
plane     1     
house     2    
car       1

Please provide samples of dfA and dfB along with expected output. See this article [how to ask questions](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — Scott Boston, Aug 31 '17 at 14:47

cs95 · Answer 1 · 2017-08-31T16:26:38.167

1

You could call str.count in a list comprehension.

dfA['Results'] = [dfB.title.str.count(x).sum() for x in dfA.title]
dfA

   title  Results
0  plane        1
1  house        2
2    car        1
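One caveat worth sketching: `Series.str.count` interprets its argument as a regular expression, so a search term containing characters such as `*` or `+` can raise an `re.error` (for example "multiple repeat") or match something unintended. Assuming the titles should be matched literally, wrapping each term in `re.escape` is one way to guard against that. The frames below simply rebuild the sample data from the question.

```python
import re

import pandas as pd

# Sample data rebuilt from the question.
dfA = pd.DataFrame({"title": ["plane", "house", "car"]})
dfB = pd.DataFrame(
    {"title": ["aero plane", "household", "luxury cars", "house decorations"]}
)

# re.escape turns each title into a literal pattern, so regex
# metacharacters in a title are matched as plain text.
dfA["Results"] = [
    int(dfB["title"].str.count(re.escape(term)).sum()) for term in dfA["title"]
]
print(dfA)
```

For the sample data this gives the same counts as the unescaped version, because none of the titles contain regex metacharacters.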

An alternative list comprehension using pure Python sum and str.count, as suggested by piRSquared:

dfA = dfA.assign(Results=[sum([x.count(y) for x in dfB.title.values.tolist()]) 
                                       for y in dfA.title.values.tolist()])
dfA
   title  Results
0  plane        1
1  house        2
2    car        1

This one seems faster for small data, but may not scale as well.

edited Aug 31 '17 at 16:26

answered Aug 31 '17 at 15:52

cs95


If you're going to use comprehension, don't stop half-way `dfA.assign(Results=[sum([x.count(y) for x in dfB.title.values.tolist()]) for y in dfA.title.values.tolist()])` This is quicker. – piRSquared Aug 31 '17 at 16:17
@piRSquared Are you sure it scales for larger data? – cs95 Aug 31 '17 at 16:23
Haven't done that test yet. But your solution's time complexity is the same. Both scale quadratically. As does mine! I don't know a way around that. `O(nxm)` But the comprehension is quicker over small data than `str.count().sum()`. – piRSquared Aug 31 '17 at 16:25
@piRSquared The annoying thing is you'd have to look for substring matches too. Otherwise you could've used a `collections.Counter` and done this in linear time. – cs95 Aug 31 '17 at 16:30
@COLDSPEED I have tested your solution and ran into an error: multiple repeat at position 23. Why did I get this error? – simpleMan Sep 01 '17 at 09:25
@simpleMan I don't know. I can't tell from the one sentence you've given me. Update your pandas. – cs95 Sep 02 '17 at 05:16
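The `collections.Counter` idea from the comments only applies when whole titles must match exactly, which is not what the question asks for. As a rough sketch of that exact-match case, assuming a hypothetical dfB whose titles are exact values rather than longer phrases:

```python
from collections import Counter

import pandas as pd

dfA = pd.DataFrame({"title": ["plane", "house", "car"]})
# Hypothetical data for illustration only: exact titles, not substrings.
dfB = pd.DataFrame({"title": ["house", "car", "house", "boat"]})

# One pass over dfB builds the counts; each lookup is then constant time.
counts = Counter(dfB["title"])
dfA["Results"] = [counts[term] for term in dfA["title"]]
print(dfA)
```

A `Counter` returns 0 for missing keys, so titles that never occur in dfB need no special handling.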

score 1 · Answer 2 · answered Aug 31 '17 at 16:12

Use the count ufunc from numpy.core.defchararray with some numpy broadcasting magic.

from numpy.core.defchararray import count

b = dfB.title.values.astype(str)
a = dfA.title.values[:, None]
dfA.assign(Results=count(b, a).sum(1))

   title  Results
0  plane        1
1  house        2
2    car        1
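The `numpy.core` namespace is treated as private in recent NumPy releases, so the same idea can be sketched with the public `numpy.char.count` function instead. This version assumes both columns can be converted to fixed-width string arrays, and rebuilds the sample data so that it runs on its own.

```python
import numpy as np
import pandas as pd

dfA = pd.DataFrame({"title": ["plane", "house", "car"]})
dfB = pd.DataFrame(
    {"title": ["aero plane", "household", "luxury cars", "house decorations"]}
)

b = dfB["title"].to_numpy().astype(str)           # shape (4,): strings to search in
a = dfA["title"].to_numpy().astype(str)[:, None]  # shape (3, 1): substrings to count

# Broadcasting (4,) against (3, 1) yields a (3, 4) matrix of counts;
# summing along axis 1 gives one total per row of dfA.
result = dfA.assign(Results=np.char.count(b, a).sum(axis=1))
print(result)
```

The intermediate matrix has one cell per (dfA row, dfB row) pair, so memory use grows with the product of the two lengths.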

Setup

import pandas as pd

dfA = pd.DataFrame(dict(title=['plane', 'house', 'car']))

dfB = pd.DataFrame(dict(
    title=['aero plane', 'household', 'luxury cars', 'house decorations']
))

score 0 · Answer 3 · answered Aug 31 '17 at 16:13

I would first try merging the dataframes:

df = pd.merge(dfA, dfB, on = "title")
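Note that `pd.merge` joins on exact key equality, so it does not find substring matches. A quick check against the sample data from the question illustrates this:

```python
import pandas as pd

dfA = pd.DataFrame({"title": ["plane", "house", "car"]})
dfB = pd.DataFrame(
    {"title": ["aero plane", "household", "luxury cars", "house decorations"]}
)

# The default inner join keeps only rows whose titles are identical in both frames.
merged = pd.merge(dfA, dfB, on="title")
print(len(merged))
```

None of the dfA titles appears verbatim in dfB, so the merged frame has no rows here.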

answered Aug 31 '17 at 16:13

kjmerf

