
I have a dataframe like this:

Id  Seq Event
1     2    A 
1     3    B 
1     5    C
1     6    A 
2     1    A 
2     2    B 
2     4    A 
2     6    B
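
For reference, a minimal construction of this example frame (a sketch; integer Id/Seq and plain string Event values are assumed):

import pandas as pd

# Example data from the question (dtypes assumed)
df = pd.DataFrame({
    'Id':    [1, 1, 1, 1, 2, 2, 2, 2],
    'Seq':   [2, 3, 5, 6, 1, 2, 4, 6],
    'Event': ['A', 'B', 'C', 'A', 'A', 'B', 'A', 'B'],
})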

I want to find how many times a specific pattern appears, say "AB". The output should be:

Id  Pattern_Count
1    1
2    2 

I tried using Event + Event.shift() and searching for the specific pattern, but that becomes tedious for a longer pattern like "ABCDE", where I don't want to shift four times. Is there an alternative way to do this?
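
Roughly, the shift-based version looks like this (a sketch of that attempt; the groupby shift keeps pairs from crossing Id boundaries):

# Pair each event with the next event of the same Id, then count 'AB' pairs
pairs = df['Event'] + df.groupby('Id')['Event'].shift(-1).fillna('')
pattern_count = pairs.eq('AB').groupby(df['Id']).sum()
# A pattern like 'ABCDE' would need four shifted columns, which gets unwieldy.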

No_body
2 Answers


You can do this with groupby, agg, and str.count:

(df.groupby('Id')['Event']
   .agg(''.join)
   .str.count('AB')
   .reset_index(name='Pattern_Count'))

   Id  Pattern_Count
0   1              1
1   2              2

Note that str.count treats the pattern as a regular expression, so a plain substring like "AB" works as-is; if the pattern could contain regex metacharacters, escape it first (for example with re.escape) to match it literally.
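
If the pattern might contain regex metacharacters, or you want the same chain for a longer pattern such as 'ABCDE', a small variation works (a sketch; the pattern variable and the re.escape call are additions, not part of the original answer):

import re

pattern = 'ABCDE'                         # any length, no repeated shifting needed
out = (df.groupby('Id')['Event']
         .agg(''.join)                    # concatenate each Id's events into one string
         .str.count(re.escape(pattern))   # escape so the pattern matches literally
         .reset_index(name='Pattern_Count'))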

cs95
  • I've been studying your answer. It amazes me that so many things can be strung together in this dense fashion. I would not have thought of using .join as a function to .agg. It does not show up in the documentation I have read. How did you know you could use that in .agg? I tried dir(groupby.agg) to list all functions, but it did not work. – R. Wayne Jan 21 '19 at 23:29
  • @R.Wayne You can pass any function to agg, as long as it results in a single, aggregated value. In this case, I used `str.join`. See [this post on string concatenation](https://stackoverflow.com/questions/12453580/concatenate-item-in-list-to-strings). `agg(''.join)` can also be written as `agg(lambda x: ''.join(x))`, but that is more verbose and less performant. – cs95 Jan 21 '19 at 23:30

You can use groupby to isolate your groups, then concatenate the strings in each group with sum and count the occurrences of your substring:

result = df.groupby('Id')['Event'].sum().str.count('AB')
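
For clarity, the grouped sum concatenates each Id's events into one string before counting; here is a sketch of the intermediate result and an optional reshape to the requested output (the reset_index step is an addition to the original answer):

concatenated = df.groupby('Id')['Event'].sum()
# Id
# 1    ABCA
# 2    ABAB
# Name: Event, dtype: object
result = concatenated.str.count('AB')
result = result.reset_index(name='Pattern_Count')  # optional: match the shape asked for

Note that summing strings with + can be slower than ''.join (used in the other answer) for long groups, since each concatenation builds a new string.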
busybear