Say I have a pandas DataFrame like this:

index    value

1        1
2        0
3        1
4        1
5        0
6        1

I would like to count how many times a specific sequence of values occurs, e.g. how many times a 0 comes right after a 1 (i.e. how many times [1, 0] occurs; in the example above, twice), or how many times [1, 0, 1] occurs (again, twice).

Is there a way to do this without a plain for loop?

Gian Segato

2 Answers

generalized solution

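import pandas as pd

def tuplify(s, k):
    # zip k shifted copies of the values: each tuple is one length-k window
    return list(zip(*[s.values[i:].tolist() for i in range(k)]))

s = pd.Series([1, 0, 1, 1, 0, 1])

# recent pandas releases deprecate pd.value_counts;
# pd.Series(tuplify(s, 3)).value_counts() is the equivalent call there
pd.value_counts(tuplify(s, 3))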

(1, 0, 1)    2
(1, 1, 0)    1
(0, 1, 1)    1
dtype: int64

You can assign the result to a variable and look up just the tuple you want.

counts = pd.value_counts(tuplify(s, 3))
counts[(1, 0, 1)]

2

breakdown

tuplify(s, 3)

[(1, 0, 1), (0, 1, 1), (1, 1, 0), (1, 0, 1)]

Tuples are hashable and can therefore be counted, so pd.value_counts works as shown above.
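
If only one pattern matters, the same windowing idea can be wrapped in a small helper. This is a minimal sketch, not part of the answer above: `count_pattern` is a hypothetical name, and it swaps `pd.value_counts` for `collections.Counter` from the standard library.

```python
from collections import Counter

import pandas as pd


def count_pattern(s, pattern):
    """Count (possibly overlapping) occurrences of `pattern` in Series `s`.

    Hypothetical helper for illustration only.
    """
    k = len(pattern)
    values = s.tolist()
    # zipping k shifted copies yields every consecutive length-k window
    windows = zip(*[values[i:] for i in range(k)])
    # tuples are hashable, so Counter can tally them directly
    return Counter(windows)[tuple(pattern)]


s = pd.Series([1, 0, 1, 1, 0, 1])
n_10 = count_pattern(s, [1, 0])      # the question's first case
n_101 = count_pattern(s, [1, 0, 1])  # the question's second case
print(n_10, n_101)
```

On the example series both counts come out as 2, matching the numbers given in the question.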

piRSquared
  • That's impressive! I was wondering what the star in front of the list does (`zip(*[[list]])`). I couldn't find any documentation on that syntax, but it looks like a large part of the magic relies on it. – Gian Segato Dec 20 '16 at 11:50
  • Yes! That is definitely part of the magic. `zip` expects positional arguments. My list comprehension is a single object. The `*` in front "unpacks" all the list elements as positional arguments. – piRSquared Dec 20 '16 at 15:54
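
To make the unpacking concrete, here is a small illustrative sketch in plain Python (the variable names are made up for the example). It builds the same shifted lists that `tuplify` builds for k = 3 and compares `zip(*shifted)` with passing the lists one by one.

```python
values = [1, 0, 1, 1, 0, 1]

# three shifted copies, as built inside tuplify for k = 3
shifted = [values[i:] for i in range(3)]

# zip(*shifted) passes each inner list as its own positional argument ...
unpacked = list(zip(*shifted))

# ... which is the same as writing the arguments out by hand
explicit = list(zip(shifted[0], shifted[1], shifted[2]))

# without the star, zip sees a single argument and yields 1-tuples
no_star = list(zip(shifted))

print(unpacked)      # the length-3 windows
print(len(no_star))  # one 1-tuple per inner list
```

`zip` stops at the shortest input, which is why the shorter shifted copies do not produce partial windows.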

I don't know of a way to do this without converting the pandas series to a string; I'd like to see a solution that operates directly on the series.

The following converts the series to a string and then uses the string's count method.

import pandas as pd
import re

s = pd.Series([1,0,1,1,0,1])

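# convert to string and remove all whitespace (raw string avoids an invalid-escape warning)
# note: str.count only counts non-overlapping matches
re.sub(r'\s+', '', s.to_string(index=False)).count('101')
# 2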
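
For a version that works on the values directly rather than on a string, one option is NumPy's `sliding_window_view`. The sketch below is an assumption-laden illustration, not part of the answer above: `count_sequence` is a hypothetical name, and it assumes NumPy 1.20 or later, where `sliding_window_view` is available. Unlike the string approach, it counts overlapping matches and is not confused by multi-digit values.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def count_sequence(s, pattern):
    """Count (possibly overlapping) occurrences of `pattern` in Series `s`.

    Hypothetical helper for illustration only.
    """
    values = s.to_numpy()
    pattern = np.asarray(pattern)
    # an empty pattern, or one longer than the data, cannot match
    if len(pattern) == 0 or len(pattern) > len(values):
        return 0
    # one row per consecutive window of len(pattern) values
    windows = sliding_window_view(values, len(pattern))
    # a window matches when every position equals the pattern
    return int((windows == pattern).all(axis=1).sum())


s = pd.Series([1, 0, 1, 1, 0, 1])
n_101 = count_sequence(s, [1, 0, 1])

# overlapping case: [1, 0, 1] starts at positions 0 and 2
n_overlap = count_sequence(pd.Series([1, 0, 1, 0, 1]), [1, 0, 1])
print(n_101, n_overlap)
```

The overlapping example is where the two approaches differ: `'10101'.count('101')` finds only one match, while the window comparison finds both.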
3novak