Let's say I have a list of 6 integers named ‘base’ and a dataframe of 100,000 rows with 6 columns of integers.

I need to create an additional column which shows how many matches there are between the list ‘base’ and each row of the dataframe.

The order of the integers, both in the list ‘base’ and in the dataframe rows, is to be ignored.

The match count can range from 0 to 6.
0 means that none of the 6 integers in the list ‘base’ match any of the 6 columns of a row in the dataframe.

Can anyone shed some light on this, please?

Leb
  • Show us your dataframe and your desired results: http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Paul H Nov 05 '15 at 01:29
  • Pivot your dataframe. Use `isin()` in `apply()`. – Kartik Nov 05 '15 at 01:51

1 Answer

You can try this:

import pandas as pd

# create frame with six columns of ints
df = pd.DataFrame({'a':[1,2,3,4,10],
                   'b':[8,5,3,2,11],
                   'c':[3,7,1,8,8],
                   'd':[3,7,1,8,8],
                   'e':[3,1,1,8,8],
                   'f':[7,7,1,8,8]})

# list of ints
base = [1, 2, 3, 4, 5, 6]

# define function to count membership of list
def base_count(y):
    return sum(True for x in y if x in base)

# apply the function row-wise using the axis=1 parameter
df.apply(base_count, axis=1)

This outputs:

0    4
1    3
2    6
3    2
4    0
dtype: int64

Then assign the result to a new column:

df['g'] = df.apply(base_count, axis=1)
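
With 100,000 rows, a row-wise `apply` can be slow. As a rough sketch, assuming the same `df` and `base` as above, `DataFrame.isin` followed by a row-wise sum should give the same counts without a Python-level loop. The names `match_count` and `distinct_count` are only illustrative. The second variant assumes a different reading of the question, where repeated values in a row should be counted once:

```python
import pandas as pd

# same sample frame and list as above
df = pd.DataFrame({'a': [1, 2, 3, 4, 10],
                   'b': [8, 5, 3, 2, 11],
                   'c': [3, 7, 1, 8, 8],
                   'd': [3, 7, 1, 8, 8],
                   'e': [3, 1, 1, 8, 8],
                   'f': [7, 7, 1, 8, 8]})
base = [1, 2, 3, 4, 5, 6]

# isin() flags every cell whose value is in base;
# summing the booleans across columns gives the per-row count
match_count = df.isin(base).sum(axis=1)

# variant (assumption): count how many distinct values of base
# appear in each row, so duplicates within a row count once
base_set = set(base)
distinct_count = df.apply(lambda row: len(base_set & set(row.tolist())), axis=1)
```

Here `match_count` matches the output of `base_count` shown above, while `distinct_count` gives 2 rather than 6 for the third row, because that row only contains the values 1 and 3. Which of the two is wanted depends on how duplicates should be treated.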
JAB