
I need a new column C where each value is the frequency with which the values in two other columns A and B appear together in the data.

    A   B   C
0   7   9   2
1   7   2   2
2   1   9   3
3   4   8   1
4   9   1   1
5   6   4   1
6   7   2   2
7   7   9   2
8   1   9   3
9   1   9   3

I tried making a dictionary out of a value count like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({
   'A': np.random.randint(1, 10, 100),
   'B': np.random.randint(1, 10, 100)
})

mapper = df.value_counts().to_dict()

Then I convert each row to a tuple and feed it back through the dictionary in pandas' apply function:

df['C'] = df.apply(lambda x: mapper[tuple(x)], axis=1)

This solution seems possibly (a) incorrect or (b) inefficient, and I'm wondering if there's a better way of going about it.

semblable
  • `df['C'] = df.groupby(['A','B'])['B'].transform('size')` – Quang Hoang Sep 26 '22 at 19:33
  • This worked! Is it completely arbitrary which column is passed from the groupby to transform? (You gave B, could I just as easily give A? Why is this?) – semblable Sep 26 '22 at 19:53
  • Any column would work, because they all have the same size. I tend to specify one column with `transform`, as it *transforms* one series/column into another via groupby. – Quang Hoang Sep 26 '22 at 20:08
  • You tend to? Is it not necessary to specify some column? I tried not specifying a column and got either an error or an empty DataFrame. – semblable Sep 27 '22 at 14:27
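To illustrate the point in the comments above, here is a minimal sketch using the sample data from the question. It checks that selecting `A` or `B` before `transform('size')` gives the same counts, and shows one way to avoid picking a column at all: compute the group sizes with `groupby(...).size()` and join them back on the key columns. The names `via_a`, `via_b`, `pair_counts`, and `joined` are made up for this example.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'A': [7, 7, 1, 4, 9, 6, 7, 7, 1, 1],
    'B': [9, 2, 9, 8, 1, 4, 2, 9, 9, 9],
})

# The selected column only decides which series is transformed;
# 'size' counts rows per (A, B) group either way.
via_a = df.groupby(['A', 'B'])['A'].transform('size')
via_b = df.groupby(['A', 'B'])['B'].transform('size')
print(via_a.tolist() == via_b.tolist())

# Alternative without selecting a column: one size per (A, B) pair,
# joined back onto the original rows by the key columns.
pair_counts = df.groupby(['A', 'B']).size().rename('C')
joined = df.join(pair_counts, on=['A', 'B'])
print(joined)
```

The join variant returns a new DataFrame rather than assigning a column in place, which may or may not suit your workflow.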

1 Answer

df['C2'] = df.groupby(['A','B'])['B'].transform('count')
df
    A   B   C2
0   7   9   2
1   7   2   2
2   1   9   3
3   4   8   1
4   9   1   1
5   6   4   1
6   7   2   2
7   7   9   2
8   1   9   3
9   1   9   3

data used for the solution

import pandas as pd

# Sample data matching the table in the question
data = {'A': {0: 7, 1: 7, 2: 1, 3: 4, 4: 9, 5: 6, 6: 7, 7: 7, 8: 1, 9: 1},
        'B': {0: 9, 1: 2, 2: 9, 3: 8, 4: 1, 5: 4, 6: 2, 7: 9, 8: 9, 9: 9}}
df = pd.DataFrame(data)
df
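
On the question of whether the dictionary approach is incorrect: a quick sketch, on the same sample data, suggests it produces the same counts as the `groupby`/`transform` version. The mapper is built from the `A` and `B` columns only, on the assumption that no other columns should take part in the count. The names `via_mapper` and `via_transform` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [7, 7, 1, 4, 9, 6, 7, 7, 1, 1],
    'B': [9, 2, 9, 8, 1, 4, 2, 9, 9, 9],
})

# Approach from the question: count each (A, B) pair, then look every row up.
mapper = df[['A', 'B']].value_counts().to_dict()
via_mapper = df[['A', 'B']].apply(lambda row: mapper[tuple(row)], axis=1)

# Approach from this answer: group size broadcast back to each row.
via_transform = df.groupby(['A', 'B'])['B'].transform('count')

print(via_mapper.tolist())
print(via_mapper.tolist() == via_transform.tolist())
```

So the original approach looks correct for data like this; the row-wise `apply` calls a Python function once per row, which is the usual reason to prefer `transform` on larger frames.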

Naveed
  • What's the difference between using `size` and `count` here? Which is better? – semblable Sep 26 '22 at 20:02
  • This should help: https://stackoverflow.com/questions/33346591/what-is-the-difference-between-size-and-count-in-pandas – Naveed Sep 26 '22 at 20:05
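
As a rough illustration of the `size` versus `count` question above: `size` counts the rows in each group, while `count` counts only the non-null values of the selected column. The sketch below uses a hypothetical third column `V` containing a missing value, since the sample data in the question has no nulls and both functions would give the same result there.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column V has one missing value in the (1, 9) group.
df = pd.DataFrame({
    'A': [1, 1, 2],
    'B': [9, 9, 9],
    'V': [1.0, np.nan, 3.0],
})

grouped = df.groupby(['A', 'B'])['V']

# size: number of rows per group, missing values included
sizes = grouped.transform('size')

# count: number of non-null V values per group
counts = grouped.transform('count')

print(sizes.tolist())   # [2, 2, 1]
print(counts.tolist())  # [1, 1, 1]
```

For counting how often a pair of values occurs, `size` is probably the closer match to the intent, since it does not depend on which column is selected.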