
I have the following data:

data_SP = pd.read_csv(path_ServicePath, sep='\t', header=None)
data_SP.columns = ["IdSP"]
print(data_SP)

    IdSP
0   SH11
1   SH11
2   SH12
3   SH12
4   SH13
5   SH13
6   SH13
7   SH13
... ...

and also,

data_SH = pd.read_csv(path_ServiceHierarchy,sep='\t',header=None)
data_SH.columns = ["IdSH", "ConnectionClient"]
print(data_SH)
IdSH    ConnectionClient
0   SH11    CN01207507
1   SH12    CN01207509
2   SH13    CN01207539
3   SH14    CN01207542
4   SH15    CN01207564
... ...

I would like to calculate the number of occurrences of the data_SH elements in data_SP. In other words, how many times each element of data_SH['IdSH'] is repeated in data_SP. For instance:

SH11 = 2
SH12 = 2
SH13 = 4

3 Answers


Let's use Series.where + Series.isin to filter the values in IdSP which are also present in IdSH, then use value_counts to count the number of occurrences:

data_SP['IdSP'].where(lambda x: x.isin(data_SH['IdSH'])).value_counts()

Alternatively, you can also do value_counts + reindex:

data_SP['IdSP'].value_counts().reindex(data_SH['IdSH']).dropna()

IdSH
SH11    2.0
SH12    2.0
SH13    4.0
Name: IdSP, dtype: float64
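
If you also want the IdSH values that never appear in data_SP to come back with a count of 0 (and keep an integer dtype instead of the float produced by NaN), reindex accepts a fill_value argument; a minimal sketch, assuming the same two DataFrames as above:

data_SP['IdSP'].value_counts().reindex(data_SH['IdSH'], fill_value=0)  # missing IDs become 0 instead of NaN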

The best way to count the values in a column is the following:

data_SH.value_counts()
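
For the counts asked about in the question, the same idea would be applied to the IdSP column of data_SP rather than to data_SH; a minimal sketch, assuming the DataFrames shown in the question:

data_SP['IdSP'].value_counts()  # on the sample rows shown: SH13 -> 4, SH11 -> 2, SH12 -> 2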

Refer also to this question on SO


This gives the count, but it doesn't give the count of elements that are present in data_SH and not present in data_SP.

    data_SP = pd.read_csv(path_ServicePath, sep='\t', header=None)
    data_SP.columns = ["IdSP"]
    data_SP["count"] = [1] * len(data_SP)       # helper column of ones
    data_SP = data_SP.groupby("IdSP").count()   # number of rows per IdSP
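
The helper "count" column can be avoided by using groupby(...).size(), which counts the rows of each group directly; a minimal sketch of the same idea, assuming the DataFrame from the question:

    data_SP.groupby("IdSP").size()  # rows per IdSP, as an integer Series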