1

im new to python and trying to understand pandas. I have a dataframe-

df
         CID  ...                                      hash
0        1902659  ...  e96911b9693ec128b314c629bb504808f182bec7
1        1902663  ...  245befc377d09fe0fd5544e00d0bfe68851e8cba
2        1902674  ...  80738d5fcc402938e90db2366ebe5d4c6dd3933d
3        1902674  ...  417178ac43aea45613413ac13e0490e0aaea5a92
4        1902674  ...  417178ac43aea45613413ac13e0490e0aaea5a92
         ...  ...                                       ...
1004625  2565017  ...  c3e063e7eff025de2b63c5638f7df279f14eed56
1004626  2565017  ...  536dde3209eac448f64787a4126bf599ecab3979
1004627  2565068  ...  83475a4e61e2bfec60684eb55dfa5cced918bc97
1004628  2565250  ...  f58347052683c5c4e58f2e1fbb7e1c49f98d556b
1004629  2565373  ...  3fdbac7e12abfcacbd48ff11644236169dc67a15

[1004630 rows x 4 columns]

I want a column to reflect the number of times the combination of CID and hash values are exactly the same in the whole dataframe. For example in CID 1902674 we see that the hash key is same in both cases in both instances, so since it a match it should show as count 1 and if the same exact combo shows later than count keeps increasing. How can I go about it?

2 Answers2

2

You can use the method .pivot_table in pandas for do this.

import pandas as pd

#df_initial: The initial dataframe
#df_solution: The final dataframe

df_solution = df_initial.pivot_table(index=['Cid','Hash'], aggfunc='size')
print (df_solution )
asantz96
  • 611
  • 5
  • 15
0

You can use groupby().

fields = ['CID', 'hash']
df.groupby(fields).count()

                                                  idx
CID     hash                                         
1902659 e96911b9693ec128b314c629bb504808f182bec7    1
1902663 245befc377d09fe0fd5544e00d0bfe68851e8cba    1
1902674 417178ac43aea45613413ac13e0490e0aaea5a92    2 <--
        80738d5fcc402938e90db2366ebe5d4c6dd3933d    1
2565017 536dde3209eac448f64787a4126bf599ecab3979    1
        c3e063e7eff025de2b63c5638f7df279f14eed56    1
2565068 83475a4e61e2bfec60684eb55dfa5cced918bc97    1
2565250 f58347052683c5c4e58f2e1fbb7e1c49f98d556b    1
2565373 3fdbac7e12abfcacbd48ff11644236169dc67a15    1
jsmart
  • 2,921
  • 1
  • 6
  • 13