
For a homework assignment, we are asked to find the ids that have multiple matches, similar to a one-to-many relationship in a database.

So far I can only show that the two id columns do not match one-to-one, because their distinct counts differ.

import numpy as np
import pandas as pd

# Load the player attributes table
player_att = pd.read_csv('Player_Attributes.csv', sep=',')
player_att.head()

# Count the distinct values in each id column
player_att.player_fifa_api_id.nunique()
player_att.player_api_id.nunique()

The code above returns 11062 and 11060, so the two id columns do not map one-to-one. How can I find the ids that have multiple fifa_api_id values?

Sandy
  • Can you create a sample DataFrame and the expected output? See [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – anky Mar 30 '19 at 20:23

1 Answer

Try:

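player_att.groupby('player_fifa_api_id').player_api_id.nunique()

groupby gathers all rows with the same player_fifa_api_id, and nunique returns the number of distinct player_api_id values in each group. (count would instead return the number of rows in each group, which overstates the result whenever the same id pair appears on more than one row.)

The result is a Series named player_api_id and indexed by player_fifa_api_id. To find the ids that map to more than one player_api_id, keep the entries of the Series whose value is larger than 1. To check the other direction, that is, a player_api_id with several player_fifa_api_id values, swap the two column names.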
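As a minimal sketch of the filtering step, here is the same idea on a small made-up DataFrame. The sample values and the names sample, fifa_per_player, multi and pairs are hypothetical and are not taken from Player_Attributes.csv. It groups by player_api_id, which is the direction the question asks about, and counts distinct values so that repeated rows for the same id pair are not counted twice.

```python
import pandas as pd

# Hypothetical sample data standing in for Player_Attributes.csv.
# Player 2 appears with two different FIFA ids; the others repeat the same pair.
sample = pd.DataFrame({
    "player_api_id":      [1, 1, 2, 2, 3, 3],
    "player_fifa_api_id": [10, 10, 20, 21, 30, 30],
})

# Number of distinct FIFA ids for each player_api_id.
fifa_per_player = sample.groupby("player_api_id")["player_fifa_api_id"].nunique()

# Keep only the players that map to more than one FIFA id.
multi = fifa_per_player[fifa_per_player > 1]
print(multi)

# List the distinct id pairs for those players.
pairs = (
    sample.loc[sample["player_api_id"].isin(multi.index),
               ["player_api_id", "player_fifa_api_id"]]
    .drop_duplicates()
)
print(pairs)
```

On this sample, only player 2 is kept, with the FIFA ids 20 and 21. On the real data, replacing sample with player_att should give the corresponding list.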

Quang Hoang