How to find the id which has multiple matchings from Kaggle Football data

Question

In the homework, we are asked to find the ids that have multiple matches, like a one-to-many relationship in a database.

So far I could only establish that the two ids do not match one-to-one, since their distinct counts differ.

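import numpy as np
import pandas as pd

# Load the player attributes table from the Kaggle dataset
player_att = pd.read_csv('Player_Attributes.csv', sep=',')
player_att.head()

# Count the distinct values in each id column
player_att.player_fifa_api_id.nunique()  # 11062
player_att.player_api_id.nunique()       # 11060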

The code above returns 11062 and 11060, so the two id columns do not match one-to-one. But how do I find the one with multiple fifa_api_id values?

Can you create a sample DataFrame and show the expected output? See [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – anky, Mar 30 '19 at 20:23

Quang Hoang · Answer 1 · 2019-03-30T20:57:22.453

Try:

player_att.groupby('player_fifa_api_id').player_api_id.count()

Basically, groupby gathers all rows with the same player_fifa_api_id together, and count returns the number of non-null player_api_id entries in each group.
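To make that concrete, here is a minimal sketch on a small made-up frame. The values are invented for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical toy data: invented values, real column names.
toy = pd.DataFrame({
    "player_fifa_api_id": [1, 1, 2, 3, 3, 3],
    "player_api_id": [10, 10, 20, 30, 30, 31],
})

# One entry per player_fifa_api_id; each value is the number of
# non-null player_api_id rows that fall into that group.
counts = toy.groupby("player_fifa_api_id").player_api_id.count()
print(counts)
```

Note that the count is a row count: id 1 gets a 2 here even though both of its rows carry the same player_api_id.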

After this, you have a Series named player_api_id, indexed by player_fifa_api_id. If you want the players with more than one player_api_id, look at the entries of the series whose value is larger than 1.
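The filtering step could be sketched as follows. This is a variation under two assumptions, not the exact code above: the table may hold several rows per player, in which case count() is inflated by repeated rows and nunique() may be the safer choice; and the grouping is done by player_api_id, since the question asks for the id with multiple fifa ids. The toy values are invented.

```python
import pandas as pd

# Hypothetical toy data: invented values, real column names.
toy = pd.DataFrame({
    "player_api_id": [10, 10, 20, 30, 30, 30],
    "player_fifa_api_id": [1, 1, 2, 3, 3, 4],
})

# Number of distinct player_fifa_api_id values per player_api_id.
n_fifa = toy.groupby("player_api_id").player_fifa_api_id.nunique()

# Keep only the ids that map to more than one fifa id.
multi = n_fifa[n_fifa > 1]
print(multi)

# Show the distinct id pairs behind those players for inspection.
pairs = (
    toy[toy.player_api_id.isin(multi.index)]
    [["player_api_id", "player_fifa_api_id"]]
    .drop_duplicates()
)
print(pairs)
```

On the real data, replacing toy with player_att should list the ids responsible for the difference between the two distinct counts. Swapping the two column names checks the opposite direction.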

edited Mar 30 '19 at 20:57

answered Mar 30 '19 at 20:43


Can you explain a little bit? I don't understand. – Sandy Mar 30 '19 at 20:50
But why don't the two match? – Sandy Mar 30 '19 at 20:50
OK, got you! If you have time, could you look at another question from my homework? https://stackoverflow.com/questions/55432851/how-to-organize-json-data-got-from-bls-gov – Sandy Mar 30 '19 at 21:11
