1

I successfully managed to scrape futbin.com for time series price data of Fifa 19 players. I have now got over 200'000 rows with player and price data. For each player I have about 17 different prices (with a respective timestamp).

I would now like to make a new dataframe with only one row per player and the price should be the average price over time for this specific player. Each player has got a unique "Futbin_ID" number. Until now I was unable to figure out how to do this... I would really appreciate it if someone could help me out...

Marcus
  • 105
  • 1
  • 2
  • 8

1 Answers1

2

You would want to group it by Fubin_ID and then find the mean of each grouping:

avg_price = df_ts.groupby('Futbin_ID')['price'].agg(np.mean)

If you want to have your dataframe with the other columns as well, you can drop the duplicates in the original except the first and replace the price value with the average:

df_ts.drop_duplicates(subset="Futbin_ID", keep="first", inplace= True)
df_ts.join[avg_price.set_index("Fubin_ID"), on="Futbin_ID"]

you can read more about groupby here: https://www.tutorialspoint.com/python_pandas/python_pandas_groupby.htm

Turtalicious
  • 430
  • 2
  • 5
  • Thank you very much! But when I do this I get a new df with only two columns (price and Futbin_ID). How can I retain all columns? – Marcus Nov 13 '18 at 19:46
  • @Marcus You need to define what method to use to aggregate values for the other columns. Since it looks like they are basically just duplicated `'first'` is what you want – ALollz Nov 13 '18 at 20:26