0

Hi I have an excel sheet with Player names and Dates. For example:

Column A=[1000 1000 1001 1001 1001 1002 1002 1002 1002]
Column B=[03/12/2009 03/12/2009  04/01/2011 05/01/2010  08/02/2011 10/03/2012 05/12/2010 07/02/2011 09/03/2012 14/02/2013]

For each player name, I want to calculate the maximum length of time between the first and final date. I thought to perform this via a pandas df and then dictionary formation, but it does not seem to work. There must be some easier way to do this, but I can't find my way out. This is what I have tried so far:

import pandas as pd
from datetime import datetime
from itertools import count
from collections import defaultdict

Player_Dates = pd.read_excel(r'C:\Users\PycharmProjects\Project1\Data.xlsx', sheet_name='Sheet 1, header=0, na_values=['NA'], usecols = "B:C")

Player_Dates_new=Player_Dates.iloc[5:len(Player_Dates)]

Player_Dates_new.columns = ['Player_ID','Dates']

counts = {k: count(0) for k in Player_Dates_new.Player_ID.unique()}
d = defaultdict(dict)

for k, *v in Player_Dates_new.values.tolist():
    d[k][next(counts[k])] = v

dict(d)

print(d, Player_Dates_new)
amquack
  • 837
  • 10
  • 24
Leonardo
  • 119
  • 10
  • I think you're missing a closing `'` after "`'Sheet 1`" in the 6th line of your code. I understand that you're reading in an excel file, but can you copy a snippet of your dataframe so that we know what we're dealing with? – amquack Jan 22 '21 at 21:11
  • Thank you. Yes, I read an excel file. Here is an example of my dataframe: Player_ID Dates 1000 2012-01-02 00:00:00 1000 2012-02-13 00:00:00 1000 2012-02-27 00:00:00 1005 2005-02-05 00:00:00 1005 2009-03-31 00:00:00 1010 2013-03-26 00:00:00 1010 2013-06-06 00:00:00 1010 2013-07-13 00:00:00 1010 2014-05-30 00:00:00 – Leonardo Jan 23 '21 at 21:38
  • Apologies, don't know how to properly copy paste my df here. Hopefully you can understand the structure: the first column is the player name, the second column is the date in YYYY-MM-DD followed by the time which I don't need. – Leonardo Jan 23 '21 at 21:47
  • To include a dataframe, either print it (preferably `print(df.head())` if the first few lines will suffice as an example), or simply return `df.head()` (or `df`). Then copy and paste the output here. We can read it using `pd.read_clipboard()`. [This](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples/20159305#20159305) is a great reference for writing pandas questions. Obviously in your case, the "df" is named "Player_Dates", so you use that instead. – amquack Jan 25 '21 at 17:28

0 Answers0