
I'm trying to plot the following dataset as a line graph with one line per column, aggregated by day, and then possibly a second graph aggregated by month (depending on how messy the daily graph is).

The data is in the following structure:

        date_played  acousticness  danceability  energy  instrumentalness  liveness  speechiness
0  29/10/2019 18:57      0.083100         0.439   0.870          0.861000    0.4000       0.0375
1  30/10/2019 07:27      0.083100         0.439   0.870          0.861000    0.4000       0.0375
2  30/10/2019 07:30      0.082200         0.638   0.668          0.000000    0.0815       0.0259
3  30/10/2019 07:30      0.000031         0.469   0.932          0.000121    0.0799       0.0759
4  30/10/2019 07:36      0.082200         0.638   0.668          0.000000    0.0815       0.0259
5  30/10/2019 07:40      0.000031         0.469   0.932          0.000121    0.0799       0.0759

I've tried several methods found by googling and on Stack Overflow, but I can't get close and would like to see how it's done.

I was trying to group the data to day level and then average the values so that I had one record per day, which can be plotted easily. The best I could get was a count of records per day, which is far from what I want.

Thanks in advance

Edit:

data6temp = data6[['date_played','acousticness','danceability','energy','instrumentalness','liveness','speechiness']]

gets me the dataset. I tried this

print (data6temp.resample('D').mean())

as well as the approach from another answer and a few others. None of them worked, so I've since removed the code from my workbook. I've also tried this on a table with just the date_played and energy columns, but couldn't get a daily average that way either. I'm just not sure how to transform the dataset from date/time values to a view grouped by date, with the remaining values averaged.

Also tried this (another attempt):

data6temp.groupby([data6temp.Time.dt.strftime('%D %M %Y')])['energy'].mean().reset_index(name="Daily Avg")

but it comes up saying AttributeError: 'DataFrame' object has no attribute 'Date'

  • Show us how far you got and what the current problem is. The question sounds rather like a data calculation than a plotting problem. – Mr. T Nov 15 '20 at 12:02
  • I have included this at the end of my question for you. – TerrifiedJelly Nov 15 '20 at 12:08
  • Are you sure your `date_played` column is indeed a datetime object? Insert `df["date_played"] = pd.to_datetime(df["date_played"])`, then try `means=df.resample("D", on='date_played').mean()`. – Mr. T Nov 15 '20 at 12:31
  • That did it. Thanks so much! I'd stupidly never even thought to check it was a date format after I imported it again this morning! Thanks for your help and thanks also to Akhil Nukala for your suggestion. If you resubmit as the answer, I can tick it for you. – TerrifiedJelly Nov 15 '20 at 12:38
  • Great! I bet we all have been there (and, alas, will be there again). – Mr. T Nov 15 '20 at 12:41
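
For reference, the fix from the comments can be sketched end to end as below. This is only a minimal sketch: it assumes the sample rows from the question, rebuilt inline as CSV purely for illustration, and it passes an explicit `format` string, which is an assumption about the day-first timestamps rather than something stated in the thread.

```python
import io

import pandas as pd

# Sample rows copied from the question, written as CSV for illustration only.
raw = """date_played,acousticness,danceability,energy,instrumentalness,liveness,speechiness
29/10/2019 18:57,0.083100,0.439,0.870,0.861000,0.4000,0.0375
30/10/2019 07:27,0.083100,0.439,0.870,0.861000,0.4000,0.0375
30/10/2019 07:30,0.082200,0.638,0.668,0.000000,0.0815,0.0259
30/10/2019 07:30,0.000031,0.469,0.932,0.000121,0.0799,0.0759
30/10/2019 07:36,0.082200,0.638,0.668,0.000000,0.0815,0.0259
30/10/2019 07:40,0.000031,0.469,0.932,0.000121,0.0799,0.0759
"""
df = pd.read_csv(io.StringIO(raw))

# After import the column holds plain strings; convert it to real datetimes.
# The format string is an assumption based on the day-first sample values.
df["date_played"] = pd.to_datetime(df["date_played"], format="%d/%m/%Y %H:%M")

# One row per calendar day, each column averaged over that day's plays.
daily_means = df.resample("D", on="date_played").mean()
print(daily_means)
```

Calling `daily_means.plot()` on the result should then draw one line per column, provided matplotlib is installed.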

2 Answers

2

I think the first thing you have to do is split the date column into separate year, month and day columns:

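import pandas as pd
# The dates are day-first (e.g. 29/10/2019), so parse them accordingly.
df['date_played'] = pd.to_datetime(df['date_played'], dayfirst=True)
df['transaction_year'] = df['date_played'].dt.year
df['transaction_month'] = df['date_played'].dt.month
df['transaction_date'] = df['date_played'].dt.day  # day of the month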

If the above code doesn't work for you, search for how to split a datetime column, because this was specific to my code.

After that, you can do

data_n = df.groupby(['transaction_year', 'transaction_month', 'transaction_date'])['acousticness'].mean()
data_n.unstack(level=0)

and check the tabular data.

I think it should work.
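
One thing to watch when grouping on split date columns: the same day number recurs every month, so grouping on the day of the month alone mixes different months together. A small sketch with made-up rows (the values below are hypothetical, not taken from the question) shows the difference:

```python
import pandas as pd

# Hypothetical rows: two plays on 29 October and one on 29 November.
df = pd.DataFrame({
    "date_played": pd.to_datetime(
        ["29/10/2019 18:57", "29/10/2019 19:05", "29/11/2019 08:00"],
        format="%d/%m/%Y %H:%M",
    ),
    "energy": [0.8, 0.6, 0.2],
})

parts = df["date_played"].dt

# Day of the month alone: all three rows share day 29, so both months collapse
# into a single group.
by_day_only = df.groupby(parts.day)["energy"].mean()

# Year, month and day together: one group per calendar date.
by_full_date = df.groupby(
    [parts.year.rename("year"), parts.month.rename("month"), parts.day.rename("day")]
)["energy"].mean()

print(by_day_only)
print(by_full_date)
```

Here `by_day_only` has a single row, while `by_full_date` keeps 29 October and 29 November apart.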

2

Use:

tmp = df.groupby(df['date_played'].dt.strftime('%Y-%m-%d')).mean(numeric_only=True)
tmp.plot(kind='line')
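
The question also asks for a per-month view. A minimal sketch of the same idea at month level is below. It assumes `date_played` has already been converted to datetimes, and the rows are hypothetical values chosen only to span two months.

```python
import pandas as pd

# Hypothetical rows spanning two months, in the question's day-first format.
df = pd.DataFrame({
    "date_played": ["29/10/2019 18:57", "30/10/2019 07:27", "02/11/2019 09:15"],
    "energy": [0.870, 0.668, 0.500],
    "liveness": [0.4000, 0.0815, 0.2000],
})
df["date_played"] = pd.to_datetime(df["date_played"], format="%d/%m/%Y %H:%M")

# Group on the calendar month of each play.
# numeric_only=True keeps the datetime column itself out of the averages.
month = df["date_played"].dt.to_period("M")
monthly = df.groupby(month).mean(numeric_only=True)
print(monthly)
```

As with the daily version, `monthly.plot(kind='line')` should draw one line per column.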


Vivek Kalyanarangan