I have this dataset:

import pandas as pd 

# initialise data of lists.
data = {'Year':['2017', '2018', '2018', '2019'],'Month':['1', '1', '2', '3'],'Outcome':['dead', 'alive', 'alive', 'empty'], 'outcome_count':[20, 21, 19, 18]} 

# Create DataFrame 
dfy = pd.DataFrame(data) 

# Print the output. 
print(dfy)

I want to plot Outcome against a period made up of the month and year. Month and year are in different columns, so how can I combine them to get a graph of the outcome counts against month and year? The legend should show the outcome names.

vestland
LivingstoneM

2 Answers

You can create a new datetime column with to_datetime by passing it a three-column DataFrame with Year, Month and Day columns, and then convert it to monthly periods with Series.dt.to_period:

dfy['dates'] = pd.to_datetime(dfy[['Year','Month']].assign(Day=1))
dfy['per'] = dfy['dates'].dt.to_period('M')
print(dfy)
   Year Month Outcome  outcome_count      dates      per
0  2017     1    dead             20 2017-01-01  2017-01
1  2018     1   alive             21 2018-01-01  2018-01
2  2018     2   alive             19 2018-02-01  2018-02
3  2019     3   empty             18 2019-03-01  2019-03

Then you can plot with either the periods or the datetimes:

dfy.plot(x='per', y='outcome_count')
dfy.plot(x='dates', y='outcome_count')
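
The two plot calls above draw a single line for all rows, so the legend does not show the outcome names the question asks for. One possible way to get there with pandas alone, as a minimal sketch rather than part of the original answer, is to pivot the data so that each outcome becomes its own column. The name `wide` and the choice of `aggfunc='sum'` are assumptions made for this illustration:

```python
import pandas as pd

# Same sample data as in the question.
data = {'Year': ['2017', '2018', '2018', '2019'],
        'Month': ['1', '1', '2', '3'],
        'Outcome': ['dead', 'alive', 'alive', 'empty'],
        'outcome_count': [20, 21, 19, 18]}
dfy = pd.DataFrame(data)

# Combine Year and Month into a monthly period, as in the answer above.
dfy['per'] = pd.to_datetime(dfy[['Year', 'Month']].assign(Day=1)).dt.to_period('M')

# One column per outcome (assumed name: wide). Months in which an outcome
# has no row become NaN; duplicate (month, outcome) pairs would be summed.
wide = dfy.pivot_table(index='per', columns='Outcome',
                       values='outcome_count', aggfunc='sum')
print(wide)
```

Calling `wide.plot(marker='o')` on the result should then draw one line per outcome and use the column names as legend entries. The marker matters here because an outcome that occurs in only one month has no line segment to draw.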
jezrael

Your dataset is very limited. Building on the approach from jezrael I'm able to produce this:

[image: Plotly chart with one trace per outcome]

If this is in fact what you're looking for, I can explain the details. If not, then I'm sure we'll find another approach.

Here's the code so far:

import pandas as pd
import plotly.graph_objects as go

# initialise data of lists.
data = {'Year':['2017', '2018', '2018', '2019'],'Month':['1', '1', '2', '3'],'Outcome':['dead', 'alive', 'alive', 'empty'], 'outcome_count':[20, 21, 19, 18]} 

# Create DataFrame 
dfy = pd.DataFrame(data) 

# approach from jezrael
dfy['dates'] = pd.to_datetime(dfy[['Year','Month']].assign(Day=1))
dfy['per'] = dfy['dates'].dt.to_period('M')

# periods as strings, e.g. '2018-02'
dfy['period'] = dfy['dates'].dt.strftime('%Y-%m')

# unique outcomes
outcomes = dfy['Outcome'].unique()

# plotly setup
fig = go.Figure()

# one trace per outcome
for outcome in outcomes:
    df_plot = dfy[dfy['Outcome'] == outcome]
    fig.add_trace(go.Scatter(x=df_plot['period'], y=df_plot['outcome_count'],
                             name=outcome))

fig.show()
vestland
  • @LivingstoneM I'm glad to hear that! Since jezrael has rightfully earned the acceptance mark, may I be so blunt as to guide you towards the up-vote button in my case if my contribution was helpful to you in any way? – vestland Feb 05 '20 at 11:21
  • Done so sir. Has been very useful – LivingstoneM Feb 05 '20 at 11:27
  • @LivingstoneM Thanks! **Asking** for upvotes is a total taboo, but I think it was in order this one time since the solution to your challenge was a clear team-effort. – vestland Feb 05 '20 at 11:31
  • @LivingstoneM Forgive me for asking, but would you mind me inviting you to [chat](https://chat.stackoverflow.com/rooms/207269/room-for-vestland-and-livingstonem) for a brief minute? – vestland Feb 05 '20 at 11:33