2

I have the following dataframe: <class 'pandas.core.frame.DataFrame'>

RangeIndex: 1642 entries, 0 to 1641
Data columns (total 13 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   Date              1642 non-null   datetime64[ns]
 1   Volgnr            1642 non-null   int64         
 2   account           1642 non-null   object        
 3   Rentedatum        1642 non-null   datetime64[ns]
 4   Bedrag            1642 non-null   float64       
 5   Balance           1642 non-null   float64       
 6   tegenrekening     906 non-null    object        
 7   Code              1642 non-null   object        
 8   Naam tegenpartij  1642 non-null   object        
 9   description       1642 non-null   object        
 10  category          1642 non-null   object        
 11  Grootboek         1578 non-null   object        
 12  Kleinboek         1578 non-null   object        
dtypes: datetime64[ns](2), float64(2), int64(1), object(8)
memory usage: 166.9+ KB

'account' has 5 different account numbers which like so: NL00ABCD0123456789

I want two different graphs but I'm already stuck with the first one i.e. I want to see the balance over time for the 5 accounts

In line with other question on this forum I tried:

pd.options.plotting.backend="plotly"
df.set_index('Date', inplace=True)
df.groupby('account')['balance'].plot(legend=True)

But got the following error:

TypeError: line() got an unexpected keyword argument 'legend'

What is going wrong here?

For later: If that is solved I want the X-axis to be weeks or months instead of the absolute date so some aggregation will be necessary

vestland
  • 55,229
  • 37
  • 187
  • 305
matje59
  • 55
  • 10
  • 2
    I am unfamiliar with the plotly library, but from the error I can surmise that the plot method does not have an attribute legend. Check the docs to determine the proper attribute name for creating the legend. – itprorh66 Jan 15 '22 at 17:09
  • 3
    can you provide a few sample rows of data? struggling to understand the column names to generate something meaningful to provide an answer. additionally it's generally simpler to use the **plotly express** API rather than **pandas** wrapper layer – Rob Raymond Jan 15 '22 at 18:29

2 Answers2

3

Short answer:

You're seeing this error because running df.plot() will trigger px.line() after defining pd.options.plotting.backend="plotly". And px.line() does not have a legend atribute. But you don't need it. All you need is:

px.line(df, x = 'Date', y = 'Balance', color = 'Account')

And you'll get:

enter image description here

The details:

Setting pd.options.plotting.backend="plotly" will, as you surely may know, override the default plotting backend for pandas which is matplotlib. Still, when running help(df.plot()) after that, the help info that pops up still seems to be info about matplotlib, which does in fact have a legend attribute.

But px.line() is what will be triggered by df.plot() after instantiating pd.options.plotting.backend="plotly". And this is what triggers your error, since px.line does not have legend attribute. Dut don't worry about that, since things are about to get really simple for you, because px.line() will produce a grouped legend for you. You don't even need to group your data as long as you apply df.plot() correctly.

But before we come to that, we'll have to take a look at your provided dataset. Given the wording of your question, and the look of the 'data' what you've provided, my understanding is that you've got several non-unique accounts under account associated with different values for balance spread across multiple non-unique dates. Something like this:

          Date             Account  Balance
0   01.01.2022  NL00ABCD0123456789        1
1   01.01.2022  NL00ABCD0123456790        2
2   01.01.2022  NL00ABCD0123456791        2
3   01.01.2022  NL00ABCD0123456792        3
4   01.01.2022  NL00ABCD0123456793        4
5   02.01.2022  NL00ABCD0123456789        2
6   02.01.2022  NL00ABCD0123456790        3
7   02.01.2022  NL00ABCD0123456791        3
8   02.01.2022  NL00ABCD0123456792        4
9   02.01.2022  NL00ABCD0123456793        5

If that's the case, then all you need to do is run:

px.line(df, x = 'Date', y = 'Balance', color = 'Account')

Plot:

enter image description here

Complete code:

import pandas as pd
import plotly.express as px

pd.options.plotting.backend="plotly"
df = pd.DataFrame({'Date': {0: '01.01.2022',
              1: '01.01.2022',
              2: '01.01.2022',
              3: '01.01.2022',
              4: '01.01.2022',
              5: '02.01.2022',
              6: '02.01.2022',
              7: '02.01.2022',
              8: '02.01.2022',
              9: '02.01.2022',
              10: '03.01.2022',
              11: '03.01.2022',
              12: '03.01.2022',
              13: '03.01.2022',
              14: '03.01.2022',
              15: '04.01.2022',
              16: '04.01.2022',
              17: '04.01.2022',
              18: '04.01.2022',
              19: '04.01.2022'},
             'Account': {0: 'NL00ABCD0123456789',
              1: 'NL00ABCD0123456790',
              2: 'NL00ABCD0123456791',
              3: 'NL00ABCD0123456792',
              4: 'NL00ABCD0123456793',
              5: 'NL00ABCD0123456789',
              6: 'NL00ABCD0123456790',
              7: 'NL00ABCD0123456791',
              8: 'NL00ABCD0123456792',
              9: 'NL00ABCD0123456793',
              10: 'NL00ABCD0123456789',
              11: 'NL00ABCD0123456790',
              12: 'NL00ABCD0123456791',
              13: 'NL00ABCD0123456792',
              14: 'NL00ABCD0123456793',
              15: 'NL00ABCD0123456789',
              16: 'NL00ABCD0123456790',
              17: 'NL00ABCD0123456791',
              18: 'NL00ABCD0123456792',
              19: 'NL00ABCD0123456793'},
             'Balance': {0: 1,
              1: 2,
              2: 2,
              3: 3,
              4: 4,
              5: 2,
              6: 3,
              7: 3,
              8: 4,
              9: 5,
              10: 3,
              11: 4,
              12: 4,
              13: 5,
              14: 6,
              15: 4,
              16: 5,
              17: 5,
              18: 6,
              19: 7}})

px.line(df, x = 'Date', y = 'Balance', color = 'Account')
vestland
  • 55,229
  • 37
  • 187
  • 305
  • 1
    I was out for a while and in the mean time several answers came in. Thanks all for taking the time to look at my problem. @vestland thanks very very much. You interpreted my missing data exactly right. (I don't know how you get it so quickly so much in stack overflow.) I tried it and it worked. Top! – matje59 Jan 15 '22 at 20:48
  • @matje59 Glad it worked for you! And thanks for the feedback! There are so many helpful, knowledgeable, talented and highly active users on the `plotly` tag these days so you *have* to be quick if you want to answer questions =) – vestland Jan 15 '22 at 22:22
  • @matje59 If you'd like how to easily enhance your future questions with a data sample, I would just like to suggest that you can follow the steps [here](https://stackoverflow.com/questions/63163251/pandas-how-to-easily-share-a-sample-dataframe-using-df-to-dict/63163254#63163254) – vestland Jan 15 '22 at 22:43
  • 1
    that's most helpful thanks! – matje59 Jan 16 '22 at 11:27
0

Since you have not provided with a sample data, I have solution with an arbitrary time-series data.

{'Date': ['10/03/2004',
  '10/03/2004',
  '10/03/2004',
  '10/03/2004',
  '10/03/2004'],
 'Time': ['18.00.00', '19.00.00', '20.00.00', '21.00.00', '22.00.00'],
 'CO(GT)': ['2,6', '2', '2,2', '2,2', '1,6'],
 'PT08.S1(CO)': [1360.0, 1292.0, 1402.0, 1376.0, 1272.0],
 'NMHC(GT)': [150.0, 112.0, 88.0, 80.0, 51.0],
 'C6H6(GT)': ['11,9', '9,4', '9,0', '9,2', '6,5'],
 'PT08.S2(NMHC)': [1046.0, 955.0, 939.0, 948.0, 836.0],
 'NOx(GT)': [166.0, 103.0, 131.0, 172.0, 131.0],
 'PT08.S3(NOx)': [1056.0, 1174.0, 1140.0, 1092.0, 1205.0],
 'NO2(GT)': [113.0, 92.0, 114.0, 122.0, 116.0],
 'PT08.S4(NO2)': [1692.0, 1559.0, 1555.0, 1584.0, 1490.0],
 'PT08.S5(O3)': [1268.0, 972.0, 1074.0, 1203.0, 1110.0],
 'T': ['13,6', '13,3', '11,9', '11,0', '11,2'],
 'RH': ['48,9', '47,7', '54,0', '60,0', '59,6'],
 'AH': ['0,7578', '0,7255', '0,7502', '0,7867', '0,7888']
}

We need to convert date to a datetime object.


df['Date'] = pd.to_datetime(df['Date'] + " " + df['Time'], format="%d/%m/%Y %H.%M.%S")

# To plot with monthly aggregation you can use resample. 
df.set_index('Date').resample('1M').mean().plot()

enter image description here

Irfanuddin
  • 2,295
  • 1
  • 15
  • 29