I have a dataframe called 'blah' that was created like this:

blah = pandas.read_csv(address, index_col='Date', parse_dates=True)
blah.head()
                 TransactionName  Withdrawal  Deposit    Total
Date                                                          
2016-12-01  PTS TO:  #######           10.00      NaN  2612.27
2016-12-01  ###############           250.00      NaN  2362.27
2016-12-01  SSV TO:  ###########        1.00      NaN  2361.27
2016-12-01  ###############            62.86      NaN  2298.41
2016-12-02  SSV TO:  ###########        2.00      NaN  2296.41

I want to plot Deposit against Date. There are ~790 rows of Deposit; only 57 have values, and everything else is NaN.

blah['Deposit'].plot()

That command outputs this plot: [image: crappy plot]

The problem is this plot doesn't have all the deposits on it. If I create a Series, then drop all the NaNs and plot it, everything is fine:

derp = blah['Deposit'].dropna()
derp.plot()

Here you can see all deposit activity. Notice the deposits after 2017-12 that don't show up in the original. [image: good plot]

Why aren't all the values plotted in the first case? If I create 'blah' without setting Date as the index column, the problem persists, except that the graph is plotted against the row's index number instead of the date.

My goal is to plot the Total, Withdrawal and Deposit columns on the same graph against the Date. Both of the other columns output fine with the command:

blah['Total'].plot() 
blah['Withdrawal'].plot()
Rann Lifshitz
Gus_1923

1 Answer

NaN will always interrupt the line plot:

Because the NaNs are still in the data, the line is interrupted. A line plot only draws a segment between two consecutive non-NaN values, so a value with NaN on both sides has nothing to connect to and does not appear at all. Remove the NaNs to have the line continue through all the valid data, or plot markers as well as the line and you will see every value.
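To check which values a plain line plot would miss, you can flag the non-NaN entries whose neighbours on both sides are NaN. This is only a sketch on a made-up Series (the numbers are not from the question), but the same test should work on a column like blah['Deposit']:

```python
import numpy as np
import pandas as pd

# Made-up sparse data standing in for a column such as 'Deposit'
s = pd.Series([np.nan, 5.0, np.nan, np.nan, 7.0, 8.0, np.nan, 3.0])

valid = s.notna()

# A point is joined by a line only if the previous or the next row is also non-NaN
has_neighbour = valid.shift(1, fill_value=False) | valid.shift(-1, fill_value=False)

# Isolated points: real values with NaN (or nothing) on both sides
isolated = valid & ~has_neighbour

print(s[isolated])       # values that a marker-less line plot cannot show
print(isolated.sum())    # how many values are affected
```

In this toy Series the values at positions 1 and 7 are isolated, while 7.0 and 8.0 sit next to each other and are joined by a segment.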

Here is a LINK to a similar but different question about plotting NaN where the answer mentions the problem with plotting a line through NaN.

Reproducible example:

import random
import pandas as pd
import numpy as np

c = [np.nan] * 10
c.extend(random.sample(range(100), 10))
random.shuffle(c)

d = {"a":random.sample(range(100), 20), "b":random.sample(range(100), 20), "c":c}

df = pd.DataFrame(d)

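df.plot(style="-o")  # markers plus lines, so values isolated between NaNs are still visible

df.dropna().plot()  # drops every row where any column is NaN, so "a" and "b" lose points too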
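For the goal in the question (Total, Withdrawal and Deposit on one graph), note that df.dropna() removes a row from every column as soon as one column is NaN. One possible approach is to drop the NaNs per column and draw each column on the same axes. The sketch below uses invented data with the question's column names, and calls matplotlib directly rather than Series.plot; that is a choice made here to keep the date axis simple, not something the question or answer prescribes:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Invented stand-in for the question's dataframe (values are illustrative only)
idx = pd.date_range("2016-12-01", periods=8, freq="D")
df = pd.DataFrame(
    {
        "Total": [100.0, 90.0, 95.0, 80.0, 85.0, 70.0, 75.0, 60.0],
        "Withdrawal": [np.nan, 10.0, np.nan, 15.0, np.nan, 15.0, np.nan, 15.0],
        "Deposit": [np.nan, np.nan, 5.0, np.nan, 5.0, np.nan, 5.0, np.nan],
    },
    index=idx,
)

fig, ax = plt.subplots()
for col in ["Total", "Withdrawal", "Deposit"]:
    # Drop NaN per column so a sparse column does not remove rows from the others
    s = df[col].dropna()
    # "-o" draws markers and connecting lines
    ax.plot(s.index, s.values, "-o", label=col)

ax.set_xlabel("Date")
ax.legend()
```

Each column keeps all of its own values this way: Total is drawn with 8 points, Withdrawal with 4 and Deposit with 3. Outside a notebook, add plt.show() or fig.savefig(...) to see the result.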
Hugues Fontenelle
Dodge