I have a data set of house prices (House Price Data). When I use a subset of the data in a NumPy array, I can plot it as this nice time series chart:

Desired chart BUT using Numpy Array

However, when I use the same data in a pandas Series, the chart goes all lumpy like this:

The lumpy chart using a Pandas Series

How can I create a smooth time series line graph (like the first image) using a pandas Series?

Here is what I am doing to get the nice-looking time series chart using a NumPy array (after importing numpy as np, pandas as pd and matplotlib.pyplot as plt):

data = pd.read_csv('HPI.csv', index_col='Date', parse_dates=True) #pull in csv file, make index the date column and parse the dates
brixton = data[data['RegionName'] == 'Lambeth'] # pull out a subset for the region Lambeth
prices = brixton['AveragePrice'].values # create a numpy array of the average price values
plt.plot(prices) #plot
plt.show() #show

Here is what I am doing to get the lumpy one using a pandas Series:

data = pd.read_csv('HPI.csv', index_col='Date', parse_dates=True)
brixton = data[data['RegionName'] == 'Lambeth']
prices_panda = brixton['AveragePrice'] 
plt.plot(prices_panda)
plt.show()

How do I make this second graph show as a nice smooth proper time series?

* This is my first StackOverflow question so please shout if I have left anything out or not been clear *

Any help greatly appreciated

Paul H
blackhaj
  • Shouting here ;-) You may want to read [mcve] and [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). You wouldn't usually ask people here to download a 50 MB file to reproduce your issue. Instead, generate the data inside your code. – ImportanceOfBeingErnest Nov 18 '17 at 10:30

2 Answers

When you pass parse_dates=True, pandas parses ambiguous dates with its default convention, which is month-day-year. Your data is formatted according to the British convention, which is day-month-year. As a result, instead of one data point on the first of every month, each year's twelve points land on the first 12 days of January, and the plot draws a straight line across the rest of the year. The NumPy version looks smooth only because plt.plot(prices) ignores the dates and plots the values against their position. You need to fix the dates, for example by swapping day and month:

data.index = pd.to_datetime({'year':data.index.year,'month':data.index.day,'day':data.index.month})
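
To see the effect in isolation, here is a minimal sketch on a handful of made-up date strings (the values are hypothetical, not taken from HPI.csv). The explicit format strings stand in for the two conventions, so the result does not depend on how a particular pandas version guesses ambiguous dates:

```python
import pandas as pd

# Hypothetical first-of-month dates written day/month/year, as in a British-style file.
raw = ["01/01/2016", "01/02/2016", "01/03/2016", "01/12/2016"]

# Read month-first: every value becomes a day in January.
month_first = pd.to_datetime(raw, format="%m/%d/%Y")

# Read day-first: one value per month, as intended.
day_first = pd.to_datetime(raw, format="%d/%m/%Y")

print(month_first.month.tolist(), month_first.day.tolist())
print(day_first.month.tolist(), day_first.day.tolist())
```

With the month-first reading, the months all collapse to January and the days run 1, 2, 3, 12, which is what squeezes each year's points into the start of the year on the plot.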

Acccumulation

The date format in the file you have is Day/Month/Year. For pandas to interpret this format correctly, you can pass the option dayfirst=True to the read_csv call.

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv('data/UK-HPI-full-file-2017-08.csv', 
                   index_col='Date', parse_dates=True, dayfirst=True)
brixton = data[data['RegionName'] == 'Lambeth']
prices_panda = brixton['AveragePrice'] 
plt.plot(prices_panda)
plt.show()
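
For anyone who wants to try this without downloading the full file, the following is a small self-contained sketch. The column names match the ones used above, but the rows and prices are invented for illustration, and the CSV is read from an in-memory string rather than from disk:

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV: same column names, made-up values.
csv_text = """Date,RegionName,AveragePrice
01/01/2016,Lambeth,500000
01/02/2016,Lambeth,502000
01/03/2016,Lambeth,505000
01/01/2016,Camden,800000
"""

data = pd.read_csv(io.StringIO(csv_text),
                   index_col="Date", parse_dates=True, dayfirst=True)
lambeth = data[data["RegionName"] == "Lambeth"]
prices = lambeth["AveragePrice"]

# Sanity check: monthly data should fall on day 1 of consecutive months.
print(prices.index.month.tolist())
print(prices.index.day.unique().tolist())
```

If the second print shows anything other than a single day of 1, the dates were probably parsed month-first. Passing prices to plt.plot as in the code above should then give one point per month.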

ImportanceOfBeingErnest