I am new to Python and am playing around with a CSV file. I would like to plot my graph for a specific date range only, for example 2013-03-20 to 2014-03-04.

Code below:

import pandas as pd
import matplotlib.pyplot as plt

prc=pd.read_csv("csv",parse_dates=True, nrows=150, usecols=["Close"])

prc_ma=prc.rolling(5).mean()


plt.plot(prc, color="blue", label="Price")
plt.plot(prc_ma, color="red", label="Moving Average")
plt.xlabel("Date")
plt.ylabel("Price")
plt.title("Moving Average")
plt.grid()

I currently limit the data with the nrows parameter.

Thank you

Hun3u5ek
  • Possible duplicate of [Select dataframe rows between two dates](https://stackoverflow.com/questions/29370057/select-dataframe-rows-between-two-dates) – Anton vBR Apr 05 '18 at 18:31

1 Answer

Simply filter for the dates with .loc, assuming the DataFrame's index holds datetimes:

prc = pd.read_csv("csv", parse_dates=True, nrows=150, usecols=["Close"])

prc_sub = prc.loc['2013-03-20':'2014-03-04']
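
Note that the slice above only works if the date column is actually read in and set as the index; reading just "Close" leaves a plain integer index. A minimal sketch of one way to do that, assuming the file has a date column named `Date` (a hypothetical name, as is the small inline sample standing in for the real file):

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV file; the "Date" column name
# and all values here are assumptions made for illustration only.
csv_text = """Date,Open,Close
2013-03-19,10.0,10.5
2013-03-20,10.5,11.0
2013-06-03,11.0,12.0
2014-03-04,12.0,12.5
2014-03-05,12.5,13.0
"""

prc = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Date", "Close"],  # the date column has to be read as well
    index_col="Date",           # use it as the index
    parse_dates=True,           # parse the index as datetimes
)

# Label-based slicing on a DatetimeIndex includes both end dates
prc_sub = prc.loc["2013-03-20":"2014-03-04"]
print(prc_sub)
```

With a real file, the `io.StringIO(csv_text)` argument would be replaced by the file path.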

To demonstrate with random data covering every day of 2013 and 2014, subset to the requested range:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

pd.set_option('display.width', 1000)
np.random.seed(1001)

prc = pd.DataFrame({'PRICE': abs(np.random.randn(730))}, 
                    index=pd.date_range("2013-01-01", "2014-12-31", freq="D"))

# SUBSETTED DATAFRAME
prc_sub = prc.loc['2013-03-20':'2014-03-04']

prc_ma = prc_sub.rolling(5).mean()

plt.plot(prc_sub, color="blue", label="Price")
plt.plot(prc_ma, color="red", label="Moving Average")
plt.xlabel("Date")
plt.ylabel("Price")
plt.title("Moving Average")
plt.grid()
plt.legend()  # display the labels passed to plt.plot
plt.show()

Plot Output

Parfait
  • Perfect use of loc! I would probably have created a mask with `m = prc.index.to_series().between('2013-03-20','2014-03-04')`. But this is more readable! – Anton vBR Apr 05 '18 at 18:28
  • @AntonvBR ... Indeed, if the dates are not the index, boolean indexing with a mask would be needed. Let's await OP's confirmation. – Parfait Apr 05 '18 at 18:31
  • I can't seem to get it working with the .loc attribute. Can there be a problem in the raw data? And do I understand correctly, that the code before # SUBSETTED DATAFRAME is used as an example? – Hun3u5ek Apr 05 '18 at 18:58
  • Please post a few rows of *prc*. To use `.loc` as mentioned in the answer, the index of *prc* must be datetime. Does its structure resemble the random number example? – Parfait Apr 05 '18 at 19:41
  • The lower part of this answer is a demo of `.loc`. For you as the OP, ignore that section and use your own data. Future readers may find it useful. – Parfait Apr 05 '18 at 19:42
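
The mask approach mentioned in the comments can be sketched as follows, for the case where the dates sit in an ordinary column rather than in the index. The column name `Date` and the random data are assumptions for illustration, not the OP's actual file:

```python
import numpy as np
import pandas as pd

# Hypothetical data: dates kept in a regular column named "Date" (an assumption)
rng = np.random.default_rng(1001)
dates = pd.date_range("2013-01-01", "2014-12-31", freq="D")
prc = pd.DataFrame({"Date": dates,
                    "PRICE": np.abs(rng.standard_normal(len(dates)))})

# Boolean mask: True for rows whose date falls in the range (both ends included)
mask = prc["Date"].between("2013-03-20", "2014-03-04")
prc_sub = prc.loc[mask]

print(prc_sub["Date"].min(), prc_sub["Date"].max(), len(prc_sub))
```

Calling `prc_sub.set_index("Date")` afterwards would give the same shape as the `.loc` example in the answer, so the plotting code can be reused unchanged.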