import dateutil.parser  # needed for dateutil.parser.parse below
import pandas as pd

wiki_fires = pd.read_html("https://en.wikipedia.org/wiki/2017_California_wildfires")
wildfire_df = wiki_fires[1]
wildfire_df.columns = wildfire_df.iloc[0]
wildfire_df = wildfire_df[1:].copy()  # copy so the column assignments below do not hit a slice

wildfire_df["Start Date"] = wildfire_df["Start Date"].apply(lambda x: dateutil.parser.parse(x))

wildfire_df["Containment Date"] = wildfire_df["Containment Date"].apply(lambda x: dateutil.parser.parse(x))
start = "2017-09-01"
end = "2017-09-30"
mask = (wildfire_df["Containment Date"] >= start) & (wildfire_df['Start Date'] <= end)
mask.head()

mask.head() returns a boolean Series, but what I want is the rows in wildfire_df whose dates fall between the start date and the end date, including those dates. Any guidance would be appreciated.

db4
  • wildfire_df[mask] – BENY Mar 25 '18 at 18:49
  • @jezrael This is not a duplicate question, at least not to the one you reference. Pandas has special behavior with regard to slicing based upon dates. This is perhaps a more appropriate duplicate: https://stackoverflow.com/questions/29370057/select-dataframe-rows-between-two-dates – Joshua Cook Mar 25 '18 at 19:54
  • @db4 I would refer here: https://pandas.pydata.org/pandas-docs/stable/timeseries.html – Joshua Cook Mar 25 '18 at 19:54
  • @Wen that worked PERFECTLY. I knew I was close but didn't think to call it as a variable within the dataframe. Thanks so much – db4 Mar 25 '18 at 20:41
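
For anyone landing here later, the fix from the comments (indexing the DataFrame with the boolean mask) can be sketched roughly as follows. The rows below are made-up placeholders standing in for the scraped Wikipedia table, not real fire data, and pd.to_datetime is swapped in for dateutil.parser.parse purely for brevity; september_fires is a hypothetical name.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table; these rows are invented for illustration.
wildfire_df = pd.DataFrame({
    "Name": ["Fire A", "Fire B", "Fire C", "Fire D"],
    "Start Date": ["August 20, 2017", "September 5, 2017",
                   "September 28, 2017", "October 8, 2017"],
    "Containment Date": ["August 30, 2017", "September 12, 2017",
                         "October 5, 2017", "October 20, 2017"],
})

# Parse the date strings into datetime64 columns.
for col in ["Start Date", "Containment Date"]:
    wildfire_df[col] = pd.to_datetime(wildfire_df[col])

start = pd.Timestamp("2017-09-01")
end = pd.Timestamp("2017-09-30")

# The mask is a boolean Series aligned with the DataFrame's index.
mask = (wildfire_df["Containment Date"] >= start) & (wildfire_df["Start Date"] <= end)

# Indexing with the mask keeps only the rows where it is True.
september_fires = wildfire_df[mask]
print(september_fires)
```

The mask keeps any fire whose active period overlaps the window, so a fire that started in September but was contained in October is still included. wildfire_df.loc[mask] should behave the same way here.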
