I am having trouble interpolating my missing values. I am using the following code to interpolate:

import pandas as pd

df = pd.read_csv(filename, delimiter=',')
# Interpolate the NaN values
df.set_index(df['Date'], inplace=True)
df2 = df.interpolate(method='time')

Water = df2['Water']
Oil = df2['Oil']
Gas = df2['Gas']

Whenever I run my code I get the following message: "time-weighted interpolation only works on Series or DataFrames with a DatetimeIndex"

My data consists of several columns with a header row. The first column is named Date, and every row in it looks like 12/31/2009. I am new to Python and to time series in general. Any tips will help.

Sample of CSV file

Brucee
  • What is the output of `df.dtypes`? Your "Date" column is likely strings that *look* like dates. – ddejohn Feb 22 '22 at 18:11
  • The issue is that it has not converted your date/time column to a date/time type. It's still strings. – Tim Roberts Feb 22 '22 at 18:11
  • There is a parameter called `parse_dates` for the `pandas.read_csv()` function which you can use to automatically convert datetime-like columns into actual `datetime` objects while the file is being read, as opposed to later in your script. Try reading [the documentation!](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) There is also another parameter which allows you to set the index to a specific column, so you can actually reduce your code quite a bit just by taking advantage of the built-in functionality of `pandas.read_csv()`! – ddejohn Feb 22 '22 at 18:15
  • When I look at the local variables, it says that Date has dtype('O'), and the max and min say "not a numeric object". So I'm assuming you are right: it thinks it is a string. I will go ahead and read the documentation. Thanks!! – Brucee Feb 22 '22 at 18:17
  • In the future, try including a [sample dataset and your expected result](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) when asking Pandas questions. It helps others immensely in helping to answer your question. – ddejohn Feb 22 '22 at 18:24
  • Thanks! It did help a lot. It is changing the nans in between the data. Just the nan values in the first row were not changing. That was confusing me. I just need to figure out how to change the ones in the first row. Update: The first row of nans can be changed using df3 = df.bfill() – Brucee Feb 22 '22 at 19:03
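
To illustrate the diagnosis in the comments above, here is a minimal sketch. The column names come from the question, but the CSV values are invented for illustration, and converting with `pd.to_datetime` after reading is just one possible way to get a DatetimeIndex; it is not necessarily what the asker ended up using.

```python
import io
import pandas as pd

# Hypothetical sample data: column names follow the question,
# the values are made up for illustration only.
csv_text = """Date,Water,Oil,Gas
12/31/2009,,,
01/31/2010,10.0,5.0,2.0
02/28/2010,,,
03/31/2010,30.0,7.0,4.0
"""

df = pd.read_csv(io.StringIO(csv_text))
# "Date" is read as text (object, or a string dtype on newer pandas),
# not as datetime64, which is why method='time' refuses to run.
print(df.dtypes)

# Convert the strings to timestamps; the format matches dates like 12/31/2009.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df = df.set_index("Date")

# Time-weighted interpolation now works, but by default it only fills
# forward, so NaNs before the first valid row are left alone.
df2 = df.interpolate(method="time")

# Back-fill the leading NaNs, as mentioned in the last comment.
df3 = df2.bfill()
print(df3)
```

With this sample, the gap in the middle is filled by `interpolate`, while the first row stays NaN until `bfill()` is applied.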

1 Answer

Try this, assuming the first column of your csv is the one with date strings:

df = pd.read_csv(filename, index_col=0, parse_dates=[0], infer_datetime_format=True)
df2 = df.interpolate(method='time', limit_direction='both')

In principle, this should 1) convert your first column into actual datetime objects, and 2) set the index of the dataframe to that datetime column, all in one step. The infer_datetime_format=True argument is optional. If your dates are in a standard format, it can speed up parsing considerably. Note that it is deprecated as of pandas 2.0, where it no longer has any effect, so it can be dropped on newer versions.

The limit_direction='both' should back fill any NaNs in the first row, but because you haven't provided a copy-paste-able sample of your data, I cannot confirm on my end.
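
Since no sample was available, here is a small check on made-up data. The CSV values below are hypothetical (only the column names come from the question), and `infer_datetime_format` is left out so the sketch also runs cleanly on newer pandas versions.

```python
import io
import pandas as pd

# Hypothetical sample data: the first row is entirely NaN,
# mimicking the situation described in the comments.
csv_text = """Date,Water,Oil,Gas
12/31/2009,,,
01/31/2010,10.0,5.0,2.0
02/28/2010,,,
03/31/2010,30.0,7.0,4.0
"""

# Parse the first column as dates and use it as the index in one step.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=[0])

# limit_direction='both' also fills NaNs that come before the first
# valid value; those take the nearest valid value rather than being
# extrapolated along a trend.
df2 = df.interpolate(method="time", limit_direction="both")
print(df2)
```

On this sample the leading row is filled with the values from the first valid row, so it behaves like a back fill for those cells.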

Reading the documentation can be incredibly helpful and can usually answer questions faster than you'll get answers from Stack Overflow!

ddejohn
  • I tried it. It did add an index to all my other columns in df. But it did not interpolate the NaN values when I then ran df2=df.interpolate(method='time') – Brucee Feb 22 '22 at 18:29
  • Can you provide a sample of your csv file (as copy-paste-able formatted text) as an edit to your original post? – ddejohn Feb 22 '22 at 18:31
  • I added a sample. It's a long csv file. For some reason when I tried your suggestion I was not able to see the nan values change. – Brucee Feb 22 '22 at 18:50
  • @JosePaz please add a **copy-paste-able** sample from your csv. The first 5 or 10 rows is fine, but it needs to be *text* that I can copy to my machine to test. – ddejohn Feb 22 '22 at 18:56
  • I think I understand what is happening. It fixed all my nan values except the ones in the first row. It did change all the ones in the middle. – Brucee Feb 22 '22 at 18:59
  • @JosePaz you should be able to use `limit_direction='both'`. See my updated answer. Please also consider marking my solution as accepted so that your question is removed from the unanswered-queue. – ddejohn Feb 22 '22 at 19:12