
I am new to Python and am trying to build a time series regression model. I have 3 columns: X, Y, and the date. I imported everything below, but I keep running into a warning.

import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 15, 6
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.tsa.stattools import adfuller
raw_data = pd.read_csv("IMF and BBG Fair Values.csv")
ISO_TH = raw_data[["IMF_VALUE", "BBG_FV", "IMF_DATE"]]

Filtering to get rid of NaN:

filtered_TH = ISO_TH[np.isfinite(raw_data['BBG_FV'])]

I get this warning:

C:\Program Files\Anaconda3\lib\site-packages\pandas\core\generic.py:2698: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation

Mario
Ross
  • I actually get the warning from this line of code: filtered_TH.IMF_DATE = pd.DatetimeIndex(filtered_TH.IMF_DATE) – Ross Jun 12 '18 at 17:42
  • import numpy as np
    from sklearn import linear_model
    import matplotlib.pyplot as plt
    import pandas as pd
    %matplotlib inline
    from matplotlib.pylab import rcParams
    rcParams['figure.figsize'] = 15, 6
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    from statsmodels.tsa.stattools import adfuller
    raw_data=pd.DataFrame([[np.inf,22,333,44], [3,4,5,2],[1,2,3,4],[np.inf,0,0,0]],columns=["BBG_FV", "IMF_VALUE", "IMF_DATE", "unused"])
    ISO_TH = raw_data.loc[:,["IMF_VALUE", "BBG_FV", "IMF_DATE"]]
    ISO_TH.IMF_VALUE=[0,0,0,0]
    – Ross Jun 14 '18 at 12:19

1 Answer


Your problem has exactly the same origin as the one described in the pandas documentation that the warning points to. Look at the minimal example given there:

def do_something(df):
   foo = df[['bar', 'baz']]  # Is foo a view? A copy? Nobody knows!
   # ... many lines here ...
   foo['quux'] = value       # We don't know whether this will modify df or not!
   return foo 

The problem is that foo might be either a view of the dataframe df or a copy of it. If it is a view, then changes to foo will also affect the original dataframe df. If foo is a copy, then the line foo['quux'] = value will have no effect on df.
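One way to remove the ambiguity is to state the intent explicitly: ask for a copy when you want an independent object, and assign through .loc on the original when you want to change the original. A minimal sketch, using a hypothetical df with the bar/baz column names from the documentation snippet:

```python
import pandas as pd

# Hypothetical frame standing in for `df` in the documentation snippet.
df = pd.DataFrame({"bar": [1, 2], "baz": [3, 4]})

# Intent 1: an independent object. .copy() makes foo an explicit copy,
# so adding a column to foo never touches df.
foo = df[["bar", "baz"]].copy()
foo["quux"] = 0

# Intent 2: change the original. Index df itself with .loc[row_indexer, col_indexer].
df.loc[df["bar"] > 1, "baz"] = 0

print(df)   # baz is now [3, 0]; no "quux" column
print(foo)  # baz is still [3, 4]; quux is [0, 0]
```

Neither assignment leaves pandas guessing whether it is working on a view or a copy.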

How does this translate to your problem?

You start by creating a dataframe from a *.csv file:

raw_data = pd.read_csv("IMF and BBG Fair Values.csv")

Then you select the columns "IMF_VALUE", "BBG_FV", "IMF_DATE" from the dataframe raw_data in the following way:

ISO_TH = raw_data[["IMF_VALUE", "BBG_FV", "IMF_DATE"]]

Now, this looks very similar to the second line from the documentation:

foo = df[['bar', 'baz']]

Is your ISO_TH a view or a copy of raw_data? We don't know! So what happens if we change a column of ISO_TH? Does raw_data also change or not? We don't know, hence the warning.

Toy example:

import pandas as pd
import numpy as np
raw_data = pd.DataFrame([[np.inf, 22, 333, 44], [3, 4, 5, 2], [1, 2, 3, 4], [np.inf, 0, 0, 0]],
                        columns=["BBG_FV", "IMF_VALUE", "IMF_DATE", "unused"])
ISO_TH = raw_data[["IMF_VALUE", "BBG_FV", "IMF_DATE"]]
# if we now change ISO_TH, we get a warning
ISO_TH["IMF_VALUE"] = [0, 0, 0, 0]  # SettingWithCopyWarning

The fact that you create an intermediate object filtered_TH from ISO_TH changes nothing here.

How can we solve this? Easy: we read the docs and do what they say, i.e. select the columns with .loc:

ISO_TH = raw_data.loc[:,["IMF_VALUE", "BBG_FV", "IMF_DATE"]]

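And continue as before. One caveat: a boolean filter such as ISO_TH[np.isfinite(ISO_TH['BBG_FV'])] can itself return an object that pandas flags as a possible copy when it drops rows, so if you later assign to a column of filtered_TH, create it with an explicit copy: filtered_TH = ISO_TH[np.isfinite(ISO_TH['BBG_FV'])].copy()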
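A minimal end-to-end sketch of the question's workflow, for illustration only. The toy values and date strings are hypothetical stand-ins for "IMF and BBG Fair Values.csv", and the explicit .copy() after the boolean filter is an assumption that goes slightly beyond the .loc fix:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for "IMF and BBG Fair Values.csv".
raw_data = pd.DataFrame(
    {
        "BBG_FV": [np.inf, 3.0, 1.0, np.nan],
        "IMF_VALUE": [22, 4, 2, 0],
        "IMF_DATE": ["2018-01-31", "2018-02-28", "2018-03-31", "2018-04-30"],
        "unused": [44, 2, 4, 0],
    }
)

# Select the columns with .loc, as recommended above.
ISO_TH = raw_data.loc[:, ["IMF_VALUE", "BBG_FV", "IMF_DATE"]]

# Drop rows whose BBG_FV is NaN or +/-inf; .copy() makes filtered_TH an
# explicit, independent DataFrame so later assignments are unambiguous.
filtered_TH = ISO_TH[np.isfinite(ISO_TH["BBG_FV"])].copy()

# Convert the date strings to datetime values (pd.to_datetime used here
# instead of pd.DatetimeIndex).
filtered_TH["IMF_DATE"] = pd.to_datetime(filtered_TH["IMF_DATE"])
print(filtered_TH)
```

With the real file, the raw_data line would be replaced by the pd.read_csv call from the question.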

Additional information: What rules does Pandas use to generate a view vs a copy?

Merlin1896
  • Thank you so much! I still get the warning after adding that line after reading the CSV into raw_data. I don't know why; can I just ignore it? – Ross Jun 13 '18 at 18:26
  • If you use my proposed solution then no warning should appear. If you still get warnings, please add a fully functional minimal example to your initial post so that we can see where this warning comes from. With my minimal example, no warning appears on my machine. – Merlin1896 Jun 13 '18 at 20:09
  • Thanks for clarifying. I replaced my code, used the toy code above, and get no warning. The issue for me now is that when I run ISO_TH I get the data below (what happened to my numbers?):
       IMF_VALUE    BBG_FV                       IMF_DATE
    0          0       inf  1970-01-01 00:00:00.000000333
    1          0  3.000000  1970-01-01 00:00:00.000000005
    2          0  1.000000  1970-01-01 00:00:00.000000003
    3          0       inf  1970-01-01 00:00:00.000000000
    – Ross Jun 14 '18 at 12:16
  • As you can see from the code, I generated some toy data. You would have to replace the line starting with "raw_data=" with your own line that loads the CSV. As I don't have your data, I could not include it. – Merlin1896 Jun 14 '18 at 15:27
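The odd IMF_DATE values in the output above come from the toy data rather than from the fix: the toy IMF_DATE column holds plain integers (333, 5, 3, 0), and pandas converts integers to datetimes as nanoseconds since the Unix epoch. The question converted with pd.DatetimeIndex, which produced exactly those 1970 timestamps; this minimal sketch uses pd.to_datetime instead to show the same effect:

```python
import pandas as pd

# Integer "dates" from the toy data above.
raw_dates = pd.Series([333, 5, 3, 0])

# Integers are read as nanoseconds since 1970-01-01 (the Unix epoch),
# which is why the toy IMF_DATE column prints as 1970-01-01 00:00:00.000000333.
as_datetimes = pd.to_datetime(raw_dates)
print(as_datetimes)

# Real date strings parse to the expected calendar dates.
real_dates = pd.to_datetime(pd.Series(["2018-06-12", "2018-06-14"]))
print(real_dates)
```

If the CSV column holds date strings, pd.read_csv(..., parse_dates=["IMF_DATE"]) can also parse it while loading.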