I'm doing the Udacity ML course. After df_final.join(df_temp, how="left") I get NaN values, but in the course's venv everything works fine. Where might the problem be?
P.S.: I also tried df_temp.index = pd.to_datetime(df_temp.index, utc=True)
for each ticker, but it seems to have no effect.
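A likely reason utc=True changes nothing: assume the CSV dates carry a New York offset (the exact format, e.g. 2022-01-03 00:00:00-05:00, is an assumption here). Converting them to UTC gives 05:00 or 04:00 UTC rather than midnight, while df_final holds UTC midnights. The sketch below uses made-up strings to show the mismatch:

```python
import pandas as pd

# Made-up date strings in the assumed yfinance CSV format: New York midnight plus offset.
raw = pd.Index(["2022-01-03 00:00:00-05:00", "2022-07-01 00:00:00-04:00"])

# utc=True moves the offsets into UTC; the clock time becomes 05:00 / 04:00, not 00:00.
as_utc = pd.to_datetime(raw, utc=True)
print(as_utc)

# df_final's keys in the question: calendar days at UTC midnight.
calendar = pd.to_datetime(pd.date_range("2022-01-01", "2023-01-01"), utc=True)

# No key is shared, so a left join has nothing to match and fills NaN.
print(calendar.intersection(as_utc).empty)
```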
Here we load the data:
import os

import pandas as pd
import yfinance as yf

tickets = ["AAPL", "AMD", "GOOG", "GLD"]

def download_tickets(tickets):
    # Save each ticker's full daily price history to data/<TICKER>.csv
    for ticket in tickets:
        df = yf.Ticker(ticket)
        df = df.history(period="max")
        df.to_csv(symbol_to_path(ticket))
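The offset most likely comes from this download step. Recent yfinance versions return history() with an index localized to the exchange time zone (treat that as an assumption for your installed version), and to_csv writes each timestamp together with its offset. Here is a sketch with a hypothetical stand-in frame instead of a network call:

```python
import io

import pandas as pd

# Hypothetical stand-in for Ticker.history(): a daily index localized to New York time.
idx = pd.DatetimeIndex(["2022-01-03", "2022-01-04"], name="Date").tz_localize("America/New_York")
hist = pd.DataFrame({"Close": [182.0, 179.7]}, index=idx)

# Write to an in-memory buffer instead of data/<TICKER>.csv.
buf = io.StringIO()
hist.to_csv(buf)
csv_text = buf.getvalue()
print(csv_text)
```

If the saved Date column looks like this, every row is stamped 00:00 in New York, which is 04:00 or 05:00 in UTC.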
Here we build the CSV path from a symbol:
def symbol_to_path(symbol, base_dir="data"):
    """Return data/<symbol>.csv, creating the base directory if needed."""
    if not os.path.exists(base_dir):
        os.mkdir(base_dir)
    return os.path.join(base_dir, "{}.csv".format(str(symbol)))
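Unrelated to the NaN issue, but as a small simplification: os.makedirs with exist_ok=True replaces the exists()/mkdir() pair and also creates missing parent directories. A sketch of the same helper:

```python
import os
import tempfile

def symbol_to_path(symbol, base_dir="data"):
    """Return <base_dir>/<symbol>.csv, creating base_dir (and parents) if needed."""
    # exist_ok=True makes this a no-op when the directory already exists.
    os.makedirs(base_dir, exist_ok=True)
    return os.path.join(base_dir, f"{symbol}.csv")

# Usage: a throwaway nested directory, so nothing is written next to real data.
demo_dir = os.path.join(tempfile.mkdtemp(), "data", "daily")
path = symbol_to_path("AAPL", base_dir=demo_dir)
print(path)
```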
Here we join the data:
def get_data(tickets):  # wrapper name is hypothetical; the post only shows the body ending in return df_final
    # Create an empty df with the specified dates.
    start_date = "2022-01-01"
    end_date = "2023-01-01"
    dates = pd.date_range(start_date, end_date)
    df_final = pd.DataFrame(index=dates)
    df_final.index = pd.to_datetime(df_final.index, utc=True)

    # Combine every ticker's Close column with df_final
    for ticket in tickets:
        file_path = symbol_to_path(ticket)
        df_temp = pd.read_csv(file_path, parse_dates=True, index_col="Date",
                              usecols=["Date", "Close"], na_values=["nan"])
        df_temp = df_temp.rename(columns={"Close": ticket})
        df_final = df_final.join(df_temp, how="left")
        print(df_temp.head())
        print(df_final.head())
    return df_final
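One possible fix, sketched under two assumptions: the tickers trade on a US exchange (so America/New_York is the right zone), and the Date column uses the offset format described above. Parse the dates as UTC, convert back to exchange time, drop the zone, and keep df_final tz-naive so both indexes hold plain calendar dates. load_close is a hypothetical helper name, and the CSV rows are invented for the demo:

```python
import io

import pandas as pd

# Invented CSV content in the shape written by the download step
# (New York midnight saved with its UTC offset, winter and summer).
csv_text = """Date,Open,Close
2022-01-03 00:00:00-05:00,177.8,182.0
2022-01-04 00:00:00-05:00,182.6,179.7
2022-07-01 00:00:00-04:00,136.0,138.9
"""

def load_close(source, symbol):
    """Read Date/Close and re-key the rows by plain (tz-naive) calendar date."""
    df = pd.read_csv(source, index_col="Date", usecols=["Date", "Close"])
    # utc=True copes with the mixed -05:00/-04:00 offsets; converting back to
    # exchange time puts every row at local midnight of its trading day.
    dates = pd.to_datetime(df.index, utc=True).tz_convert("America/New_York")
    df.index = dates.tz_localize(None).normalize()
    return df.rename(columns={"Close": symbol})

# Keep the calendar frame tz-naive so its keys have the same type as the CSV keys.
df_final = pd.DataFrame(index=pd.date_range("2022-01-01", "2023-01-01"))
df_final = df_final.join(load_close(io.StringIO(csv_text), "AAPL"), how="left")
print(df_final.dropna())
```

A simpler variant, if every row is written at local midnight, is to keep only the date part of each string before parsing: pd.to_datetime(df.index.str[:10]).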
Output: the Close prices, which are floats in df_temp, become NaN in df_final after the left join.
With a right join the values do appear, but not for the 2022-01-01/2023-01-01 range.
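Both join results are consistent with keys that differ only by time of day. Here is a toy example that assumes UTC-midnight calendar keys and 05:00 UTC price keys, with invented values. The left join keeps every calendar row and matches none. The right join keeps every price row, including one outside the calendar, which would explain data appearing outside the requested range:

```python
import pandas as pd

# Calendar keys: UTC midnight, as in the question.
calendar = pd.DataFrame(index=pd.date_range("2022-01-01", "2022-01-05", tz="UTC"))

# Price keys: New York midnight expressed in UTC (05:00 in winter); values are invented.
prices = pd.DataFrame(
    {"AAPL": [178.2, 182.0, 179.7]},
    index=pd.DatetimeIndex(
        ["2021-12-30 05:00", "2022-01-03 05:00", "2022-01-04 05:00"], tz="UTC"
    ),
)

left = calendar.join(prices, how="left")    # all calendar keys kept, none match
right = calendar.join(prices, how="right")  # all price keys kept, even 2021-12-30

print(left["AAPL"].isna().all())
print(right.index.min())
```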
Thank you.
UPD: the CSV files do contain data after 2021.