How to use pd.concat to merge multiple DataFrames together in a For Loop

Question

I am using the Dark Sky API and the darkskylib library to create a yearly, hourly forecast for New York City.

`nyc.hourly` returns a DataBlock with all the weather data, from which I can get the hourly temperature for the next 24 hours.

Basically, my problem is that the variable `holding` does not combine the temperatures from the two dates; it only holds the last one. I think my indentation is right, but maybe not.

import time
from datetime import date, timedelta, datetime

import pandas as pd
from darksky import forecast

NYC = 'API Key', 40.7128, -74.0060

# date_list is built at the beginning of my script (see the note below)
date_list = ['2016-01-01T00:00:00', '2016-01-02T00:00:00']

l = 2

for i in range(0, l):
    nyc = forecast(*NYC, time=date_list[i])

    # change units to SI units
    nyc.refresh(units='si', extend='hourly')

    n = len(nyc.hourly)

    temp = []
    unix_time = []
    year = []

    # create a list of hourly temperatures for the day in question
    for i in range(0, n):
        hourly_temp = nyc.hourly[i].temperature
        temp.append(hourly_temp)
    year.append(temp)
    holding = pd.DataFrame(temp)
final = pd.concat([holding], ignore_index=True)

Note: I define `date_list` at the beginning of the code. The code that builds it is a bit long, but this is what it returns (its entries are strings):

>>> date_list
['2016-01-01T00:00:00', '2016-01-02T00:00:00']
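If the code that builds `date_list` is long, pandas can generate the same strings in a line or two. This is a sketch using `pd.date_range` and `DatetimeIndex.strftime` (my choice, not necessarily how the original list was built); it assumes the darkskylib `time` argument accepts these ISO strings, as the question's code implies.

```python
import pandas as pd

# Daily timestamps formatted exactly like the strings in date_list.
dates = pd.date_range('2016-01-01', periods=2, freq='D')
date_list = dates.strftime('%Y-%m-%dT%H:%M:%S').tolist()
print(date_list)  # ['2016-01-01T00:00:00', '2016-01-02T00:00:00']

# A whole year of daily start times; 2016 is a leap year, so this has 366 entries.
full_year = (
    pd.date_range('2016-01-01', '2016-12-31', freq='D')
    .strftime('%Y-%m-%dT%H:%M:%S')
    .tolist()
)
```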

What I don't understand is that I seem to be doing the same thing in the nested for loop as in the outer one, and both follow the same steps. It works for the nested loop, but not for the outer one. – Luka Vlaskalic, Feb 09 '18 at 11:05
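The two loops are not actually doing the same thing, and the difference can be reproduced without the API. In this sketch `fake_days` is hypothetical data standing in for the Dark Sky temperatures. The inner loop calls `temp.append(...)`, which adds to a list that persists for the whole day, while the outer loop uses `holding = ...`, a plain assignment that replaces the previous day's DataFrame. `pd.concat([holding])` therefore receives a list containing a single frame.

```python
import pandas as pd

# Hypothetical stand-in for the API data: two "days" of three hourly temperatures.
fake_days = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

for day_temps in fake_days:
    temp = []                       # re-created on every outer pass
    for t in day_temps:
        temp.append(t)              # append accumulates within one day
    holding = pd.DataFrame(temp)    # assignment replaces the previous day's frame

final = pd.concat([holding], ignore_index=True)  # list with only one frame in it
print(len(final))  # 3: only the last day survives
```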

score 2 · Answer 1 · answered Feb 09 '18 at 11:18


Try this setup. You need to store every `holding` DataFrame and combine them once, at the end. A dictionary is a convenient way to do this.

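import pandas as pd

holding = {}
l = 2

for i in range(0, l):
    # perform calculations (build `temp` for this date, as in the question)
    holding[i] = pd.DataFrame(temp)

# combine all stored DataFrames once, after the loop
final = pd.concat(list(holding.values()), ignore_index=True)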


jpp

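A runnable version of the dictionary pattern above, with hypothetical data (`fake_days`) in place of the Dark Sky calls so it can be tried without an API key. The key point is that the DataFrames are only stored inside the loop and `pd.concat` runs once afterwards.

```python
import pandas as pd

# Hypothetical hourly temperatures for two days, standing in for the API data.
fake_days = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

holding = {}
for i in range(len(fake_days)):
    temp = list(fake_days[i])          # real code: hourly temperatures for date_list[i]
    holding[i] = pd.DataFrame(temp)    # one entry per day, keyed by the loop counter

# concat runs once, after every day has been collected
final = pd.concat(list(holding.values()), ignore_index=True)
print(final[0].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```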

I'm not very familiar with dictionaries, but this seems like a good solution. I tried it, but at the moment `holding` gives a dictionary with one entry rather than one entry per day. – Luka Vlaskalic Feb 09 '18 at 11:34
You need a unique key for each DataFrame. Here I've used `i`, but you need to make sure each DataFrame is assigned to a unique key. – jpp Feb 09 '18 at 12:16
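One likely reason `holding` ends up with a single entry when keyed by `i`: in the question's code the inner loop is also `for i in range(0,n)`, which rebinds the outer `i`. After the inner loop finishes, `i` equals `n - 1`, which is the same value on every day (assuming each day returns the same number of hourly entries), so each DataFrame overwrites the previous one under the same key. This sketch reproduces that with hypothetical data (`fake_days`) and shows that giving the inner loop its own variable (`j`, my choice) yields one key per day.

```python
import pandas as pd

# Hypothetical stand-in for two days of hourly temperatures.
fake_days = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# Buggy: the inner loop reuses `i`, so after it finishes `i` is 2 on every day.
holding_bad = {}
for i in range(len(fake_days)):
    day = fake_days[i]                   # read before the inner loop, like forecast(...)
    temp = []
    for i in range(len(day)):
        temp.append(day[i])
    holding_bad[i] = pd.DataFrame(temp)  # always stored under key 2

# Fixed: the inner loop gets its own variable, so each day keeps its own key.
holding = {}
for i in range(len(fake_days)):
    day = fake_days[i]
    temp = []
    for j in range(len(day)):
        temp.append(day[j])
    holding[i] = pd.DataFrame(temp)

print(list(holding_bad))  # [2]
print(list(holding))      # [0, 1]
```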
I thought so. My idea was to assign each DataFrame to its corresponding date from `date_list`. How would I do this? Or could you point me to the right part of the documentation? – Luka Vlaskalic Feb 09 '18 at 12:24
The unique identifier doesn't matter after you `concat`. If you are cycling through a list of dates, you can do this: `for idx, date in enumerate(dates)`, then use `idx` as a counter. – jpp Feb 09 '18 at 12:25
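For keying each DataFrame by its date, as asked above: the strings in `date_list` are already unique, so they can serve as dictionary keys directly. This sketch uses hypothetical temperatures (`fake_temps`). Passing the dict itself to `pd.concat` keeps the dates as the outer level of a MultiIndex (the level names `date` and `hour` are my choice), while `ignore_index=True` discards the keys, which is why the identifier stops mattering after `concat`. ISO date strings sort chronologically, so the order is the same whether or not pandas sorts the mapping's keys.

```python
import pandas as pd

date_list = ['2016-01-01T00:00:00', '2016-01-02T00:00:00']
# Hypothetical hourly temperatures per date, standing in for the API data.
fake_temps = {date_list[0]: [1.0, 2.0, 3.0], date_list[1]: [4.0, 5.0, 6.0]}

holding = {}
for date in date_list:
    holding[date] = pd.DataFrame({'temperature': fake_temps[date]})

# Keep the dates: they become the outer level of a MultiIndex.
by_date = pd.concat(holding, names=['date', 'hour'])
day_two = by_date.xs('2016-01-02T00:00:00', level='date')

# Drop the dates: a plain 0..n-1 index instead.
flat = pd.concat(list(holding.values()), ignore_index=True)
```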
Or use a list instead of a dict if unique keys are not needed: append the DataFrames inside the loop, then concat outside it. – Parfait Feb 09 '18 at 13:55
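The list variant described in this comment, sketched with hypothetical data (`fake_days`): DataFrames are appended inside the loop and concatenated once outside it.

```python
import pandas as pd

# Hypothetical hourly temperatures for two days, standing in for the API data.
fake_days = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

frames = []                                      # one DataFrame per day
for temps in fake_days:
    frames.append(pd.DataFrame({'temperature': temps}))

# Concatenate once, outside the loop, with a fresh 0..n-1 index.
final = pd.concat(frames, ignore_index=True)
print(final['temperature'].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```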
@Parfait, I wouldn't use a list. Appending to a growing list is expensive; adding to a dictionary is cheap. – jpp Feb 09 '18 at 13:57
Interesting! Do you have documentation on that? Technically, you are growing the dict too, even expanding its hash table as new keys are added. While dicts are fast to search, [one answer here](https://stackoverflow.com/a/513906/1422451) highlights the memory footprint of dicts and sets. – Parfait Feb 09 '18 at 15:25
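On the container question itself: in CPython, `list.append` and dict item assignment are both amortized constant time, and with one DataFrame per day (about 365 for a year) the difference is negligible next to the API calls and the final `concat`. The pattern that genuinely gets slow is calling `pd.concat` inside the loop, because each call copies everything accumulated so far. Here is a minimal `timeit` sketch for comparing the two containers on your own machine; the sizes `N` and `number` are arbitrary, and no timings are claimed here.

```python
import timeit

N = 100_000  # arbitrary number of insertions per run

def fill_list():
    out = []
    for i in range(N):
        out.append(i)   # amortized O(1) append
    return out

def fill_dict():
    out = {}
    for i in range(N):
        out[i] = i      # amortized O(1) insertion
    return out

if __name__ == '__main__':
    for fn in (fill_list, fill_dict):
        seconds = timeit.timeit(fn, number=10)
        print(f'{fn.__name__}: {seconds:.3f} s for 10 runs')
```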
