0

I have a function that gives me the time of sunset and sunrise based on an API that I called get_sun(date) with date being a string of format "%d/%m/%Y".

I have a dataframe with a column Date containing strings of format "%d/%m/%Y".

        Date        Time    Sky temp (C°)   Ambient temp (C°)
0       01/01/2020  00:00:07    -13.01  8.23
1       01/01/2020  00:01:12    -12.93  8.25
2       01/01/2020  00:02:17    -12.91  8.19
3       01/01/2020  00:03:22    -12.75  8.19
4       01/01/2020  00:04:27    -12.99  8.17
... ... ... ... ...
349074  31/10/2020  23:54:44    8.83    8.53
349075  31/10/2020  23:55:49    8.75    8.49
349076  31/10/2020  23:56:54    8.65    8.47
349077  31/10/2020  23:57:59    8.65    8.45
349078  31/10/2020  23:59:04    8.61    8.43

I want to add to my dataframe a column 'Sunrise' and 'Sunset' but without using apply. If I use dataframe.Date.apply() it will iterate on every line. For a same date I have 3000 lines so it would be much quicker to call get_sun only once per different date.

I which an output of the form :

        Date        Time    Sky temp (C°)   Ambient temp (C°) Sunrise Sunset
0       01/01/2020  00:00:07    -13.01      8.23             7:58:32    18:21:39
1       01/01/2020  00:01:12    -12.93      8.25             7:58:32    18:21:39
2       01/01/2020  00:02:17    -12.91      8.19             7:58:32    18:21:39
3       01/01/2020  00:03:22    -12.75      8.19             7:58:32    18:21:39
4       01/01/2020  00:04:27    -12.99      8.17             7:58:32    18:21:39

My code is the following :

df['Sunrise'] = ""
df['Sunset'] = ""

for i in tqdm(unique(df.Date.values)):
    (sunrise, sunset) = get_sun(i)
    df[df.Date.apply(lambda x : x==i)]['Sunrise'].apply(lambda x : sunrise)
    df[df.Date.apply(lambda x : x==i)]['Sunset']=sunset

df[df.Date.apply(lambda x : x==i)] is my way to select only the lines of my dataframe where the date is equal to i. For these lines I would like to append the value of sunrise and sunset in the corresponding columns.

Jason Aller
  • 3,541
  • 28
  • 38
  • 38
EKLZ
  • 33
  • 5
  • Please do specify a minimal working example and the expected output. On a side note though: I get the feeling, that you misunderstood the principles `pandas.DataFrame`. Instead of python for-loops you should only use functions inherent to pandas. They are implemented in such a fashion, that they automatically affect the entire dataframe, ie the columns or the rows. You can check out the [10 min pandas introduction](https://pandas.pydata.org/docs/user_guide/10min.html) for a quick intro. In your case grouping the dataframe `df.groupby('Date')` might replace the for loop. – Marc Jan 26 '21 at 16:29
  • Thank you for your answer, I thought about using groupby but I don't know how to minimize the number of call to my API while filling two columns of my df with this method. – EKLZ Jan 26 '21 at 16:51
  • 1
    Does this answer your question? https://stackoverflow.com/questions/60855163/how-to-change-column-values-or-create-values-in-a-new-column-based-on-values-in – Arsik36 Jan 26 '21 at 16:53

2 Answers2

2

I think you over complicate the definition of your new columns. A single call to pandas.apply should suffice for your needs. No need to iterate by hand, nor to find unique dates.

Here is a simplified example (dates/sunrise/sunset as integers):

#your function
get_sunrise = lambda date: (date-1,date+1)

#function passed to pandas.DataFrame.apply(...,axis=1)
def fun(row):   
    (sunrise,sunset) = get_sunrise(row['date'])
    row['sunrise'] = sunrise
    row['sunset'] = sunset
    return row

#mock example
df = pd.DataFrame({'date':[1,2,3,4,5,6]})
df = df.apply(fun,axis=1)

enter image description here

Marc
  • 712
  • 4
  • 7
  • Thank you for your answer, it's very clean and appropriate but as I said I wish to not use this method because it will provoke a call of get_sunrise for every row... I have a row for each minute as shown in my example, so approx 1500 rows for each day. My goal is to call it once per day and fill all corresponding rows, I would call my API 300 times instead of 5K times. I don't know if I'm clear in my request sorry – EKLZ Jan 27 '21 at 10:39
0

I found an answer that is maybe not the cleanest :

def fun(sub_df):
    Date = df.Date.iloc[0]
    (sunrise, sunset) = get_sun(Date)
    sub_df['Sunrise'] = sunrise
    sub_df['Sunset'] = sunset
    return sub_df

df = df.groupby('Date').apply(fun)

It's based on @Marc's answer but instead of applying my function for each rows it's being apply for each sub dataframe separated by date. I get the date by taking the first value of the date column : df.Date.iloc[0]

EKLZ
  • 33
  • 5