
I am trying to save time series of two variables that have different forecast steps. How can I modify the code below so that both variables, despite their different time steps, are saved in the same CSV file? One of them starts the cycle at forecast hour 000 and the other at forecast hour 003.

But when I try to save them together, with the other variable having 114 time steps, the following error occurs: IndexError: index 112 is out of bounds for axis 0 with size 112.

lat   = GFS.variables['latitude'][:]
lon   = GFS.variables['longitude'][:]
times = GFS['valid_time'][:]
time_cycle = radiation['valid_time'][:]
unit  = GFS['time'].units
step  = GFS['step']

for key, value in stations.iterrows():
    #print(key,value[0], value[1], value[2])
    station = value[0]
    file_name = "{}{}".format(station,".csv")
    #print(file_name)
    lon_point = value[1]
    lat_point = value[2]
    ########################################
    
    # Find the latitude/longitude grid point closest to each station
    
    # Squared difference of lat and lon
    sq_diff_lat = (lat - lat_point)**2
    sq_diff_lon = (lon - lon_point)**2
    
    # Identifying the index of the minimum value for lat and lon
    min_index_lat = sq_diff_lat.argmin()
    min_index_lon = sq_diff_lon.argmin()
    print("Generating time series for station {}".format(station))
    ref_date   = datetime.datetime(int(unit[14:18]),int(unit[19:21]),int(unit[22:24]),int(unit[25:27]))
    
    date_range = list()
    step_data  = list()
    rad_data   = list()
    pblh_data  = list()

    for index, time in enumerate(times):
        date_time = ref_date+datetime.timedelta(seconds=int(time))
        date_range.append(date_time)         
        step_data.append(step[index].values)
        pblh_data.append(hpbl[index, min_index_lat, min_index_lon].values)
        
        for index_rad, rad_time in enumerate(time_cycle):
            rad_data.append(radiation[index_rad, min_index_lat, min_index_lon].values)

    #print(date_range)
    
    df = pd.DataFrame(date_range, columns = ["Date-Time"])
    df["Date-Time"] = date_range
    df = df.set_index(["Date-Time"])
    df["Forecast ({})".format('valid time')] = step_data
    df["RAD ({})".format('W m**-2')] = rad_data
    df["PBLH ({})".format('m')] = pblh_data
    
    print("The following time series is being saved as .csv files")
        
    df.to_csv(os.path.join(dir_out,file_name), sep=';',encoding="utf-8", index=True)

#df.to_parquet(os.path.join(dir_out,file_name), 
#                   engine='auto', 
#                   compression='default',
#                   write_index=True, 
#                   overwrite=True, 
#                   append=False)

print("\n!! Successfully saved all the time series to the output directory !!\n{}".format(dir_out))

That is, the PBLH variable has 114 time steps while the RAD variable has 112, but I would like to save both variables in the same CSV file. How should I modify the loops over times (PBLH) and time_cycle (RAD) so that both series end up in the same CSV?

  • How many values does `GFS['valid_time'][:]` have? 114? – farshad Jan 25 '23 at 03:15
  • Yes, 114, running from 000 to 168, while the radiation data run from 003 to 168. – William Jacondino Jan 25 '23 at 03:34
  • From the looks of it your code should work just fine, then. Which line do you get the error on? – farshad Jan 25 '23 at 03:46
  • I get this error: `ValueError: Length of values (12768) does not match length of index (114)` when assigning `df["RAD ({})".format('W m**-2')] = rad_data`, which has 112 time steps. – William Jacondino Jan 25 '23 at 04:20
  • In that case the issue is with `rad_data` structure. Pandas sees it as having 12768 values not 112. Can you get the output of `len(rad_data)` just before error line? – farshad Jan 25 '23 at 04:34
  • print(len(rad_data)) = 3024 – William Jacondino Jan 25 '23 at 04:38
  • `rad_data` has fewer time steps, but both variables cover the same dates. The difference is that pblh starts at 000 in valid_time and the radiation starts at 003. I just want the script to save the radiation data starting from step 3. – William Jacondino Jan 25 '23 at 04:41
  • I don’t see `index_rad` defined anywhere. That said, you should really be using vectorized indexing here. See e.g. this question: https://stackoverflow.com/a/69337183/3888719 – Michael Delgado Jan 25 '23 at 05:19
  • Once you’ve selected out the appropriate data, you can just use `.to_dataframe` – Michael Delgado Jan 25 '23 at 05:21
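
A minimal sketch of the alignment idea raised in the comments, under stated assumptions: the data here are synthetic (114 PBLH values starting at hour 000 and 112 radiation values starting at hour 003, at an assumed 3-hourly spacing), and plain pandas stands in for the xarray selection and `.to_dataframe` step mentioned above. It is not the asker's actual dataset. Each series is indexed by its own valid time, so `pd.concat(..., axis=1)` can line the two up by timestamp instead of by position, and steps present in only one series come out as NaN.

```python
import io

import numpy as np
import pandas as pd

# Synthetic stand-ins for the two variables; values and the 3-hourly
# spacing are made up for illustration only.
ref_date = pd.Timestamp("2023-01-25 00:00")
pblh_times = ref_date + pd.to_timedelta(np.arange(114) * 3, unit="h")     # starts at hour 000
rad_times = ref_date + pd.to_timedelta(np.arange(1, 113) * 3, unit="h")   # starts at hour 003

pblh = pd.Series(np.linspace(200.0, 1500.0, 114), index=pblh_times, name="PBLH (m)")
rad = pd.Series(np.linspace(0.0, 800.0, 112), index=rad_times, name="RAD (W m**-2)")

# Align on the timestamp index rather than on list position.
# Time steps that exist in only one series are filled with NaN.
df = pd.concat([pblh, rad], axis=1)
df.index.name = "Date-Time"

# Written to an in-memory buffer here; passing a file path works the same way.
buffer = io.StringIO()
df.to_csv(buffer, sep=";", index=True)
csv_text = buffer.getvalue()
```

With this layout the shorter radiation series no longer has to match the length of the PBLH series: the resulting frame keeps all 114 rows, and the radiation column is simply empty where no radiation value exists for that timestamp.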
