I have found a separate solutions for parts of what I want to do but nothing that has worked together.
- I am searching a set of files (call them set1) and creating a pandas data frame (df) with a date (as yyyy.doy), a dump number(2 digit number), and a start time(VC5 Start) of the data inside of the file(in gps seconds). the date and dump numbers are actually designated in the file name. the dataframe also has a blank column for the data in the next step.I have no issues here with the following code.
df=pd.DataFrame(columns=['Date','Dump Number','VC5 Start', 'VC2 Start'])
for files in VC5filelist:
#print(files)
filedate_df=files[18:26] #save date of file to a variable
filedump_df=files[31:33] #save dump number of file to a variable
ds=netCDF4.Dataset(files, 'r') #read each netcdf file into a data set
VC5gpsstart=ds.variables['TIME'][0] # save gps first timestamp of VC5 file into a variable
#append file data to main dataframe
df = df.append({'Date' : filedate_df, 'Dump' : filedump_df, 'VC5 Start' : VC5gpsstart},ignore_index = True)
- I am searching a second set of files (call them set2) for the same info however the start time will be VC2time.
for vc2files in VC2filelist:
print(vc2files)
vc2filedate_df=vc2files[18:26] #save date of file to a variable
vc2filedump_df=vc2files[31:33] #save dump number of file to a variable
print(vc2filedate_df+':'+vc2filedump_df)
dsvc2=netCDF4.Dataset(vc2files, 'r') #read each netcdf file into a data set
VC2gpstart=dsvc2.variables['time'][0] # save gps first timestamp of VC5 file into a variable
VC2df = VC2df.append({'Date' : vc2filedate_df, 'Dump' : vc2filedump_df, 'VC2 Start' : VC2gpsstart},ignore_index = True)
I want to append/insert the VC2time data into the last column(VC2 Start) and use the date and dump numbers of the second set of files to designate where in the dataframe the starttime should go. example
Date Dump vc5start vc2start
2022.001 05 121651215 ***456447156***
the bold and italic data is the only thing i cannot produce right now. I have been trying a find the correct row to insert my data with
row=df.index.get_loc(df.query('Date' == vc2filedate_df) and ('Dump'==vc2filedump_df).index[0])
to no avail. my next step was to be
df.loc[row:'VC2 Start']=VC2gpsstart
what I want to know is
A: given the date and dump number of the file from my set2 , how do I find the row of the dataframe with the same date and dump number?
B: how do I then add the VC2 start data into the VC2 start column of the data frame on the row found in question A?
@Larrybird
VC5df VC2df
Date Dump VC5time Date Dump VC2time
2022.001 01 125 2022.001 01 125
2022.001 02 128 2022.001 02 130
2022.001 05 260 2022.001 05 261
2022.002 01 035 2022.002 01 035
@LarryBird, I after researching merge I found the (a) solution
creating datframes
VC5df=pd.DataFrame(columns=['Date','Dump','VC5 Start'])
VC2df=pd.DataFrame(columns=['Date','Dump','VC2 Start'])
appending data to them within loops (as above), then using
merged_df=pd.merge(VC5df,VC2df,on=["Date","Dump"])
creates the following (looking at first and second day of 2022)
Date Dump VC5 Start VC2 Start
0 2022.001 01 1325029429.0 1325029440.0
1 2022.001 02 1325030705.0 1325030760.0
2 2022.001 03 1325034031.0 1325034060.0
3 2022.001 04 1325035511.0 1325035560.0
4 2022.001 05 1325036791.0 1325036879.0
.. ... ... ... ...
103 2022.002 48 1325188946.0 1325188980.0
104 2022.002 49 1325191628.0 1325191680.0
105 2022.002 50 1325192627.0 1325192640.0
106 2022.002 51 1325195052.0 1325195100.0
107 2022.002 52 1325198890.0 1325198940.0