0

I have a large CSV file, consisting of measurements, with over 75000 rows and over 200 columns. Each column has a locationId, a type of measurement (type A or B) and columns corresponding to the time and date it was measured with hourly intervals:

   locationId  type   2021-02-09 22  ...  2021-03-11 05  2021-03-11 06
0      60       A         NaN        ...        NaN          NaN
1      42       B         NaN        ...        NaN          NaN
2      58       A         NaN        ...        NaN          NaN
3      67       A         NaN        ...        NaN          NaN
4      53       B         NaN        ...        NaN          NaN

Most of the date entries in each row have NaN, but at least 1 column in each row has a valid numeric entry. I need to extract the first value that is NOT NaN in each row and present the result as:

   locationId  type   2021-0X-XX XX  
0      60       A         2.45        
1      42       B         5.32       
2      58       A         2.33        
3      67       A         2.21        
4      53       B         3.93        

I've tried using Dataframe manipulation with iloc and loc, but these are mostly for manipulating values in same columns. I'm fairly new to python, someone told me dataframes would be a place to start, but this might be wrong I dont know..?My end result is to export a csv file with only 1 date column like above.

naits360
  • 1
  • 1
  • `df.set_index(['locationID','type']).stack()` should do the trick. or `pd.melt(df,id_vars=['locationId','type']).dropna()` – Umar.H Mar 21 '21 at 22:52
  • Also look at [get-first-non-null-value-per-row](https://stackoverflow.com/questions/50004529/get-first-non-null-value-per-row) – Joe Ferndz Mar 21 '21 at 23:32

0 Answers0