I'm struggling to correctly merge a few datasets in pandas. Let's say I've measured variables A, B, and C, at different times. Sometimes, I've got A and B at the same time, and sometimes not. I have three dataframes, where the dataframe's index is the time of measurement, and a column for the measurement. If I concatenate these dataframes, I get a bunch of NaNs where I have no measurements, maybe something like
idx | A | B | C
-----|-----|-----|----
0 | 1 | NaN | NaN
0 | NaN | 2 | 3
1 | 5 | 3 | NaN
In concatenating, I have non-unique time indices. What I'd like is to sort this by time, and collapse together rows with the same time index. The ideal result here is
idx | A | B | C
-----|-----|-----|----
0 | 1 | 2 | 3
1 | 5 | 3 | NaN
That would be the first scenario. To further complicate things, I may have a column, D, which specifies the location the measurement was taken. I'd thus need to allow this collapsing to keep non-unique indices as long as the entries in D are different for that time. Maybe we have
idx | A | B | C | D
-----|-----|-----|-----|-----
0 | 1 | NaN | NaN | Paris
0 | NaN | 2 | 3 | NYC
1 | 5 | 3 | NaN | NYC
1 | NaN | NaN | 0 | Paris
This dataframe cannot be collapsed any further, because, conditioned on D, it's already got unique times and information is as collapsed as possible.
I'm still trying to get my head around the various join / merge / concat operations and how they work, but I'd love a pointer or two.
Thank you !