I have a table (pandas dataframe) as follows.
id | y | val1 | val2 | val3 | ... |
---|---|---|---|---|---|
1 | 100 | 3 | 1 | 2 | |
1 | 150 | 1 | 2 | 4 | |
1 | 250 | 4 | 2 | 6 | |
2 | 200 | 3 | 1 | 4 | |
2 | 250 | 2 | 2 | 2 | |
2 | 350 | 4 | 2 | 4 | |
3 | 200 | 3 | 3 | 4 | |
3 | 300 | 3 | 2 | 4 | |
3 | 400 | 6 | 3 | 3 |
I want to aggregate by the ID and concatenate all the values in order, and then take only the last y of each id. For example, the table would look like this:
id | y | val1 | val2 | val3 | ... | val1 | val2 | val3 | ... | val1 | val2 | val3 | ... |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 250 | 3 | 1 | 2 | ... | 1 | 2 | 4 | ... | 4 | 2 | 6 | ... |
2 | 350 | 3 | 1 | 4 | ... | 2 | 2 | 2 | ... | 4 | 2 | 4 | ... |
3 | 400 | 3 | 3 | 4 | ... | 3 | 2 | 4 | ... | 6 | 3 | 3 | ... |
Notice the y, 250 is from the last row of id=1 (in the real table there is another id I can use to specify what y I want to keep), 350 is from the last row of id=2 and 400 is from the last row of id=3. The values are simply concatenated into one row in order.
I looked into pivot_table and know I can use new_ds = pd.pivot_table(dataset, index='id')
to aggregate by the ID, but I want to be selective (not including y for all rows, and in reality there's other garbage data I do not want). It is important that the values have to be in order, so the values from the second row come after the first, third comes the second etc. There's more than 100 values.
I've looked into pivot and groupby but can't figure out exactly how to apply to this.