I'm really unsure how to express my problem, and therefore how to phrase my question. Surely this is a widely known issue, but I couldn't find out how to solve it.
I have a dataframe like this:
     id        time  value
0   100  2012-04-01    2.3
1   100  2012-04-02    3.4
2   100  2012-04-03    1.4
3   100  2012-04-04    5.6
4   200  2012-04-01    3.0
5   200  2012-04-02    2.9
6   200  2012-04-03    2.8
7   200  2012-04-04    1.9
8   300  2012-04-01    3.1
9   300  2012-04-02    4.2
10  300  2012-04-03    2.5
11  300  2012-04-04    1.2
Each unique id defines a time series. In this case there are 3 time series: 100, 200 and 300. Each series has exactly one value per date. (Note that they all start on 2012-04-01 and end on 2012-04-04, so they all cover the same period.)
What I want is a fast way to transform the above into something like this:
         time  100  200  300
0  2012-04-01  2.3  3.0  3.1
1  2012-04-02  3.4  2.9  4.2
2  2012-04-03  1.4  2.8  2.5
3  2012-04-04  5.6  1.9  1.2
Or, even better, with time as the index:
time        100  200  300
2012-04-01  2.3  3.0  3.1
2012-04-02  3.4  2.9  4.2
2012-04-03  1.4  2.8  2.5
2012-04-04  5.6  1.9  1.2
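For reference, here is a minimal sketch of one way to get both layouts, assuming pandas 1.x or later. It uses DataFrame.pivot, which is not part of the original attempt: pivot takes the long-format data and turns each unique id into its own column, indexed by time. It only works when every (time, id) pair occurs once, which holds here since each series has exactly one value per date.

```python
import pandas as pd

# Long-format data from the question: one row per (id, time) pair.
df = pd.DataFrame({
    'id': [100] * 4 + [200] * 4 + [300] * 4,
    'time': ['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04'] * 3,
    'value': [2.3, 3.4, 1.4, 5.6, 3.0, 2.9, 2.8, 1.9, 3.1, 4.2, 2.5, 1.2],
})

# Each unique 'id' becomes a column; 'time' becomes the index.
# pivot raises ValueError if a (time, id) pair appears more than once.
wide = df.pivot(index='time', columns='id', values='value')

# Drop the 'id' name on the column axis to match the desired printout.
wide.columns.name = None
print(wide)

# First desired layout: time as an ordinary column with a 0..n-1 index.
flat = wide.reset_index()
print(flat)
```

If some ids could have duplicate dates, pivot_table with an explicit aggfunc would be the safer choice, at the cost of silently aggregating the duplicates.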
MY TRY
Given that the dataframe is called df:
import pandas as pd

list_dfs = []
for i in df['id'].unique():
    # one single-column frame per id, indexed by time, with 'value' renamed to the id
    list_dfs.append(df[df['id'] == i].set_index('time').drop('id', axis=1).rename(columns={'value': i}))
Then I can just use pd.concat, but the problem is that it messes up the index: every date is repeated once per id instead of the series being lined up side by side.
pd.concat(list_dfs)
and I get:
100 200 300
time
2012-04-01 2.3 NaN NaN
2012-04-02 3.4 NaN NaN
2012-04-03 1.4 NaN NaN
2012-04-04 5.6 NaN NaN
2012-04-01 NaN 3.0 NaN
2012-04-02 NaN 2.9 NaN
2012-04-03 NaN 2.8 NaN
2012-04-04 NaN 1.9 NaN
2012-04-01 NaN NaN 3.1
2012-04-02 NaN NaN 4.2
2012-04-03 NaN NaN 2.5
2012-04-04 NaN NaN 1.2
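The attempt above is close. pd.concat defaults to axis=0, which stacks the frames vertically and produces the NaN pattern shown. A minimal sketch of the same loop with axis=1, which instead places the frames side by side and aligns them on their shared time index (assuming, as in the question, that every series covers the same dates):

```python
import pandas as pd

df = pd.DataFrame({
    'id': [100] * 4 + [200] * 4 + [300] * 4,
    'time': ['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04'] * 3,
    'value': [2.3, 3.4, 1.4, 5.6, 3.0, 2.9, 2.8, 1.9, 3.1, 4.2, 2.5, 1.2],
})

list_dfs = []
for i in df['id'].unique():
    # One single-column frame per id: indexed by time, column named after the id.
    part = df[df['id'] == i].set_index('time').drop('id', axis=1)
    list_dfs.append(part.rename(columns={'value': i}))

# axis=1 concatenates column-wise, aligning rows on the 'time' index.
# Dates missing from some series would show up as NaN in that column.
wide = pd.concat(list_dfs, axis=1)
print(wide)
```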
Code to generate the df for the MWE:
import pandas as pd

df = pd.DataFrame({'id': [100, 100, 100, 100, 200, 200, 200, 200, 300, 300, 300, 300],
                   'time': ['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04', '2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04', '2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04'],
                   'value': [2.3, 3.4, 1.4, 5.6, 3, 2.9, 2.8, 1.9, 3.1, 4.2, 2.5, 1.2]})
Ideas
Note that I don't necessarily need all that fancy stuff. All I want is a matrix with the time series side by side.
I think there must be a way to do this with pandas groupby, but I cannot seem to find it.
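A groupby-based route does exist. A minimal sketch, assuming one value per (time, id) pair as in the question: group on both keys, then unstack the inner level so the ids become columns. The choice of first() as the aggregation is an assumption that only makes sense because each group holds a single row; mean or sum would give the same numbers here. If only the numbers are needed, to_numpy turns the result into a plain 2-D array.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [100] * 4 + [200] * 4 + [300] * 4,
    'time': ['2012-04-01', '2012-04-02', '2012-04-03', '2012-04-04'] * 3,
    'value': [2.3, 3.4, 1.4, 5.6, 3.0, 2.9, 2.8, 1.9, 3.1, 4.2, 2.5, 1.2],
})

# Series with a (time, id) MultiIndex; each group has exactly one row,
# so first() simply returns that row's value.
stacked = df.groupby(['time', 'id'])['value'].first()

# Move the inner 'id' level into the columns: one column per series.
wide = stacked.unstack('id')

# Plain matrix: rows are dates, columns are ids (both sorted by groupby).
matrix = wide.to_numpy()
print(wide)
print(matrix)
```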