Combining multiple time series into a one dataframe

Question

My task is to combine many time series into a one dataset. The data comes in chunks: time series with date and time index, different names, but overlapping date and time. When merging, I get a dataset with duplicate lines.

My code:

>>> import pandas as pd
>>> s1 = pd.Series([1.1,1.2], index=pd.date_range("2000-01-01 00:00:00", freq="S", periods=2), name="id_1")
>>> s2 = pd.Series([1.3,1.4], index=pd.date_range("2000-01-01 00:00:02", freq="S", periods=2), name="id_1")
>>> s3 = pd.Series([2.1,2.2], index=pd.date_range("2000-01-01 00:00:00", freq="S", periods=2), name="id_2")
>>> s4 = pd.Series([2.3,2.4], index=pd.date_range("2000-01-01 00:00:02", freq="S", periods=2), name="id_2")
>>> df = pd.DataFrame()
>>> df.append([s1,s2,s3,s4])
      2000-01-01 00:00:00  2000-01-01 00:00:01  2000-01-01 00:00:02  2000-01-01 00:00:03
id_1                  1.1                  1.2                  NaN                  NaN
id_1                  NaN                  NaN                  1.3                  1.4
id_2                  2.1                  2.2                  NaN                  NaN
id_2                  NaN                  NaN                  2.3                  2.4

I want the dataset to be like this:

      2000-01-01 00:00:00  2000-01-01 00:00:01  2000-01-01 00:00:02  2000-01-01 00:00:03
id_1                  1.1                  1.2                  1.3                  1.4
id_2                  2.1                  2.2                  2.3                  2.4

check out if this works https://stackoverflow.com/questions/46235097/remove-nan-values-from-pandas-dataframe-and-reshape-table — sammy, Mar 17 '21 at 05:15

piRSquared · Answer 1 · 2021-03-17T05:25:07.200

`functools.reduce`

/shrug first thing that came to mind

from functools import reduce

reduce(pd.DataFrame.combine_first, map(pd.Series.to_frame, [s1, s2, s3, s4])).T

      2000-01-01 00:00:00  2000-01-01 00:00:01  2000-01-01 00:00:02  2000-01-01 00:00:03
id_1                  1.1                  1.2                  1.3                  1.4
id_2                  2.1                  2.2                  2.3                  2.4

Reconstruct data

dat = {}
for s in [s1, s2, s3, s4]:
    for k, v in s.iteritems():
        dat.setdefault(k, {})[s.name] = v

pd.DataFrame(dat)

      2000-01-01 00:00:00  2000-01-01 00:00:01  2000-01-01 00:00:02  2000-01-01 00:00:03
id_1                  1.1                  1.2                  1.3                  1.4
id_2                  2.1                  2.2                  2.3                  2.4

Combining multiple time series into a one dataframe

1 Answers1

functools.reduce

Reconstruct data

`functools.reduce`