
I have a problem merging several time series into a common DataFrame. The example code I'm using:

import pandas
import datetime
import numpy as np

start = datetime.datetime(2001, 1, 1)
end = datetime.datetime(2001, 1, 10)
dates = pandas.date_range(start, end)
serie_1 = pandas.Series(np.random.randn(10), index = dates)
start = datetime.datetime(2001, 1, 2)
end = datetime.datetime(2001, 1, 11)
dates = pandas.date_range(start, end)
serie_2 = pandas.Series(np.random.randn(10), index = dates)
start = datetime.datetime(2001, 1, 3)
end = datetime.datetime(2001, 1, 12)
dates = pandas.date_range(start, end)
serie_3 = pandas.Series(np.random.randn(10), index = dates)

print 'serie_1'
print serie_1
print 'serie_2'
print serie_2
print 'serie_3'
print serie_3

serie_4 = pandas.concat([serie_1,serie_2], join='outer', axis = 1)
print 'serie_4'
print serie_4
serie_5 = pandas.concat([serie_4, serie_3], join='outer', axis = 1)
print 'serie_5'
print serie_5

This gives me the error for serie_5 (the second concat):

Traceback (most recent call last):
  File "C:\Users\User\Workspaces\Python\Source\TestingPandas.py", line 29, in <module>
    serie_5 = pandas.concat([serie_4, serie_3], join='outer', axis = 1)
  File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 878, in concat
    verify_integrity=verify_integrity)
  File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 948, in __init__
    self.new_axes = self._get_new_axes()
  File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 1101, in _get_new_axes
    new_axes[i] = self._get_comb_axis(i)
  File "C:\Python27\lib\site-packages\pandas\tools\merge.py", line 1125, in _get_comb_axis
    all_indexes = [x._data.axes[i] for x in self.objs]
AttributeError: 'TimeSeries' object has no attribute '_data'

I would like the result to look something like this (with random values in column 2):

                 0         1         2
2001-01-01 -1.224602       NaN       NaN
2001-01-02 -1.747710 -2.618369       NaN
2001-01-03 -0.608578 -0.030674 -1.335857
2001-01-04  1.503808 -0.050492  1.086147
2001-01-05  0.593152  0.834805 -1.310452
2001-01-06 -0.156984  0.208565 -0.972561
2001-01-07  0.650264 -0.340086  1.562101
2001-01-08 -0.063765 -0.250005 -0.508458
2001-01-09 -1.092656 -1.589261 -0.481741
2001-01-10  0.640306  0.333527 -0.111668
2001-01-11       NaN -1.159637  0.110722
2001-01-12       NaN       NaN -0.409387

What is wrong? As I said, it's probably basic, but I cannot figure it out and I'm a beginner...

Jonas

1 Answer


Concatenating a list of Series along axis=1 returns a DataFrame, so serie_4 is a DataFrame while serie_3 is still a Series. In the pandas version shown in the traceback, concatenating a DataFrame with a Series raises this AttributeError.

You could use

import pandas as pd
serie_5 = pd.concat([serie_1, serie_2, serie_3], join='outer', axis=1)

instead.
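
Applied to the three series from the question, this could look like the sketch below. It assumes Python 3 and a recent pandas (the question used Python 2), and `make_series` is a hypothetical helper introduced here only to avoid repeating the setup code:

```python
import numpy as np
import pandas as pd


def make_series(start, periods=10):
    """Return a daily Series of random values beginning at `start` (hypothetical helper)."""
    dates = pd.date_range(start, periods=periods, freq="D")
    return pd.Series(np.random.randn(periods), index=dates)


serie_1 = make_series("2001-01-01")
serie_2 = make_series("2001-01-02")
serie_3 = make_series("2001-01-03")

# One concat over all three Series; the outer join keeps the union of the dates,
# and dates missing from a given Series are filled with NaN.
serie_5 = pd.concat([serie_1, serie_2, serie_3], join="outer", axis=1)

print(type(serie_5).__name__)  # DataFrame
print(serie_5.shape)           # (12, 3)
print(serie_5)
```

Because every input is still a Series at the time of the call, the DataFrame/Series mix from the question never arises.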


For example,

import pandas as pd

s1 = pd.Series([0,1], index=list('AB'))
s2 = pd.Series([2,3], index=list('AC'))

result = pd.concat([s1, s2], join='outer', axis=1, sort=False)
print(result)

yields

     0    1
A  0.0  2.0
B  1.0  NaN
C  NaN  3.0

Note that you'll get a ValueError if you try to concatenate a series with a non-unique index. For example,

s3 = pd.Series([0,1], index=list('AB'), name='s3')
s4 = pd.Series([2,3], index=list('AA'), name='s4') # <-- non-unique index
result = pd.concat([s3, s4], join='outer', axis=1, sort=False)

raises

ValueError: cannot reindex from a duplicate axis

To work around this, reset the index and merge DataFrames instead:

import functools
s3 = pd.Series([0,1], index=list('AB'), name='s3')
s4 = pd.Series([2,3], index=list('AA'), name='s4') # <-- non-unique index

result = functools.reduce(
    lambda left, right: pd.merge(left, right, on='index', how='outer'),
    [s.reset_index() for s in [s3, s4]])
print(result)

yields

  index  s3   s4
0     A   0  2.0
1     A   0  3.0
2     B   1  NaN
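
If it is not obvious which Series has the repeated labels, a quick check along these lines may help before choosing between `pd.concat` and the merge workaround. This is only a sketch; the variable names `candidates` and `non_unique` are made up for illustration:

```python
import pandas as pd

s3 = pd.Series([0, 1], index=list("AB"), name="s3")
s4 = pd.Series([2, 3], index=list("AA"), name="s4")  # non-unique index

candidates = [s3, s4]
for s in candidates:
    # Index.is_unique is False when any label appears more than once.
    if not s.index.is_unique:
        repeated = s.index[s.index.duplicated()].unique().tolist()
        print(f"{s.name}: repeated labels {repeated}")

# Names of the Series that would need the merge workaround.
non_unique = [s.name for s in candidates if not s.index.is_unique]
```

Here only `s4` is reported, with the repeated label `'A'`.
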
unutbu
  • Okay, then I understand why I get this error. I also tested concatenating a DataFrame with another DataFrame by changing the code to this: serie_5 = pandas.concat([serie_4, pandas.DataFrame(serie_3)], join='outer', axis = 1). This means that I can concatenate two Series into a DataFrame and then concatenate this DataFrame with another DataFrame. I need to find a generic solution where I can add a number of Series in a loop when I do not know the number beforehand. – Jonas Dec 05 '12 at 18:24
  • Just create a Python list, append your Series to it, and then pass it to pandas.concat, as @unutbu wrote above. – K.-Michael Aye Dec 05 '12 at 19:09
  • The use of join looks generic enough! I changed it to "serie_5 = serie_4.join(serie_3, how = 'outer')" in order to include 2001-01-12 in the example above. The reason I want a generic solution is that I want to combine several different time series where there will be missing data, and use pandas functionality to handle the missing data. Thanks! – Jonas Dec 05 '12 at 19:59
  • Why does the command `serie_5 = pandas.concat([serie_1, serie_2, serie_3], join='outer', axis = 1)` not work for my series? It returns the error: _cannot reindex from a duplicate axis_. My Series, like the series used here, are both _pandas.core.series.Series_, but mine also have second specifications. Do you suggest asking another question? – SPS Jun 13 '18 at 10:12
  • @SPS: At least one of the Series has a non-unique index. `pd.concat` raises `ValueError: cannot reindex from a duplicate axis` in this case. To work around this, convert each series to a DataFrame (e.g. `s = s.reset_index()`) and then outer [merge the DataFrames](https://stackoverflow.com/a/30512931/190597) on the `index` column (e.g. `functools.reduce(lambda left,right: pd.merge(left,right,on='index',how='outer'), [s.reset_index() for s in [serie_1, serie_2, serie_3]])`). – unutbu Jun 13 '18 at 11:54
  • @SPS: I've edited the post above to include some runnable code which shows what I mean. – unutbu Jun 13 '18 at 12:06
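
The loop-based approach discussed in these comments could be sketched roughly as follows, assuming Python 3 and a recent pandas. The names `all_series`, `combined`, and `joined`, and the `serie_N` labels, are illustrative choices, and the fixed `range(3)` merely stands in for a loop whose length is not known in advance:

```python
import numpy as np
import pandas as pd

all_series = []
for offset in range(3):  # stand-in for a loop of unknown length
    start = pd.Timestamp("2001-01-01") + pd.Timedelta(days=offset)
    dates = pd.date_range(start, periods=10, freq="D")
    # Naming each Series gives the resulting DataFrame readable column labels.
    series = pd.Series(np.random.randn(10), index=dates, name=f"serie_{offset + 1}")
    all_series.append(series)

# Collect first, concatenate once at the end.
combined = pd.concat(all_series, join="outer", axis=1)

# Incremental alternative with DataFrame.join; join requires each Series to have a name.
joined = all_series[0].to_frame()
for s in all_series[1:]:
    joined = joined.join(s, how="outer")

print(combined)
```

Collecting into a list and calling `pd.concat` once avoids rebuilding the DataFrame on every iteration; the `join` loop gives the same table when the result has to grow step by step.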