
Problem:

I'd like to take Series 1 and Series 2 and create a new Series whose values are the pairs (Series 1, Series 2). Essentially, I have two pandas Series that I would like to combine into one. Although the values are represented as ints, they are factors.

Example:

Series 1   Series 2          Series 3
  1            2      --->    (1,2)
  2            3      --->    (2,3)
  3            4      --->    (3,4)

What I've tried

pandas: combine two columns in a DataFrame

The pandas functions:

concat, merge, join

So far I've only been able to combine the values (i.e., add the elements together, append one series to the other, or merge based on values). Because the dataset is large, I'm looking to avoid loops, although that's the only way I can think of to do it so far. I feel like this should be pretty easy to accomplish with the power of pandas.

Any ideas? Thanks for taking a look!

Edited by Community; asked by agconti

1 Answer


What are you going to do with this?

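In [1]: import pandas as pd
   ...: from pandas import Series
   ...: s1 = Series([1,2,3])

In [2]: s2 = Series([2,3,4])

In [4]: Series(list(zip(s1, s2)))  # list() materializes zip on Python 3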
Out[4]: 
0    (1, 2)
1    (2, 3)
2    (3, 4)
dtype: object
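On Python 3, `zip` returns a lazy iterator, so a minimal sketch of the same idea materializes it with `list` first and reuses `s1.index` so the result lines up with the inputs. It also shows `pd.MultiIndex.from_arrays(...).to_flat_index()` as one alternative way to get the tuples; that variant is an assumption on my part, not something from the answer above. Both pair values by position, so this assumes `s1` and `s2` are already aligned.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3])
s2 = pd.Series([2, 3, 4])

# zip pairs the values positionally; list() materializes the Python 3 iterator.
# Reusing s1.index keeps the result aligned with the input series.
pairs = pd.Series(list(zip(s1, s2)), index=s1.index)

# Alternative (assumption, not from the answer): build a MultiIndex from both
# series, then flatten it into a plain Index of tuples.
flat = pd.Series(pd.MultiIndex.from_arrays([s1, s2]).to_flat_index(), index=s1.index)

print(pairs)
print(flat)
```

Either way the result has `object` dtype, with each element a Python tuple that can be indexed with `[0]` / `[1]`.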

Here's another idea; I'm not sure it's suited to what you want, but maybe:

In [70]: s = Series([1,2,4,5,6])

`qcut`, a quantile-based discretizer (it bins the values; you can pass explicit quantiles instead of a count if you want), produces a Categorical:

In [71]: pd.qcut(s,2)
Out[71]: 
Categorical: 
array(['[1, 4]', '[1, 4]', '[1, 4]', '(4, 6]', '(4, 6]'], dtype=object)
Levels (2): Index(['[1, 4]', '(4, 6]'], dtype=object)

which you can then call value_counts on:

In [72]: pd.value_counts(pd.qcut(s,2))
Out[72]: 
[1, 4]    3
(4, 6]    2
dtype: int64
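The transcript above comes from an older pandas; recent versions print categoricals differently and deprecate the top-level `pd.value_counts` in favor of the `.value_counts()` method. A minimal sketch of the same binning with the method form follows, plus `pd.cut` for the case where you want to supply the bin edges yourself. The edges `[0, 4, 6]` are illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 4, 5, 6])

# qcut picks bin edges from quantiles: q=2 splits the data at the median.
quantile_bins = pd.qcut(s, 2)

# cut uses edges you supply yourself (these edges are illustrative).
explicit_bins = pd.cut(s, bins=[0, 4, 6])

# Both results are Series with categorical dtype, so value_counts works directly.
# sort=False keeps the bins in category order instead of sorting by frequency.
print(quantile_bins.value_counts(sort=False))
print(explicit_bins.value_counts(sort=False))
```

Here both approaches put 1, 2 and 4 in the first bin and 5 and 6 in the second, matching the 3 / 2 counts above.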
Answered by Jeff
  • Thanks @Jeff, +1 for elegance. I forgot about the zip function. If anyone else did too [here's a link](http://docs.python.org/2/library/functions.html#zip) to the documentation. – agconti Jul 11 '13 at 17:35
  • I'm analyzing trade data, and there are two features, importer and exporter. With them zipped like you showed me, I can now capture the relationship between them, e.g. (US, UK), (US, CHINA), etc. – agconti Jul 11 '13 at 17:40
  • might be easier to put in separate columns in a frame, IMHO – Jeff Jul 11 '13 at 17:41
  • How could the relationship be captured then? Right now I mainly want it for convenient plotting, for example `df.series_with_zip.value_counts().plot(kind='bar')` to show the most frequent trade relationships, then `df.groupby('year').series_with_zip.value_counts().plot(kind='bar')` to show how this changes over time. – agconti Jul 11 '13 at 17:53
  • OK, that makes sense; you are treating the tuples as singular objects. – Jeff Jul 11 '13 at 17:59
  • Singular objects, but it's not bad to be able to easily pull out the importer / exporter with [0] / [1] from the tuple. Do you know of a better way to capture relationships in pandas? – agconti Jul 11 '13 at 18:25
  • That's really interesting. Thanks for taking a look. Your first answer with zip does exactly what I want, but I will definitely explore this other option. This is a really interesting feature of pandas. – agconti Jul 11 '13 at 19:03
  • Great. If you find something interesting, please post it. – Jeff Jul 11 '13 at 19:08
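The comments weigh two ways of capturing importer/exporter relationships: a single column of tuples (the zip approach) and keeping separate columns in a frame, as Jeff suggests. A small sketch comparing them follows. The data and the column names `importer`, `exporter` and `year` are hypothetical stand-ins for the trade dataset, and the plotting calls from the comments are omitted.

```python
import pandas as pd

# Hypothetical trade records; column names and values are illustrative only.
df = pd.DataFrame({
    "importer": ["US", "US", "UK", "US"],
    "exporter": ["UK", "CHINA", "US", "UK"],
    "year": [2012, 2012, 2013, 2013],
})

# Tuple approach: one column holding (importer, exporter) pairs.
df["pair"] = list(zip(df["importer"], df["exporter"]))
pair_counts = df["pair"].value_counts()

# Separate-columns approach: group on both columns instead of on a tuple.
group_counts = df.groupby(["importer", "exporter"]).size().sort_values(ascending=False)

# Per-year breakdown, again keeping importer and exporter as separate keys.
by_year = df.groupby(["year", "importer", "exporter"]).size()

# The parts can still be pulled back out of each tuple with [0] / [1].
importers = df["pair"].map(lambda pair: pair[0])

print(pair_counts)
print(group_counts)
print(by_year)
```

With the separate-columns version the counts come back with a MultiIndex keyed by importer and exporter, which keeps each side individually selectable (e.g. `group_counts.loc["US"]`). Any of these results can be passed to `.plot(kind='bar')` as in the comments, assuming matplotlib is installed.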