If I want to find the difference between two consecutive rows in a pandas DataFrame, I can simply call the diff function.

I have rows that contain sets of characters. What I want to do now is compute the intersection of the sets in each pair of consecutive rows. In other words, I'd like to use diff, but supply my own function instead. Is there a way to accomplish this in pandas?

Example input:

 100118231     1               set([])           
               2            set([142.136.6])    
               3            set([142.136.6])    
               4            set([])             
               5            set([])             
               6            set([108.0.239])    

Desired output:

 100118231     1               set([])             NaN
               2            set([142.136.6])    set([])
               3            set([142.136.6])    set([142.136.6])
               4            set([])             set([])
               5            set([])             set([])
               6            set([108.0.239])    set([])

I've tried using shift, but it throws an error:

In [213]: type(tgr.head(1))
Out[213]: pandas.core.frame.DataFrame

In [214]: tt=tgr.apply(lambda x: x['value'].intersection((x['value'].shift(-1))))

AttributeError: 'Series' object has no attribute 'intersection'
Mike
  • As a side note, your pasted code and data is not really usable. Fortunately this was an easier question but if I had had to reproduce your dataframes I wouldn't have bothered. – U2EF1 Jul 17 '14 at 19:28
  • What's not useful? I thought it did a decent job of visualizing the structure of the data. if it doesn't make sense, I don't want to make the same mistake twice. – Mike Jul 17 '14 at 19:51
  • I can't reconstruct your data by just pasting something into the command line, so I can't reproduce your issue easily on this end. [Here's](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) a good guide, although you have to do more interesting things to get a dataframe of sets. – U2EF1 Jul 17 '14 at 20:11

1 Answer

For sets, & is intersection, and pandas applies it element-wise, so combining the frame with a shifted copy of itself is enough. There's no need to involve lambdas and the like.

> df = pd.DataFrame(['hi', set([142,136,6]), set([142, 137, 6]), set([0, 6])]).iloc[1:]
> df & df.shift(1)
               0
1            NaN
2  set([142, 6])
3       set([6])
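
If you want a diff-like operation that takes an arbitrary function, or if & on object columns behaves differently on your pandas version, an explicit loop over consecutive pairs is one possible fallback. This is only a minimal sketch: the helper name pairwise is hypothetical, and the sample data is an assumption modelled on the question, with the addresses stored as strings.

```python
import pandas as pd


def pairwise(series, func):
    """Apply func(previous, current) to each pair of consecutive rows.

    The first row has no predecessor, so it gets NaN, mirroring diff().
    """
    values = series.tolist()
    if not values:
        return pd.Series([], index=series.index, dtype=object)
    combined = [func(prev, cur) for prev, cur in zip(values, values[1:])]
    return pd.Series([float("nan")] + combined, index=series.index, dtype=object)


# Illustrative data modelled on the question (hypothetical values).
s = pd.Series(
    [set(), {"142.136.6"}, {"142.136.6"}, set(), set(), {"108.0.239"}],
    index=range(1, 7),
    name="value",
)

# Intersection of each row's set with the previous row's set.
result = pairwise(s, lambda prev, cur: prev & cur)
print(result)
```

Any two-argument function can be passed in place of the intersection lambda, for example a set union or a symmetric difference.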
U2EF1