I have a DataFrame with a column of strings, each holding comma-separated items.

col1
apple, banana, kiwi
apple, banana
banana

I want to make a second column 'col2' that shows which items from the previous row are missing in each row.

So I'm trying to turn each row into a set and subtract it from the previous row's set, as described here: Python comparing two strings to differences

df['col2'] = set(df["col1"].shift(1)) - set(df["col1"])

However I get this error message: "ValueError: Length of values does not match length of index". What am I doing wrong and is there a better way to do what I'm doing?

EDIT: expected output

col1                           col2
apple, banana, kiwi             
apple, banana                  kiwi
banana                         apple

1 Answer

df["temp"] = df["col1"].str.replace(r"\s+", "", regex=True).str.split(",")
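
The split matters because set() applied to a whole column builds a single set out of the row strings, not one set per row, so the result of the original attempt cannot line up with the index. Here is a minimal sketch of that, assuming the three sample rows from the question; the names whole_column_diff and items are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"col1": ["apple, banana, kiwi", "apple, banana", "banana"]})

# set() over a column collects the whole-row strings into ONE set
# (plus the NaN that shift() puts in the first row), so this is a
# single set for the entire column rather than a value per row.
whole_column_diff = set(df["col1"].shift(1)) - set(df["col1"])

# Stripping whitespace and splitting gives every row its own list of items,
# which can then be compared row by row.
items = df["col1"].str.replace(r"\s+", "", regex=True).str.split(",")
```

With the sample data, whole_column_diff holds only the NaN from the shifted first row, while items holds one list per row.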

Assign value to difference column:

df['difference'] = [    ""
                     if isinstance(last, float) or (not set(last).difference(first))
                     else tuple(set(last).difference(first))
                     if len(set(last).difference(first)) > 1
                     else min(set(last).difference(first))
                     for first, last in zip(df.temp, df["temp"].shift())
                  ]
df = df.drop('temp', axis=1)

    col1                   difference
0   apple, banana, kiwi 
1   apple, banana              kiwi
2   banana                    apple
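
If the nested conditional is hard to follow, the same idea can be sketched with a small helper. This is a variation, not the code above: removed_items is a hypothetical name, and it returns a comma-joined string when several items are missing instead of a tuple. It assumes a missing value or an unchanged row should give an empty string.

```python
import pandas as pd

def removed_items(current, previous):
    """Items in the previous row's list that are absent from the current one."""
    # shift() leaves NaN (a float) in the first row, and col1 itself may
    # contain missing values; in both cases there is nothing to compare.
    if not isinstance(previous, list) or not isinstance(current, list):
        return ""
    current_set = set(current)
    # Walk the previous list so the output keeps its original order.
    gone = [item for item in previous if item not in current_set]
    return ", ".join(gone)

df = pd.DataFrame({"col1": ["apple, banana, kiwi", "apple, banana", "banana"]})
items = df["col1"].str.replace(r"\s+", "", regex=True).str.split(",")
df["difference"] = [
    removed_items(cur, prev) for cur, prev in zip(items, items.shift())
]
```

Rows with no difference come out as an empty string because joining an empty list gives "", so no special case is needed for them.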
sammywemmy
  • I get a syntax error at outcome := pointing at the :. did you mean to write something else? –  Aug 15 '20 at 23:19
  • Oh, that's the walrus operator, new in Python 3.8 I believe. It makes it a bit easier to avoid repeating expressions. I'll edit it if your version is < 3.8 – sammywemmy Aug 15 '20 at 23:21
  • Now I get "ValueError: min() arg is an empty sequence" at "for first, last in". –  Aug 15 '20 at 23:26
  • for the same data you shared? – sammywemmy Aug 15 '20 at 23:26
  • The data I shared here is representative of the 5k+ rows I have... am I getting this error because there might not be a difference in some rows? –  Aug 15 '20 at 23:29
  • and if there is no difference, then an empty string? – sammywemmy Aug 15 '20 at 23:31
  • yes correct. Thank you for taking the time to work with me on this. –  Aug 15 '20 at 23:31
  • that worked wonders, thank you so much for your help. I didn't realize this would get more complicated than what I had initially. –  Aug 15 '20 at 23:38