
I have a column in a DataFrame with strings such as ABABAB. I would like to create a new column in that DataFrame containing just the non-repeated characters of each string; in the example above, just AB.

I have tried ''.join(), but this does not work: I get an error message saying that a string was expected.

Illustrative desired outcome:

  Column_1     Column_2
   ABABAB         AB
   KGKGKG         KG
   ACACAC         AC
   PCTPCTPCT      PCT

Please keep in mind that in some cases there are more than two unique characters.

Thank you in advance!

TPguru

2 Answers


See if this is what you want:

df["Column_2"] = df["Column_1"].apply(lambda x: "".join(set(x)))
df
    Column_1 Column_2
0     ABABAB       AB
1     KGKGKG       KG
2     ACACAC       AC
3  PCTPCTPCT      PTC
Sergey Bushmanov
  • does not work for `'ABBABB'`. Also, it does not guarantee to keep the letters in order, as sets are unordered, and might return ACP for PCAPCAPCA. It will also not work for `'ABCDEFE'`, which has no repeat – Patrick Artner Jan 27 '19 at 14:50
  • `'PCT'` are all repeated characters. They occur thrice in `'PCTPCTPCT'`. What is looked for is the shortest "sequence" that does not repeat, and your code will remove duplicate letters from sequences, hence it won't work for `'ABBABBABB'`, which probably should return `'ABB'` but will give either `'AB'` or `'BA'` – Patrick Artner Jan 27 '19 at 14:58
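
The reading suggested in the comment above (find the shortest sequence that, repeated, rebuilds the whole string) could be sketched roughly as follows. This is only an illustration of that interpretation, not code from either answer; the helper name `shortest_repeating_unit` is made up for the example, and returning the string unchanged when nothing repeats is an assumption.

```python
def shortest_repeating_unit(s: str) -> str:
    """Return the shortest prefix that, repeated, rebuilds s (or s itself)."""
    n = len(s)
    for size in range(1, n // 2 + 1):
        # Only prefix lengths that divide the string evenly can tile it.
        if n % size == 0 and s[:size] * (n // size) == s:
            return s[:size]
    # Assumption: a string with no repeating unit is returned unchanged.
    return s


examples = ["ABABAB", "PCTPCTPCT", "ABBABBABB", "ABCDEFE"]
results = [shortest_repeating_unit(s) for s in examples]
print(results)  # ['AB', 'PCT', 'ABB', 'ABCDEFE']
```

Since the helper takes and returns a plain string, it could presumably be passed to `df["Column_1"].apply(shortest_repeating_unit)` in the same way as the lambda in the answer.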

If you don't care about the order, you could use sets for this: they keep only unique items of an iterable (such as a string). Then join the set together to make it a string:

df['Column_1'].apply(lambda x: ''.join(set(x)))
#0    BA
#1    GK
#2    CA
#3    CTP
Jondiedoop
  • does not work for `'ABBABB'`. Also, it does not guarantee to keep the letters in order, as sets are unordered, and might return ACP for PCAPCAPCA. It will also not work for `'ABCDEFE'`, which has no repeat – Patrick Artner Jan 27 '19 at 14:50
  • Could you clarify? It outputs `'AB'`, as expected, right? – Jondiedoop Jan 27 '19 at 14:58
  • It appears to me your comment changed after I responded to it. I already wrote in my answer that it only works if you don't care about the order (which he didn't specify should matter). He says he wants non-repeated characters, which I interpret as unique characters, which is in fact what this returns, right? @PatrickArtner – Jondiedoop Jan 27 '19 at 15:03
  • I asked for clarification. I read the question differently. Until I get clarification, I'll retract my downvote – Patrick Artner Jan 27 '19 at 15:04
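
If the "unique characters" reading is the intended one but the original order does matter, one possible variation is to deduplicate through dictionary keys instead of a set. This is a sketch under that assumption rather than something proposed in the thread; the helper name `unique_chars_in_order` is hypothetical.

```python
def unique_chars_in_order(s: str) -> str:
    # dict keys keep insertion order (Python 3.7+), so duplicates are dropped
    # while the first occurrence of each character keeps its position.
    return "".join(dict.fromkeys(s))


samples = ["ABABAB", "KGKGKG", "ACACAC", "PCTPCTPCT"]
deduplicated = [unique_chars_in_order(s) for s in samples]
print(deduplicated)  # ['AB', 'KG', 'AC', 'PCT']
```

Like the set-based version, this still collapses `'ABBABB'` to `'AB'`, so it only fits if repeated letters inside the pattern should be dropped.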