-4

I have data:

ID  merk_ac1  merk_ac2 merk_ac3
1   a         NA       NA
2   NA        b        NA
3   NA        a        b
4   a,b       NA       c

I tried using combine_first(), but it does not handle multiple values. I want a result like:

ID  merk_ac1  merk_ac2 merk_ac3 merk_ac
1   a         a                 a 
2             b                 b
3             a        b        a,b
4   a,b                c        a,b,c
Ririn
  • Is this a [tag:pandas] dataframe? If so, please add the tag to your question – Pranav Hosangadi Jul 18 '22 at 14:27
  • Does this answer your question? [How to concatenate multiple column values into a single column in Pandas dataframe](https://stackoverflow.com/questions/39291499/how-to-concatenate-multiple-column-values-into-a-single-column-in-pandas-datafra) – GodWin1100 Jul 18 '22 at 14:51

2 Answers

1

You can use apply with join to concatenate the strings from multiple columns.
For unique (distinct) values, convert the list to a set before joining. A set does not preserve order, so if you want sorted output, use sorted(set(...)) instead.

import pandas as pd

a = ["a", pd.NA, pd.NA, "a,b"]
b = ["a", "b", "a", pd.NA]
c = [pd.NA, pd.NA, "b", "c"]

df = pd.DataFrame({"merk_ac1": a, "merk_ac2": b, "merk_ac3": c})
# For each row: drop missing values, de-duplicate with a set, then join with commas.
df["combined"] = df.apply(lambda row: ",".join(set(elem for elem in row if not pd.isna(elem))), axis=1)
df.head()

OUTPUT:

  merk_ac1 merk_ac2 merk_ac3 combined
0        a        a     <NA>        a
1     <NA>        b     <NA>        b
2     <NA>        a        b      a,b
3      a,b     <NA>        c    c,a,b
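
The sorted variant mentioned above can be sketched as follows. This is only a minimal sketch, assuming that a cell such as "a,b" holds several values that should be split before de-duplicating; the helper name combine_row is hypothetical and not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "merk_ac1": ["a", pd.NA, pd.NA, "a,b"],
    "merk_ac2": ["a", "b", "a", pd.NA],
    "merk_ac3": [pd.NA, pd.NA, "b", "c"],
})

def combine_row(row):
    """Hypothetical helper: distinct, sorted values of one row as a string."""
    values = set()
    for cell in row:
        if pd.isna(cell):
            continue  # skip missing values
        # Assumption: a cell like "a,b" contains several comma-separated values.
        values.update(part.strip() for part in str(cell).split(",") if part.strip())
    # sorted() makes the order deterministic, unlike joining the set directly.
    return ",".join(sorted(values))

df["combined"] = df.apply(combine_row, axis=1)
```

Splitting first means a row holding "a,b" and "a" would give a,b rather than a,b,a, and sorting avoids the arbitrary set order seen in the last row of the output above.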
GodWin1100
  • Thank you. But is it possible to get only distinct values in every row? For example, if merk_ac1 and merk_ac2 are both a, your code returns a,a. I want only a. – Ririn Jul 18 '22 at 15:01
  • Check my updated answer. If it solves your problem, you can mark it as **Answer**. Please avoid updating the question with new conditions once it has been answered, and try to state all conditions in the question from the start. – GodWin1100 Jul 18 '22 at 19:01
0

What about something like:

import pandas as pd

# pd.notna works for strings and missing values; np.isnan raises a TypeError on strings.
df['merk_ac'] = df.apply(lambda x: [element for element in x if pd.notna(element)], axis=1)

This assumes that in merk_ac you want a list of all the non-missing elements from the other columns of the original data frame.
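
If a comma-separated string is wanted instead of a list, the lists can be joined afterwards. A minimal sketch, assuming the sample data from the question and using Series.str.join; the intermediate name lists is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "merk_ac1": ["a", pd.NA, pd.NA, "a,b"],
    "merk_ac2": [pd.NA, "b", "a", pd.NA],
    "merk_ac3": [pd.NA, pd.NA, "b", "c"],
})

# Collect the non-missing values of each row into a list.
lists = df.apply(lambda row: [v for v in row if pd.notna(v)], axis=1)
# Series.str.join concatenates the items of each list with the given separator.
df["merk_ac"] = lists.str.join(",")
```

This keeps the values in column order and does not remove duplicates.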