0

I wonder if there is a way to join between strings from two columns of a data frame as I referred to this: Combine two columns of text in pandas dataframe But it seems like I cannot find the solution that satisfied my need. Suppose I have a data frame with two columns: First and Second. So the Third column should be sth like this:

df['Third'] = df.First.astype(str).str.cat(df.Second.astype(str), sep=',')

Normally, the concatenation works well when those 2 columns are string but in my case, the first and the second column can sometimes contain NaN

So Is there a way to set the condition on sep whether it needs to put or not in case for example: if df['First']= 'first' and df['Second'] = NaN then df['Third'] = 'first' without a , at the end. if df['first'] = NaN and df['Second'] = 'second' then df['Third'] = 'second' without a , at the beginning of the string.

How can I achieve this? Any help would be much appreciated. Thank you!

Erwin
  • 325
  • 1
  • 9

2 Answers2

1

Not sure how efficient this is, or if there's a better way.

If first and second are the only columns in the data frame:

df["third"] = df.apply(
    lambda row: ",".join([x for x in row if not pd.isna(x)]),
    axis=1
)

If there are other columns:

df["third"] = df.apply(
    lambda row: ",".join([x for x in row[["first", "second"]] if not pd.isna(x)]),
    axis=1
)
norie
  • 9,609
  • 2
  • 11
  • 18
  • Can you specify which column to apply if we have the other columns than First and Second? – Erwin Jun 16 '21 at 12:48
0

Lambda and apply gets messy, so to make it clear you may want to just abstract out a function:

def my_join(a,b):
    if a != NaN and b != NaN: return a + ',' + b
    elif a == NaN return b
    else return a

df['Third'] = df.apply(lambda x: my_join(x.First, x.Second))
Chris Schmitz
  • 618
  • 1
  • 6
  • 16