I have a pandas DataFrame with three columns (Product, Color, and Size), and I want to create a fourth column called Description that concatenates the strings from the other three columns. An example DataFrame is shown below:

import numpy as np
import pandas as pd

data = {"Product": ["Shorts", "T-Shirt", "Jacket", "Cap"],
        "Color": ["Red", "Blue", "White", "Green"],
        "Size": ["S", "M", None, "S"]}

df = pd.DataFrame(data)
df

|Product |Color |Size |
|--------|------|-----|
|Shorts  |Red   |S    |
|T-Shirt |Blue  |M    |
|Jacket  |White |None |
|Cap     |Green |S    |

I tried using numpy.where as follows, without success:

df["Description"] = np.where(df["Size"].isna(), str(df["Product"])+" "+str(df["Color"]), str(df["Product"])+" "+str(df["Color"])+" "+str(df["Size"]) )
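One likely reason this attempt does not give the expected result: calling the built-in str() on a Series returns a single string holding the printed representation of the whole column, not one string per row. np.where then broadcasts that single string to every row. The small sketch below illustrates the difference; the names whole and per_row are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Product": ["Shorts", "T-Shirt"]})

# str() on a Series yields ONE Python string: the printed form of the
# entire column (index, values, dtype footer).
whole = str(df["Product"])

# astype(str) converts element-wise and returns a Series of the same length.
per_row = df["Product"].astype(str)

print(type(whole))       # a plain Python str
print(per_row.tolist())  # one string per row
```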

I can get the result I want by iterating over the DataFrame with iterrows, as shown below, but I would like to do the same with pandas or NumPy for better performance.

for index, row in df.iterrows():
    # Leave the size out when it is missing
    if row["Size"] is None:
        df.loc[index, "Description"] = str(row["Product"])+" "+str(row["Color"])
    else:
        df.loc[index, "Description"] = str(row["Product"])+" "+str(row["Color"])+" "+str(row["Size"])

df
|Product |Color |Size |Description   |
|--------|------|-----|--------------|
|Shorts  |Red   |S    |Shorts Red S  |
|T-Shirt |Blue  |M    |T-Shirt Blue M|
|Jacket  |White |None |Jacket White  |
|Cap     |Green |S    |Cap Green S   |

Any idea how to do the same thing using pandas or NumPy?
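For reference, here is a minimal sketch of one vectorized approach that seems to match the loop above. It assumes that a missing Size is detected by isna() and that element-wise conversion with astype(str) is acceptable; the intermediate name base is only illustrative, and this is not necessarily the fastest or most idiomatic option.

```python
import numpy as np
import pandas as pd

data = {"Product": ["Shorts", "T-Shirt", "Jacket", "Cap"],
        "Color": ["Red", "Blue", "White", "Green"],
        "Size": ["S", "M", None, "S"]}
df = pd.DataFrame(data)

# Element-wise concatenation of the two columns that are always present.
base = df["Product"].astype(str) + " " + df["Color"].astype(str)

# Where Size is missing keep only "Product Color"; otherwise append the size.
df["Description"] = np.where(
    df["Size"].isna(),
    base,
    base + " " + df["Size"].astype(str),
)

print(df)
```

The key difference from the attempt with str() is that every operand here is a Series of per-row strings, so np.where selects row by row.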

danimille