I have a pandas DataFrame with 3 columns (`Product`, `Color` and `Size`) and I want to create a fourth column called `Description` that concatenates the strings in the previous 3 columns.
An example DataFrame is shown below:
import numpy as np
import pandas as pd

data = {"Product": ["Shorts", "T-Shirt", "Jacket", "Cap"],
        "Color": ["Red", "Blue", "White", "Green"],
        "Size": ["S", "M", None, "S"]}
df = pd.DataFrame(data)
df
|Product |Color |Size |
|--------|------|-----|
|Shorts |Red |S |
|T-Shirt |Blue |M |
|Jacket |White |None |
|Cap |Green |S |
I tried using `numpy.where` as follows, with no success:
df["Description"] = np.where(df["Size"].isna(), str(df["Product"])+" "+str(df["Color"]), str(df["Product"])+" "+str(df["Color"])+" "+str(df["Size"]) )
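A likely reason the attempt above fails: `str(df["Product"])` converts the whole column into a single string (its printed form), so each branch of `np.where` is one constant string instead of one value per row. A minimal sketch of the same `np.where` idea using element-wise Series concatenation instead, assuming `Size` holds strings or `None`:

```python
import numpy as np
import pandas as pd

data = {"Product": ["Shorts", "T-Shirt", "Jacket", "Cap"],
        "Color": ["Red", "Blue", "White", "Green"],
        "Size": ["S", "M", None, "S"]}
df = pd.DataFrame(data)

# str() on a Series returns ONE string (the printed column), not one per row
print(repr(str(df["Product"])))

# Series + str + Series concatenates row by row
base = df["Product"] + " " + df["Color"]
# Where Size is missing, base + " " + Size evaluates to NaN,
# but np.where selects `base` for exactly those rows
df["Description"] = np.where(df["Size"].isna(), base, base + " " + df["Size"])
print(df)
```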
I can get what I want by iterating over the DataFrame with `iterrows`, as below, but I would like to do the same with pandas or numpy operations for better performance.
for index, row in df.iterrows():
    # Rows without a size get only "Product Color"
    if row["Size"] is None:
        df.loc[index, "Description"] = str(row["Product"]) + " " + str(row["Color"])
    else:
        df.loc[index, "Description"] = str(row["Product"]) + " " + str(row["Color"]) + " " + str(row["Size"])
df
|Product |Color |Size |Description |
|--------|------|-----|--------------|
|Shorts |Red |S |Shorts Red S |
|T-Shirt |Blue |M |T-Shirt Blue M|
|Jacket |White |None |Jacket White |
|Cap |Green |S |Cap Green S |
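A sketch of a vectorized alternative that skips `np.where` entirely: prepend the separator to `Size`, let the missing size become NaN, and replace that with an empty string. This assumes all three columns hold strings (numeric values would need `.astype(str)` first) and that only `Size` can be missing.

```python
import pandas as pd

data = {"Product": ["Shorts", "T-Shirt", "Jacket", "Cap"],
        "Color": ["Red", "Blue", "White", "Green"],
        "Size": ["S", "M", None, "S"]}
df = pd.DataFrame(data)

# " " + Size is NaN where Size is missing; fillna("") turns it into an empty suffix
size_part = (" " + df["Size"]).fillna("")
df["Description"] = df["Product"] + " " + df["Color"] + size_part
print(df)
```

Everything here is a whole-column operation, so no Python-level loop over rows is involved.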
Any idea how to do the same thing using pandas or numpy?
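Another option, sketched here under stated assumptions, is pandas' `Series.str.cat` with `na_rep=""`, followed by stripping the trailing separator that is left behind when `Size` is missing. It relies on the missing value being in the last column; a missing value in a middle column would leave a double space instead.

```python
import pandas as pd

data = {"Product": ["Shorts", "T-Shirt", "Jacket", "Cap"],
        "Color": ["Red", "Blue", "White", "Green"],
        "Size": ["S", "M", None, "S"]}
df = pd.DataFrame(data)

# Join the columns with a space, treating missing values as "",
# then drop the trailing space produced for rows with no Size
df["Description"] = (
    df["Product"]
    .str.cat([df["Color"], df["Size"]], sep=" ", na_rep="")
    .str.rstrip()
)
print(df)
```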