Get unique strings after joining columns in Pandas Dataframe

Question

I have a dataframe like this:

import pandas as pd

data = {'col1':['XXX', 'AAA', 'ZZZ'],'col2':['BBB', 'AAA','TTT'], 'col3': ['BBB', 'CCC', 'TTT'], 'col4': ['XXX', 'CCC', 'ZZZ']}

df = pd.DataFrame(data)
df

I want to produce a column that joins the unique strings in each row with an underscore, without reordering them: col1 should be the first part of the combo, followed by the other columns in order.

However, when I run the code below, the output doesn't follow the column order (it looks alphabetical), which I don't want. I want it to use the column order given in the code.

df['combos'] = ["_".join((k for k in set(v) if pd.notnull(k))) for v in
                  df[["col1", "col2", "col3", "col4"]].values]
df

does this answer your question: https://stackoverflow.com/questions/19377969/combine-two-columns-of-text-in-pandas-dataframe — Haleemur Ali, Jan 21 '22 at 13:41

score 3 · Accepted Answer · answered Jan 21 '22 at 13:42

Use the dict.fromkeys trick to remove duplicates while keeping the original order:

df['combos'] = ["_".join(dict.fromkeys(k for k in v if pd.notnull(k))) for v in
                  df[["col1", "col2", "col3", "col4"]].values]
print(df)
  col1 col2 col3 col4   combos
0  XXX  BBB  BBB  XXX  XXX_BBB
1  AAA  AAA  CCC  CCC  AAA_CCC
2  ZZZ  TTT  TTT  ZZZ  ZZZ_TTT
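Why this works: set(v) has no defined iteration order, so the joined result is not tied to the column order. dict.fromkeys keeps the first occurrence of each key in insertion order (guaranteed for dicts since Python 3.7). For comparison, here is a minimal sketch of the same idea as an explicit loop; unique_in_order is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

def unique_in_order(values):
    # Hypothetical helper: keep the first occurrence of each non-missing value,
    # in the order the values appear (col1, col2, col3, col4).
    seen = set()
    result = []
    for v in values:
        if pd.notnull(v) and v not in seen:
            seen.add(v)
            result.append(v)
    return result

data = {'col1': ['XXX', 'AAA', 'ZZZ'], 'col2': ['BBB', 'AAA', 'TTT'],
        'col3': ['BBB', 'CCC', 'TTT'], 'col4': ['XXX', 'CCC', 'ZZZ']}
df = pd.DataFrame(data)

# The set is used only for membership tests; the list carries the order.
df['combos'] = ["_".join(unique_in_order(row))
                for row in df[["col1", "col2", "col3", "col4"]].values]
print(df['combos'].tolist())  # ['XXX_BBB', 'AAA_CCC', 'ZZZ_TTT']
```

The dict.fromkeys version is shorter and does the same bookkeeping internally.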

If there are no missing values, the pd.notnull filter can be dropped:

df['combos'] = ["_".join(dict.fromkeys(v)) for v in
                  df[["col1", "col2", "col3", "col4"]].values]
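A pandas-only alternative (not part of the original answer) is pd.unique, which returns values in order of appearance rather than sorted. A minimal sketch, with the last cell of col4 changed to None (an assumption, to exercise the missing-value path):

```python
import pandas as pd

data = {'col1': ['XXX', 'AAA', 'ZZZ'], 'col2': ['BBB', 'AAA', 'TTT'],
        'col3': ['BBB', 'CCC', 'TTT'], 'col4': ['XXX', 'CCC', None]}
df = pd.DataFrame(data)

cols = ["col1", "col2", "col3", "col4"]
# dropna() removes missing cells; pd.unique keeps the order of first
# appearance, so the col1 value always comes first in each combo.
df['combos'] = df[cols].apply(lambda row: "_".join(pd.unique(row.dropna())), axis=1)
print(df)
```

A row-wise apply is typically slower than the list comprehension above on large frames, so the dict.fromkeys version is likely the better default.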

jezrael

YES! Thank you so much! I cannot believe I missed this! Really appreciate it. Have a great day! – elevenplusseven Jan 21 '22 at 14:31