I have a dataframe like this:

import pandas as pd

data = {'col1': ['XXX', 'AAA', 'ZZZ'], 'col2': ['BBB', 'AAA', 'TTT'], 'col3': ['BBB', 'CCC', 'TTT'], 'col4': ['XXX', 'CCC', 'ZZZ']}

df = pd.DataFrame(data)
df

I want to produce a column that joins the distinct strings in each row together, without automatically alphabetising them. The combined value should use col1 as its first portion and then follow the column order.

However, when I run the code below, the output appears to prioritise alphabetical order, which I don't want. I want it to use the column order stipulated in the code.

df['combos'] = ["_".join((k for k in set(v) if pd.notnull(k))) for v in
                  df[["col1", "col2", "col3", "col4"]].values]
df

  • does this answer your question: https://stackoverflow.com/questions/19377969/combine-two-columns-of-text-in-pandas-dataframe – Haleemur Ali Jan 21 '22 at 13:41

1 Answer

A set has no defined order, so use the dict.fromkeys trick instead, which removes duplicates while keeping the original order:

df['combos'] = ["_".join(dict.fromkeys(k for k in v if pd.notnull(k))) for v in
                  df[["col1", "col2", "col3", "col4"]].values]
print(df)
  col1 col2 col3 col4   combos
0  XXX  BBB  BBB  XXX  XXX_BBB
1  AAA  AAA  CCC  CCC  AAA_CCC
2  ZZZ  TTT  TTT  ZZZ  ZZZ_TTT
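
To see why this works, here is a minimal sketch on a single row in plain Python, with no pandas involved. The variable names are only illustrative. It relies on dict keys keeping insertion order, which the language guarantees from Python 3.7 onwards.

```python
# One row of values, in column order (col1..col4), with duplicates.
values = ["XXX", "BBB", "BBB", "XXX"]

# dict.fromkeys builds a dict whose keys are the values; a repeated key
# is ignored, and keys keep the order in which they were first seen.
ordered_unique = list(dict.fromkeys(values))
combo = "_".join(ordered_unique)
print(combo)  # XXX_BBB

# A set removes the same duplicates, but its iteration order is
# arbitrary, so only the contents (not the order) can be relied on.
unordered_unique = set(values)
print(unordered_unique == {"XXX", "BBB"})  # True
```

The set version may happen to print in the order you want for some inputs, but that is a coincidence rather than a guarantee.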

If there are no missing values, the pd.notnull filter can be dropped:

df['combos'] = ["_".join(dict.fromkeys(v)) for v in
                  df[["col1", "col2", "col3", "col4"]].values]
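
If the same logic is needed in several places, both variants could be wrapped in one small helper. This is only a sketch: join_unique is a hypothetical name, and it assumes missing values show up as None or float NaN (it does not handle pd.NA).

```python
import math


def join_unique(values, sep="_"):
    """Join values in first-seen order, skipping duplicates and missing entries."""
    def is_missing(v):
        # Assumption: "missing" means None or a float NaN.
        return v is None or (isinstance(v, float) and math.isnan(v))

    # dict.fromkeys drops duplicates while keeping insertion order.
    return sep.join(dict.fromkeys(v for v in values if not is_missing(v)))


rows = [
    ["XXX", "BBB", "BBB", "XXX"],
    ["AAA", None, "CCC", "CCC"],
    ["ZZZ", float("nan"), "TTT", "ZZZ"],
]
combos = [join_unique(row) for row in rows]
print(combos)  # ['XXX_BBB', 'AAA_CCC', 'ZZZ_TTT']
```

With a DataFrame, the helper could then be used in the same list comprehension as above, for example [join_unique(v) for v in df[["col1", "col2", "col3", "col4"]].values].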
jezrael