0

I have a column (called codes) with different repeated values separated by comma, I have to create as many columns as the unique values are and, for each row, put 1 in the column of the value present in the cell of the Codes column and 0 in all the others columns just created.

Codes
40.1, 54.06, 11.3
50.64
48.7, 40.1, 54.06
35.2, 50.64

I inserted the values on a list but I don't know how to build the columns and above all to write 1 or 0 in the columns. I have to get

final table results

  • What is ```cplumn```? Also, can you format your output properly, please, so it's easier to see? Thanks. – AnandP2812 Mar 19 '23 at 21:00
  • Hi, I corrected the post and the table should now be clearer what I need – plambertini Mar 19 '23 at 21:32
  • Does this answer your question? [Converting pandas column of comma-separated strings into dummy variables](https://stackoverflow.com/questions/46867201/converting-pandas-column-of-comma-separated-strings-into-dummy-variables) – AlexK Mar 19 '23 at 22:15

2 Answers2

1

try this:

result = df.join(df.Codes.str.get_dummies(sep=', '))
print(result)
>>>
               Codes  11.3  35.2  40.1  48.7  50.64  54.06
0  40.1, 54.06, 11.3     1     0     1     0      0      1
1              50.64     0     0     0     0      1      0
2  48.7, 40.1, 54.06     0     0     1     1      0      1
3        35.2, 50.64     0     1     0     0      1      0
ziying35
  • 1,190
  • 3
  • 6
  • Hi! thank you!! it works perfectly! but the column "codes" belongs to an xls file, how can I create these columns in my xls file? – plambertini Mar 21 '23 at 21:22
  • write the dataframe to an excel file: df.to_excel('my_file.xlsx', sheet_name='Sheet1', index=False) – ziying35 Mar 22 '23 at 00:23
  • thank you very much!! I need one last advice: my excel file uses 25 columns, the codes column is the 17th (col. Q), how can I insert the new columns you taught me starting from column 18 (col. R) by scrolling all the others? thank you in advance – plambertini Mar 23 '23 at 20:37
  • try: df.iloc[:, :16].join(df.Codes.str.get_dummies(sep=', ')).join(df.iloc[:, 17:]) – ziying35 Mar 24 '23 at 00:32
0

Another solution:

out = df['Codes'].str.extractall(r'([^\s,]+)').droplevel(level=1)
out = out.pivot_table(index=out.index, columns=out[0], aggfunc='size', fill_value=0)
out = pd.concat([df, out], axis=1)

print(out)

Prints:

               Codes  11.3  35.2  40.1  48.7  50.64  54.06
0  40.1, 54.06, 11.3     1     0     1     0      0      1
1              50.64     0     0     0     0      1      0
2  48.7, 40.1, 54.06     0     0     1     1      0      1
3        35.2, 50.64     0     1     0     0      1      0
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91