I have a pandas dataframe that looks like this:

ImageID labels caption_text
0.JPG 1 Woman in swim suit holding parasol
1.JPEG 1 19 a black and silver clock tower
2.JPEG 8 3 13 This photo shows people skiing in the mountains.
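
For reproducibility, here is a minimal sketch that rebuilds the sample above. It assumes the labels column holds space-separated label ids as strings, which the table does not state explicitly.

```python
import pandas as pd

# Assumption: `labels` is a string column of space-separated label ids.
df = pd.DataFrame({
    "ImageID": ["0.JPG", "1.JPEG", "2.JPEG"],
    "labels": ["1", "1 19", "8 3 13"],
    "caption_text": [
        "Woman in swim suit holding parasol",
        "a black and silver clock tower",
        "This photo shows people skiing in the mountains.",
    ],
})
print(df)
```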

The labels in this data set range from 1 to 19, and I am trying to give each label its own column. The final dataframe should have 19 additional columns, each holding a 1 or a 0.

For example, "8 3 13" should have a 1 in columns 8, 3 and 13 and 0s everywhere else.

So far I have managed to put the labels into arrays and into columns, but neither gives me the one-column-per-label layout I need.

Any ideas on how I can achieve this?

Thanks!

Rfinch
    Welcome to SO! Please share a [reproducible code snippet of your dataframe](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) along with your code attempt as a [mcve]. Thanks. – ggorlen May 23 '21 at 06:53

2 Answers

Since you already know the range to be 1-19, you can use str.get_dummies to split the labels into indicator columns and then reindex so that every label from 1 to 19 gets a column:

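n = 19
# Split each label string on spaces into 0/1 indicator columns, then
# reindex so every label from 1 to n has a column (absent ones become 0).
cols = [str(i) for i in range(1, n + 1)]
arr = df['labels'].str.get_dummies(' ').reindex(columns=cols, fill_value=0)
print(arr)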

   1  2  3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19
0  1  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0
1  1  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   1
2  0  0  1  0  0  0  0  1  0   0   0   0   1   0   0   0   0   0   0

Finally, you can concatenate this with the original dataframe:

out = pd.concat((df,arr),axis=1)
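
Putting the two steps together, a self-contained sketch might look like the following. The sample frame is rebuilt here on the assumption that labels is a string column of space-separated ids. Note that the new column names are strings ('1' to '19'), not integers.

```python
import pandas as pd

# Assumption: `labels` is a string column of space-separated label ids.
df = pd.DataFrame({
    "ImageID": ["0.JPG", "1.JPEG", "2.JPEG"],
    "labels": ["1", "1 19", "8 3 13"],
})

n = 19
cols = [str(i) for i in range(1, n + 1)]

# One 0/1 column per label that occurs, then fill in the labels that never occur.
arr = df["labels"].str.get_dummies(" ").reindex(columns=cols, fill_value=0)

# Attach the indicator columns to the original frame.
out = pd.concat([df, arr], axis=1)
print(out.loc[2, ["3", "8", "13"]].tolist())
```

If integer column names are preferred, arr.columns = arr.columns.astype(int) before the concat should convert them.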
anky

Just to offer an alternative way of doing this: you can iterate over each possible label and check whether the string in labels contains it as a whole word (the \b word boundaries stop 1 from matching inside 13 or 19):

n = 19
for i in range(1, n+1):
    df[i] = df['labels'].str.contains(rf'\b{i}\b').astype(int)
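
As a quick check of why the word boundaries matter, here is a small sketch on the sample labels (again assuming a string column of space-separated ids). The naive variable name is only for illustration.

```python
import pandas as pd

# Assumption: `labels` is a string column of space-separated label ids.
df = pd.DataFrame({"labels": ["1", "1 19", "8 3 13"]})

# Without word boundaries, "1" also matches inside "13" and "19".
naive = df["labels"].str.contains("1").astype(int)

n = 19
for i in range(1, n + 1):
    # \b on both sides makes the pattern match the whole number only.
    df[i] = df["labels"].str.contains(rf"\b{i}\b").astype(int)

print(naive.tolist())  # the naive match flags every row
print(df[1].tolist())  # the bounded match flags only rows that really have label 1
```

Unlike the get_dummies approach, the columns created here have integer names (1 to 19).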
Nick