I am trying to split a column containing width, length and height into three separate columns, and then convert the string values to integers. The column is part of a dataset with 300 features, and the series in question looks like this (last five entries):

index       W_X_L_X_H

53839     96 x 126 x 4
53840    20 x 623 x 13
53841        0 x 0 x 0
53842        0 x 0 x 0
53843        0 x 0 x 0
Name: DIMENSIONS(W_X_L_X_H), dtype: object

I have tried different combinations of df["W_X_L_X_H"].str.split('x').apply(), with my own functions inside apply(), but without success.

Another question, if I may: could this column be represented in a sensible way that keeps the WxLxH format? It would still need to be converted to an integer datatype, as I need it for numerical analysis, mostly for the correlation matrix. Any ideas are appreciated, as I am drowning in data cleaning.

[Pandas split column into multiple columns by comma](https://stackoverflow.com/q/37600711/15497888): you should just be able to add `expand=True`. `df["W_X_L_X_H"].str.split(' x ', expand=True).astype(int)` – Henry Ecker Oct 07 '21 at 21:22

1 Answer

Inspired by the related question that Henry Ecker linked, I tinkered and solved the issue with:

df[['WIDTH', 'LENGTH', 'HEIGHT']] = df['DIMENSIONS(W_X_L_X_H)'].str.split(' x ', expand=True)
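
The line above leaves the three new columns as strings. Below is a minimal sketch of the full round trip, adding the `.astype(int)` cast from Henry Ecker's comment. The sample frame is hypothetical, built from the five rows shown in the question, and it assumes every entry is well formed (exactly three whole numbers separated by " x ", no missing values).

```python
import pandas as pd

# Hypothetical sample mirroring the last five rows shown in the question.
df = pd.DataFrame(
    {
        "DIMENSIONS(W_X_L_X_H)": [
            "96 x 126 x 4",
            "20 x 623 x 13",
            "0 x 0 x 0",
            "0 x 0 x 0",
            "0 x 0 x 0",
        ]
    },
    index=[53839, 53840, 53841, 53842, 53843],
)

# Split on the " x " separator into three columns, then cast the
# resulting strings to integers in one step.
df[["WIDTH", "LENGTH", "HEIGHT"]] = (
    df["DIMENSIONS(W_X_L_X_H)"].str.split(" x ", expand=True).astype(int)
)

print(df[["WIDTH", "LENGTH", "HEIGHT"]].dtypes)
```

With the values stored as integers, `df[["WIDTH", "LENGTH", "HEIGHT"]].corr()` can include them in a correlation matrix. The original "W x L x H" string column can stay alongside for display, but as a string it cannot take part in the numeric analysis itself. If some rows are missing or malformed, `.astype(int)` will raise an error; applying `pd.to_numeric(..., errors="coerce")` to each column is one alternative worth trying in that case.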