I am taking over a project built around a pandas DataFrame that contains a large number of measurements in this format: 6x6, 52x14

I need to add the inch symbol, a double quote ("), after each number in two specific columns that hold this type of measurement data. For the examples above, the desired output is 6"x6", 52"x14"

How can I concisely add these quotes after each numeric value in those two columns? One complication is that the columns also contain other measurement data, such as the words large, small, etc. Those should be left alone; I only need the inch mark added after each number.

smci
  • Do you want to do any numerical manipulations (e.g. calculate area) or just keep these dimensions as a string? Depending on your use-case, it might make more sense to split them into separate (numeric) columns `width, length`. – smci Jan 28 '21 at 00:32
  • *quote (")* is the inches unit symbol. (And ' is the unit symbol for 'feet'. Think Spinal Tap...) – smci Jan 28 '21 at 00:49

1 Answer

Here's how to add the unit symbol with a regex string replacement (though depending on your use case, it might make more sense to split the values into separate numeric columns, length and width; see below):

import pandas as pd

df = pd.DataFrame({'measurements': ['6x6', '52x14']})

df['measurements'].str.replace(r'(\d+)', r'\1"', regex=True)
0      6"x6"
1    52"x14"
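To cover the two columns and the mixed values mentioned in the question, the same replacement can be applied column by column. This is only a sketch: the column names size_a and size_b and the sample values are placeholders, not names from the original project. Values with no digits, such as large or small, pass through unchanged because the pattern never matches them.

```python
import pandas as pd

# Hypothetical frame: 'size_a' and 'size_b' stand in for the two real columns
df = pd.DataFrame({
    'size_a': ['6x6', 'large', '52x14'],
    'size_b': ['small', '10x12', '3x4'],
})

cols = ['size_a', 'size_b']
for col in cols:
    # Append an inch mark after every run of digits; regex=True is passed
    # explicitly because newer pandas versions treat the pattern as a literal
    # string by default
    df[col] = df[col].str.replace(r'(\d+)', r'\1"', regex=True)

print(df)
```

Two caveats with this pattern: a decimal value such as 6.5x6 would get a mark after both the 6 and the 5, and running the replacement twice would add a second mark, so it is safest to run it once on the raw data.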

whereas if you want separate (numeric) length, width columns:

df[['length','width']] = df['measurements'].str.partition('x')[[0,2]].astype(int)

  measurements length width
0          6x6      6     6
1        52x14     52    14

Separate numeric columns are much cleaner if you'll be doing any calculations (e.g. df['area'] = df['length'] * df['width']).

You could then add your custom units formatting via:
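For instance, here is a minimal sketch (an assumption, not necessarily the formatting the answer had in mind) that rebuilds the display string from the numeric columns. The column name formatted is a placeholder.

```python
import pandas as pd

# Numeric columns as produced by the split shown earlier
df = pd.DataFrame({'length': [6, 52], 'width': [6, 14]})

# Build a display string from the numbers, adding an inch mark after each one
df['formatted'] = df['length'].astype(str) + '"x' + df['width'].astype(str) + '"'

print(df)
```

This keeps the numbers available for calculations and treats the unit symbol purely as presentation.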

Note:

  • in df[['length','width']] = df['measurements'].str.partition('x')[[0,2]].astype(int), the [[0,2]] subscript is needed to exclude the 'x' separator itself, which partition returns as column 1. The .astype(int) is needed to cast from string (pandas 'object' dtype) to int.
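The note above can be made concrete by looking at what partition returns. The sketch below also shows one possible way to handle rows such as large, where .astype(int) would raise an error. The str.extract approach and the nullable Int64 dtype are swapped in here as an assumption about what suits mixed data; they are not part of the original answer.

```python
import pandas as pd

# Hypothetical sample that mixes dimensions with a descriptive word
df = pd.DataFrame({'measurements': ['6x6', 'large', '52x14']})

# partition returns three columns: the text before 'x', the 'x' itself,
# and the text after it (empty strings when there is no 'x')
parts = df['measurements'].str.partition('x')
print(parts)

# Capture the two numbers only when the whole value looks like <digits>x<digits>;
# rows that do not match (e.g. 'large') come back as NaN
dims = df['measurements'].str.extract(r'^(\d+)x(\d+)$')

# Convert to numbers, then to the nullable integer dtype so missing rows stay <NA>
df['length'] = pd.to_numeric(dims[0]).astype('Int64')
df['width'] = pd.to_numeric(dims[1]).astype('Int64')

print(df)
```

Rows without a parsable dimension end up as missing values in length and width rather than stopping the conversion.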
smci