This is part of a column in my pandas DataFrame that gives sizes in megabytes (M) or kilobytes (k):

                      Size
9970                  3.6M
9971                  8.4M
9972                   31M
9973                   67M
9974    Varies with device
9975    Varies with device
9976                  714k
9977                   18M
9978    Varies with device
9979                  2.0M
9980                   15M
9981                   14M
9982                  8.3M
9983                   19M

Name: Size, dtype: object

I need to clean this data by stripping the letters and keeping just the floats, with kilobytes converted to megabytes (1 megabyte = 1024 kilobytes).

This is what I have tried for stripping the letter M; the same could easily be done for k.

df["Size_cleaned"] = df['Size'].apply(lambda x: x.rstrip("M"))

However, I don't understand how to divide only the cells with sizes in kilobytes to get their values in megabytes.
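One way to touch only the kilobyte rows is a boolean mask. The following is a minimal sketch, not a definitive solution: the small `df` is a stand-in built from a few of the values above, and the names `is_kb` and `numbers` are only illustrative. It assumes every unit is a single trailing `k` or `M` and that anything else (such as "Varies with device") should become NaN.

```python
import pandas as pd

# Stand-in frame with a few values copied from the column shown above.
df = pd.DataFrame({"Size": ["3.6M", "714k", "Varies with device", "2.0M"]})

size = df["Size"]

# Boolean mask marking the rows given in kilobytes (missing values count as False).
is_kb = size.str.endswith("k", na=False)

# Strip a trailing unit letter, then parse; non-numeric text is coerced to NaN.
numbers = pd.to_numeric(size.str.rstrip("kM"), errors="coerce")

# Keep megabyte rows as they are and divide only the kilobyte rows by 1024.
df["Size_cleaned"] = numbers.where(~is_kb, numbers / 1024)

print(df)
```

Because `pd.to_numeric(..., errors="coerce")` returns floats, `Size_cleaned` ends up as a numeric column rather than `object`, so it can be used directly in arithmetic or aggregation.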

The output should be something like this:

                      Size_cleaned
9970                  3.6
9971                  8.4
9972                   31
9973                   67
9974    Varies with device <- this can be NaN 
9975    Varies with device
9976                 0.69
9977                   18
9978    Varies with device
9979                  2.0
9980                   15
9981                   14
9982                  8.3
9983                   19
Marco B
  • `factors = {'k': 1024, 'M': 1} ; df2 = df['Size'].str.extract('(\d+\.*\d*)([kM])') ; df['Size_cleaned'] = pd.to_numeric(df2[0]).div(df2[1].map(factors)).round(3)` – mozway Nov 30 '22 at 13:34
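For readability, the one-liner from the comment can be spread over several lines. This is only a sketch of that suggestion: the sample `df` is a stand-in using values from the question, the name `parts` is illustrative, and the pattern is written as a raw string (an addition here) so that `\d` is not treated as a string escape by Python.

```python
import pandas as pd

# Stand-in frame with a few values copied from the question.
df = pd.DataFrame({"Size": ["8.4M", "714k", "Varies with device", "31M"]})

# Divisor that turns each unit into megabytes.
factors = {"k": 1024, "M": 1}

# Column 0 holds the number, column 1 the unit letter; rows without a match become NaN.
parts = df["Size"].str.extract(r"(\d+\.*\d*)([kM])")

# Parse the number, divide by the divisor for its unit, and round to 3 decimals.
df["Size_cleaned"] = pd.to_numeric(parts[0]).div(parts[1].map(factors)).round(3)

print(df)
```

Rows such as "Varies with device" match neither group, so both extracted columns are NaN for them and the result stays NaN without any special handling.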
