I have a DataFrame, df, in which I want to convert one column from bytes to TB and another from MB to TB.

Free                    Total
30,000,000,000,000.00   40,000,000
40,000,000,000,000.00   50,000,000

Bytes to TB - divide by 1024/1024/1024/1024
Megabytes to TB - divide by 1024/1024

Desired Output

Free    Total   Used
30      40      10
40      50      10

This is what I am doing

import pandas as pd
import numpy as np

df = pd.read_csv("df.csv")

df['Free'] = df['Free'].astype(str).str.replace(',','').astype(float).div(1000000000000)
df['Total'] = df['Total'].astype(str).str.replace(',','').astype(float).div(1000000)
df['Used'] = df['Total'] - df['Free']

My code above neither retains the original columns nor gives me the desired output. Any suggestions are appreciated.

Lynn

1 Answer

Borrowing from this answer using atof() to avoid reinventing the wheel:

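from locale import atof, setlocale, LC_NUMERIC

# Use the environment's default locale for number parsing;
# in the original run this call returned 'en_US.UTF-8'.
setlocale(LC_NUMERIC, '')

# atof() parses locale-formatted strings such as "30,000,000,000,000.00"
df["Free_TB"] = df["Free"].apply(atof).div(1e12)    # bytes -> TB
df["Total_TB"] = df["Total"].apply(atof).div(1e6)   # MB -> TB
df["Used_TB"] = df["Total_TB"] - df["Free_TB"]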

Result

print(df)
                    Free       Total  Free_TB  Total_TB  Used_TB
0  30,000,000,000,000.00  40,000,000     30.0      40.0     10.0
1  40,000,000,000,000.00  50,000,000     40.0      50.0     10.0

Notes:

  • If you want to retain the original dataset, new names must be assigned to avoid overwriting.

  • If you want binary units (powers of 1024), replace 1e12 with 2**40 and 1e6 with 2**20. Columns computed this way are better named with the suffix _TiB instead of _TB.
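
To make the difference between the two conventions concrete, here is a minimal sketch in plain Python using the first row of the sample data. The constant and variable names are illustrative only, not part of any library.

```python
# Divisors for decimal (TB) and binary (TiB) units; names are illustrative.
BYTES_PER_TB = 10**12    # decimal terabyte
BYTES_PER_TIB = 2**40    # binary tebibyte, i.e. 1024**4
MB_PER_TB = 10**6        # decimal megabytes per terabyte
MIB_PER_TIB = 2**20      # binary mebibytes per tebibyte, i.e. 1024**2

# First row of the sample data: Free is in bytes, Total is in megabytes.
free_bytes = 30_000_000_000_000
total_mb = 40_000_000

free_tb = free_bytes / BYTES_PER_TB      # decimal convention
free_tib = free_bytes / BYTES_PER_TIB    # binary convention
total_tb = total_mb / MB_PER_TB
total_tib = total_mb / MIB_PER_TIB

print(free_tb, round(free_tib, 2))
print(total_tb, round(total_tib, 2))
```

Only the decimal divisors reproduce the round numbers (30 and 40) in the desired output; the binary divisors give smaller values for the same input.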

Bill Huang
  • Hi @Bill Huang. I get this error: AttributeError: 'float' object has no attribute 'replace' – Lynn Dec 01 '20 at 22:50
  • Could I still keep the nans and just target the numerical values? thanks – Lynn Dec 01 '20 at 23:05
  • I guess preprocessing with `df.fillna("0")` can do the work. A 0-volumed disk is semantically the same as nan-volumed, so this is unlikely to introduce ambiguity. – Bill Huang Dec 01 '20 at 23:09
  • ok thank you - I applied the df.fillna("0") so it removed the nans, but I still get: AttributeError: 'float' object has no attribute 'replace' – Lynn Dec 01 '20 at 23:13
  • I cannot debug what I don't have. You may post a separate question with the data attached ;) – Bill Huang Dec 01 '20 at 23:38
  • Ok @Bill Huang. Actually I just ran my original code above and it works without having to remove Nans or change anything. – Lynn Dec 01 '20 at 23:46
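
On the AttributeError discussed in the comments: atof() calls .replace() on its argument, so it fails when a cell holds NaN, which pandas stores as a float. One possible reason fillna("0") did not help is that it returns a new DataFrame and leaves df unchanged unless the result is assigned back. The following is a minimal sketch of a NaN-tolerant alternative, assuming the columns hold comma-formatted strings as in the question; the sample frame and the helper name to_number are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame mimicking the question's data, with one missing value.
df = pd.DataFrame({
    "Free": ["30,000,000,000,000.00", np.nan],
    "Total": ["40,000,000", "50,000,000"],
})

def to_number(col):
    """Strip thousands separators and parse; missing or bad values become NaN."""
    cleaned = col.astype(str).str.replace(",", "", regex=False)
    return pd.to_numeric(cleaned, errors="coerce")

# Write to new columns so the original ones are retained.
df["Free_TB"] = to_number(df["Free"]) / 1e12    # bytes -> TB
df["Total_TB"] = to_number(df["Total"]) / 1e6   # MB -> TB
df["Used_TB"] = df["Total_TB"] - df["Free_TB"]

print(df)
```

This keeps NaN rows as NaN instead of replacing them with 0. The astype(str) step is also what the question's original code does, which would explain why that code runs even when NaNs are present.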