Convert several units to TB as well as perform calculation using Python

Question

I have a dataset, df, in which I want to convert the Free column from bytes to TB and the Total column from MB to TB.

Free                    Total
30,000,000,000,000.00   40,000,000
40,000,000,000,000.00   50,000,000

Bytes to TB: divide by 1024/1024/1024/1024
Megabytes to TB: divide by 1024/1024

Desired Output

Free    Total   Used
30      40      10
40      50      10

This is what I am doing

import pandas as pd
import numpy as np

df = pd.read_csv("df.csv")

df['Free'] = df['Free'].astype(str).str.replace(',','').astype(float).div(1000000000000)
df['Total'] = df['Total'].astype(str).str.replace(',','').astype(float).div(1000000)
df['Used'] = df['Total'] - df['Free']

My code above neither retains the original columns nor gives me the desired output. Any suggestions are appreciated.

TiB is 1024^4 bytes. TB is 1000^4 bytes. Which one do you mean? — Bill Huang, Dec 01 '20 at 19:14
Conversion is to TB- Ok I usually do conversions in Excel, and we use 1024 for converting to TeraBytes — Lynn, Dec 01 '20 at 19:15

score 1 · Accepted Answer · answered Dec 01 '20 at 19:19

Borrowing from this answer, you can use locale.atof() to parse the comma-formatted numbers instead of reinventing the wheel:

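from locale import atof, setlocale, LC_NUMERIC

# Use the system's numeric locale so atof() treats "," as a thousands separator
setlocale(LC_NUMERIC, '')
# returns e.g. 'en_US.UTF-8'; the value depends on the system locale

# Free is in bytes and Total is in megabytes; new column names keep the originals intact
df["Free_TB"] = df["Free"].apply(atof).div(1e12)
df["Total_TB"] = df["Total"].apply(atof).div(1e6)
df["Used_TB"] = df["Total_TB"] - df["Free_TB"]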

Result

print(df)
                    Free       Total  Free_TB  Total_TB  Used_TB
0  30,000,000,000,000.00  40,000,000     30.0      40.0     10.0
1  40,000,000,000,000.00  50,000,000     40.0      50.0     10.0

Notes:

If you want to retain the original dataset, new names must be assigned to avoid overwriting.
If you want binary units (powers of 1024), replace 1e12 with 2**40 and 1e6 with 2**20. Such columns are better named with the suffix _TiB instead of _TB.
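
For the 1024-based variant mentioned in the notes, a rough sketch could look like the following. It rebuilds the sample data from the question inline, and the helper `to_number` and the two constant names are made up for illustration. It also strips the commas with `str.replace` rather than `atof`, so it does not depend on the system locale.

```python
import pandas as pd

# Sample data mirroring the question: Free is in bytes, Total is in megabytes.
df = pd.DataFrame({
    "Free": ["30,000,000,000,000.00", "40,000,000,000,000.00"],
    "Total": ["40,000,000", "50,000,000"],
})

# Hypothetical constants for the binary (1024-based) divisors.
BYTES_PER_TIB = 2**40  # 1024**4
MIB_PER_TIB = 2**20    # 1024**2


def to_number(col):
    """Strip thousands separators from a string column and parse it as float."""
    return col.str.replace(",", "", regex=False).astype(float)


# New column names, so the original Free and Total columns are left untouched.
df["Free_TiB"] = to_number(df["Free"]) / BYTES_PER_TIB
df["Total_TiB"] = to_number(df["Total"]) / MIB_PER_TIB
df["Used_TiB"] = df["Total_TiB"] - df["Free_TiB"]

print(df[["Free_TiB", "Total_TiB", "Used_TiB"]])
```

With these divisors the results are no longer round numbers: 30,000,000,000,000 bytes comes out at a little over 27 TiB, not 30.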

Bill Huang


Hi @Bill Huang. I get this error: AttributeError: 'float' object has no attribute 'replace' – Lynn Dec 01 '20 at 22:50
Could I still keep the nans and just target the numerical values? thanks – Lynn Dec 01 '20 at 23:05
I guess preprocessing with `df.fillna("0")` can do the work. A 0-volumed disk is semantically the same as nan-volumed, so this is unlikely to introduce ambiguity. – Bill Huang Dec 01 '20 at 23:09
ok thank you - I applied the df.fillna("0") so it removed the nans, but I still get: AttributeError: 'float' object has no attribute 'replace' – Lynn Dec 01 '20 at 23:13
I cannot debug what I don't have. You may post a separate question with the data attached ;) – Bill Huang Dec 01 '20 at 23:38
Ok @Bill Huang. Actually I just ran my original code above and it works without having to remove Nans or change anything. – Lynn Dec 01 '20 at 23:46
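
On the AttributeError from the comments: `locale.atof()` expects a string, so it will likely fail on any cell that is already a float, which includes NaN. One way to keep the NaNs and convert only the numeric values is sketched below. The sample frame and the helper name `parse_number` are hypothetical, since the real data was never posted.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: formatted strings, a missing value, and a raw float.
df = pd.DataFrame({
    "Free": ["30,000,000,000,000.00", np.nan, 40e12],
    "Total": ["40,000,000", "50,000,000", np.nan],
})


def parse_number(col):
    """Parse a mixed column to float, leaving missing values as NaN."""
    # Cast to str first so .str.replace never receives a float object.
    cleaned = col.astype(str).str.replace(",", "", regex=False)
    # Missing values and unparseable text both come back as NaN.
    return pd.to_numeric(cleaned, errors="coerce")


df["Free_TB"] = parse_number(df["Free"]) / 1e12
df["Total_TB"] = parse_number(df["Total"]) / 1e6
# NaN propagates: Used_TB is NaN wherever either input is missing.
df["Used_TB"] = df["Total_TB"] - df["Free_TB"]

print(df[["Free_TB", "Total_TB", "Used_TB"]])
```

This is close to the `astype(str).str.replace(...)` chain in the question's original code, which would explain why that code ran without removing the NaNs.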
