I tried these: https://stackoverflow.com/a/37683738/13865853, https://stackoverflow.com/a/50830098/13865853.
My DataFrame is all strings, but the dtype is object for reasons I read about elsewhere on SO.
The columns are amounts of micronutrients in foods, each with a unit attached, and they look like this:
Life-Stage Group Arsenic Boron (mg/d) Calcium (mg/d) Chromium Copper (μg/d) \
0 <= 3.0 y nan g 3 mg 2500 mg nan g 1000 μg
1 <= 8.0 y nan g 6 mg 2500 mg nan g 3000 μg
Fluoride (mg/d) Iodine (μg/d) Iron (mg/d) Magnesium (mg/d) Manganese (mg/d) \
0 1.3 mg 200 μg 40 mg 65 mg 2 mg
1 2.2 mg 300 μg 40 mg 110 mg 3 mg
Molybdenum (μg/d) Nickel (mg/d) Phosphorus (g/d) Potassium Selenium (μg/d) \
0 300 μg 0.2 mg 3 g nan g 90 μg
1 600 μg 0.3 mg 3 g nan g 150 μg
Silicon Sulfate Vanadium (mg/d) Zinc (mg/d) Sodium Chloride (g/d) \
0 nan g nan g nan mg 7 mg nan g 2.3 g
1 nan g nan g nan mg 12 mg nan g 2.9 g
Vitamin A (μg/d) Vitamin C (mg/d) Vitamin D (μg/d) Vitamin E (mg/d) \
0 600.0 μg 400 mg 63.0 μg 200 mg
1 900.0 μg 650 mg 75.0 μg 300 mg
Vitamin K (μg/d) Thiamin (mg/d) Riboflavin (mg/d) Niacin (mg/d) \
0 nan μg nan mg nan mg 10 mg
1 nan μg nan mg nan mg 15 mg
Vitamin B6 (mg/d) Folate (μg/d) Vitamin B12 (μg/d) Pantothenic Acid (mg/d) \
0 30 mg 300 μg nan μg nan mg
1 40 mg 400 μg nan μg nan mg
Biotin (μg/d) Choline (mg/d) Carotenoids
0 nan μg 1.0 mg nan g
1 nan μg 1.0 mg nan g
I want to zero out the nan values and keep just the numbers, because I want to multiply any g value by 1000 and divide any ug value (\u03BCg in Python, for micro) by 1000 so that everything is in mg and I can plot it on a bar graph in Plotly Dash.
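The per-cell conversion described above could look roughly like this. It is a minimal sketch, assuming every value is a number followed by one of `g`, `mg` or `μg`, and that anything without a number (such as `nan g`) should become 0. `to_mg` and `PATTERN` are hypothetical names for illustration, not pandas functions.

```python
import re
import unicodedata

# A number (integer or decimal), optional whitespace, then a unit.
# \u03bc is the Greek small letter mu, as in the question's "\u03BCg".
PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(\u03bcg|mg|g)\b")


def to_mg(cell) -> float:
    """Convert one cell such as '3 g' or '200 μg' to milligrams.

    Cells with no number (e.g. 'nan g') are zeroed out.
    """
    # NFKC folds the micro sign (U+00B5) into Greek mu (U+03BC),
    # so either spelling of "μg" matches the pattern.
    text = unicodedata.normalize("NFKC", str(cell))
    match = PATTERN.search(text)
    if match is None:
        return 0.0
    number, unit = float(match.group(1)), match.group(2)
    if unit == "g":
        return number * 1000  # grams -> milligrams
    if unit == "\u03bcg":
        return number / 1000  # micrograms -> milligrams
    return number  # already milligrams
```

Applied per column, this would be something like `df[col].map(to_mg)` for every column except `Life-Stage Group`.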
But I'm stuck at extracting the numbers.
Previously, when I was writing CSV files after downloading the data, this worked, but now it does not:
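# extract numbers
# df_dict holds the DataFrames built earlier (not shown here)
new_df_arr = []
for _, df in df_dict.items():
    df = df.astype(str)
    df_copy = df.copy()
    # skip column 0 ("Life-Stage Group"); keep only the first number in every other column
    for col in df.columns[1:]:
        # raw string so "\d" is a regex escape, not an (invalid) Python string escape
        df_copy[col] = df_copy[col].str.extract(r'(\d+[.]?\d*)', expand=False)  # replace(r'[^0-9]+','')
    new_df_arr.append(df_copy)
# check the DataFrames
for df in new_df_arr:
    print(df)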
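A vectorized variant of the loop above, sketched under the assumption that the first column (`Life-Stage Group`) is a label and every other column holds `<number> <unit>` strings. It captures the number and the unit together with named groups in one `str.extract` call, scales by a per-unit factor, and fills the unmatched `nan g` rows with 0. The sample frame is a small hypothetical subset copied from the printout; `column_to_mg` and `MG_PER_UNIT` are illustrative names.

```python
import pandas as pd

# Milligrams per unit; "\u03bcg" is micrograms written with the Greek mu.
MG_PER_UNIT = {"g": 1000.0, "mg": 1.0, "\u03bcg": 0.001}


def column_to_mg(col: pd.Series) -> pd.Series:
    # NFKC makes the micro sign (U+00B5) and Greek mu (U+03BC) identical.
    text = col.astype(str).str.normalize("NFKC")
    # One pass pulls out both the numeric part and the unit after it.
    parts = text.str.extract(r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>\u03bcg|mg|g)\b")
    values = pd.to_numeric(parts["value"])
    factors = parts["unit"].map(MG_PER_UNIT).astype(float)
    # Cells like "nan g" match nothing, become NaN, and are zeroed here.
    return (values * factors).astype(float).fillna(0.0)


# Hypothetical sample: a few columns taken from the printout above.
df = pd.DataFrame({
    "Life-Stage Group": ["<= 3.0 y", "<= 8.0 y"],
    "Boron (mg/d)": ["3 mg", "6 mg"],
    "Chromium": ["nan g", "nan g"],
    "Copper (μg/d)": ["1000 μg", "3000 μg"],
    "Phosphorus (g/d)": ["3 g", "3 g"],
})

converted = df.copy()
for column in converted.columns[1:]:  # leave "Life-Stage Group" as text
    converted[column] = column_to_mg(converted[column])
print(converted)
```

Capturing the unit alongside the number is what lets a single pass both strip the text and convert to mg; the single-group pattern in the question keeps only the digits, so the unit is lost before it can be used.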