How can I only get a numerical answer when applying a function to a dataframe?

Question

This is the code I have so far:

import pandas as pd
import pubchempy
import numpy as np

df = pd.read_csv("Data.tsv.txt", sep="\t")

from pubchempy import get_properties

df['CID'] = df['CID'].astype(str).apply(lambda x: x.replace('.0',''))
df['CID'] = df['CID'].astype(str).apply(lambda x: x.replace('0',''))

df = df.drop(df[df.CID=='nan'].index)

df = df.drop( df.index.to_list()[150:] ,axis = 0 )


df['CID']= df['CID'].map(lambda x: get_properties(identifier=x, properties='MolecularWeight') if float(x) > 0 else pd.NA)

print(df)

The output that I'm getting under the 'CID' column is this:

CID

[{'CID': 5339, 'MolecularWeight': '398.4'}]

What can I do so that I only get the numerical 'MolecularWeight' value in the 'CID' column (e.g., 398.4 in row one, etc.)?

`df['CID']= df['CID'].map(lambda x: float(get_properties(identifier=x, properties='MolecularWeight')['MolecularWeight']) if float(x) > 0 else pd.NA)` — mujjiga, May 19 '22 at 21:54
I am a bit confused. You don't need a `lambda` to `replace`, and you can chain the calls like `str.replace('.0','').replace('0','')`. BTW, if there is a zero anywhere in your CID, this will remove it. Also, you are comparing to `"nan"` when it is much easier and faster to compare to `nan`. **Is `[{'CID': 5339, 'MolecularWeight': '398.4'}]` one of the entries of `df["CID"]` after `read_csv`?** — Zaero Divide, May 19 '22 at 22:26
No, `[{'CID': 5339, 'MolecularWeight': '398.4'}]` is one of the entries after `df['CID']= df['CID'].map(lambda x: get_properties(identifier=x, properties='MolecularWeight') if float(x) > 0 else pd.NA)`. I used `lambda` because, in order to use `str.replace('.0','').replace('0','')`, I would have to provide a known string value. My code uses `pubchempy.get_properties` to search PubChem and return the molecular weight of a compound given a specific identification value (the values in the 'CID' column). — New_to_coding, May 19 '22 at 22:49
`df['CID'] = df['CID'].astype(str).apply(lambda x: x.replace('.0',''))` + `df['CID'] = df['CID'].astype(str).apply(lambda x: x.replace('0',''))` is actually the same as doing `df['CID'] = df['CID'].astype(str).str.replace('(.)?0',"")`. BTW, if you don't @ us, we don't see that you replied — Zaero Divide, May 19 '22 at 22:57
@ZaeroDivide Oh damn lol, didn't even know you could @ someone, thank you. Wait, so are you saying I could combine the `pubchempy.get_properties` function with the `df['CID'] = df['CID'].astype(str).str.replace('(.)?0',"")` function? — New_to_coding, May 19 '22 at 23:33
@mujjiga I tried to add an image of the output, but my reputation isn't high enough, unfortunately... But the column output is the same as what I posted. The code you posted also didn't work; the error says, "TypeError: list indices must be integers or slices, not str". — New_to_coding, May 20 '22 at 02:40
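
The `TypeError` above is consistent with the output shown in the question: the value in each cell is a list containing one dictionary, so the list probably needs to be indexed with `[0]` before the `'MolecularWeight'` key can be read. Below is a minimal sketch, assuming every successful lookup returns that same shape. `fake_get_properties` and `molecular_weight` are hypothetical names introduced only for illustration; the stand-in mimics the output posted in the question so the example runs without a network call.

```python
import pandas as pd


def fake_get_properties(properties, identifier):
    # Hypothetical stand-in for pubchempy.get_properties. It returns the same
    # shape as the output shown in the question: a list holding one dict.
    weights = {"5339": "398.4"}
    return [{"CID": int(identifier), properties: weights[identifier]}]


def molecular_weight(cid, lookup=fake_get_properties):
    # Fetch the records, take the first one, and convert the string to float.
    records = lookup(properties="MolecularWeight", identifier=cid)
    if not records:
        return float("nan")  # assumption: an empty list means "no result"
    return float(records[0]["MolecularWeight"])


df = pd.DataFrame({"CID": ["5339"]})
# Write to a new column so the original CIDs are not overwritten.
df["MolecularWeight"] = df["CID"].map(molecular_weight)
print(df)
```

With the real library, passing `pubchempy.get_properties` as `lookup` should behave the same way, provided the response has the shape shown in the question. Storing the result in a separate column also keeps the CIDs available for later lookups.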
@New_to_coding Please share the full dataframe, or at least part of it. Use `df.to_dict()` and copy/paste the result here so that we can reproduce the error in our IDE. — Drakax, May 27 '22 at 22:02
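
On the CID cleanup raised in the comments: removing `'0'` as text also removes zeros that belong to an ID (for example, `5030` would become `53`). The sketch below shows one possible alternative, assuming the column was read as floats with `NaN` for missing entries. The sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample: CIDs read as floats, with one missing value.
df = pd.DataFrame({"CID": [5339.0, np.nan, 5030.0]})

# Coerce to numbers so anything unparseable becomes NaN.
cids = pd.to_numeric(df["CID"], errors="coerce")

# Keep only rows with a valid CID, then cast float -> int -> str.
valid = cids.notna()
df = df[valid].copy()
df["CID"] = cids[valid].astype(int).astype(str)

print(df["CID"].tolist())
```

Casting through `int` drops the trailing `.0` without touching zeros inside the identifier, and filtering on `notna()` replaces the string comparison against `'nan'`.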

0 Answers