6

Is there a way to convert a SMILES string to either a common chemical name or an IUPAC name using RDKit or other Python modules?

I couldn't find anything very helpful in other posts.

Thank you very much!

Alex

2 Answers

5

As far as I am aware, this is not possible with RDKit, and I do not know of any Python module with this ability. If you are OK with using a web service, you could use the NCI Chemical Identifier Resolver (CACTUS).

Here is a naive implementation of a function to retrieve an IUPAC name from a SMILES string:

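import requests
from urllib.parse import quote


CACTUS = "https://cactus.nci.nih.gov/chemical/structure/{0}/{1}"


def smiles_to_iupac(smiles):
    rep = "iupac_name"
    # Percent-encode the SMILES so that characters such as '#' (triple bond)
    # are not interpreted as URL syntax
    url = CACTUS.format(quote(smiles), rep)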
    response = requests.get(url)
    response.raise_for_status()
    return response.text


print(smiles_to_iupac('c1ccccc1'))
print(smiles_to_iupac('CC(=O)OC1=CC=CC=C1C(=O)O'))

[Out]:
BENZENE
2-acetyloxybenzoic acid

You could easily extend it to convert multiple different formats, although the function isn't exactly fast...
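As a rough sketch of that extension, the representation can be made a parameter instead of being hard-coded. The names `build_url` and `resolve` are made up for this example, and it uses the standard library's `urllib` rather than `requests` so that it has no third-party dependency. Representation strings other than `iupac_name` (for example `stdinchikey` or `formula`) should be checked against the resolver's documentation before you rely on them.

```python
from urllib.parse import quote
from urllib.request import urlopen

CACTUS = "https://cactus.nci.nih.gov/chemical/structure/{0}/{1}"


def build_url(identifier, representation):
    """Build the resolver URL, percent-encoding the identifier.

    quote() leaves '/' untouched by default; how the service treats SMILES
    containing '/' (double-bond stereo) is not verified here.
    """
    return CACTUS.format(quote(identifier), representation)


def resolve(identifier, representation="iupac_name", timeout=10):
    """Ask the resolver for one representation of the given identifier.

    urlopen raises urllib.error.HTTPError when the service answers 404,
    i.e. when it cannot resolve or name the structure.
    """
    url = build_url(identifier, representation)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

With this, `resolve('c1ccccc1')` makes the same request as the function above, and a different representation is just a second argument. Each call is still one HTTP round trip, so it is no faster.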

Another solution is to use PubChem. You can use the API with the python package pubchempy. Bear in mind this may return multiple compounds.

import pubchempy


# Use the SMILES you provided
smiles = 'O=C(NCc1ccc(C(F)(F)F)cc1)[C@@H]1Cc2[nH]cnc2CN1Cc1ccc([N+](=O)[O-])cc1'
compounds = pubchempy.get_compounds(smiles, namespace='smiles')
match = compounds[0]
print(match.iupac_name)

[Out]:
(6S)-5-[(4-nitrophenyl)methyl]-N-[[4-(trifluoromethyl)phenyl]methyl]-3,4,6,7-tetrahydroimidazo[4,5-c]pyridine-6-carboxamide
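
Since the lookup may return several compounds, and some matches may come back without a name, here is a small sketch of one way to collect every distinct name instead of only taking `compounds[0]`. The helper `distinct_iupac_names` is hypothetical; it relies only on the `iupac_name` attribute used above.

```python
def distinct_iupac_names(compounds):
    """Return the distinct, non-empty IUPAC names among the matches, in order."""
    names = []
    for compound in compounds:
        # Treat a missing attribute the same as an empty name
        name = getattr(compound, "iupac_name", None)
        if name and name not in names:
            names.append(name)
    return names
```

It would be called as `distinct_iupac_names(pubchempy.get_compounds(smiles, namespace='smiles'))`. An empty list then means no named match was found.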
Oliver Scott
  • It seems to work, but my SMILES may be a bit on the complicated side; it gives me this error: '404 Client Error: NOT FOUND for url'. An example of a SMILES string I'm working with: O=C(NCc1ccc(C(F)(F)F)cc1)[C@@H]1Cc2[nH]cnc2CN1Cc1ccc([N+](=O)[O-])cc1 – Alex Oct 13 '20 at 16:28
  • Thanks a lot anyway! The web service you provided is a very useful tool. – Alex Oct 13 '20 at 16:29
  • OK, a follow-up question. I found a website that is more like a shop, but it has a search function that accepts SMILES. It seems to find most of my SMILES and it also gives the IUPAC name, which is what I want. Is there a way I could use it from my code like you did with the web service you mentioned? This is the website: https://www.molport.com/shop/find-chemicals-by-smiles – Alex Oct 13 '20 at 16:45
  • Doesn't look like it would be easy to do. It is much easier when the website provides an API you can work with. See the other solution I have proposed. I hope it helps – Oliver Scott Oct 13 '20 at 18:26
  • Yes, it worked great. A few of my SMILES were left unnamed, but that is a very small inconvenience. Thanks a lot for your help! – Alex Oct 14 '20 at 05:18
  • 1
    Great question and answer Oliver and @Alex, but you might be interested in checking out https://mattermodeling.stackexchange.com/ for these types of questions – Cody Aldaz Jan 06 '21 at 04:46
1

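I recently did this conversion with pubchempy. Here is the code to try:

import pubchempy as pcp

# Read one SMILES string per line from the input file
with open("inif.txt", "r") as infile:
    # The counter was never initialised in the original; starting at 1 is an assumption
    for i, line in enumerate(infile, start=1):
        smiles = line.strip()  # drop the trailing newline
        if not smiles:
            continue  # skip blank lines
        compounds = pcp.get_compounds(smiles, namespace='smiles')
        match = compounds[0]  # take the first hit
        print(i, '$$$', 'the CID is', match.cid, '$$$', 'The IUPAC name is', match.iupac_name, '$$$', 'for the SMILES', smiles)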
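
For a long file, one failed lookup would stop a loop like this. Below is a minimal sketch of a more tolerant batch version. The function `name_all` and its `lookup` parameter are hypothetical names for this example; `lookup` stands for any function that takes a SMILES string and returns a name, such as a thin wrapper around the pubchempy call.

```python
def name_all(smiles_lines, lookup):
    """Map each non-blank SMILES line to a name, or None if the lookup fails."""
    results = {}
    for line in smiles_lines:
        smiles = line.strip()
        if not smiles:
            continue  # skip blank lines
        try:
            results[smiles] = lookup(smiles)
        except Exception:
            # Deliberately broad: network errors, HTTP errors and empty
            # result lists (IndexError) all end up recorded as None
            results[smiles] = None
    return results
```

Passing the open file object as `smiles_lines` works because iterating over a file yields its lines. The entries left as `None` can be retried later or sent to a different service.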
Rag