15

I have a dataset of molecules represented as SMILES strings, and I am trying to represent them as graphs. Is there a way to do so? For instance, given the string CC(C)(C)c1ccc2occ(CC(=O)Nc3ccccc3F)c2c1, is there a general way to convert it to a graph representation, meaning an adjacency matrix and an atom vector? I see questions about generating SMILES from graphs, and I know RDKit has MolFromSmiles, but I can't find anything that produces a graph from a SMILES string.

Davide Fiocco
Blade
  • https://chemistry.stackexchange.com/questions/43299/is-there-a-way-to-use-free-software-to-convert-smiles-strings-to-structures ? – Davide Fiocco Jul 16 '19 at 17:42
  • It looks like this only gives an image of the molecule. I came across Open Babel earlier, but searching the word graph gives nothing at its wiki page. – Blade Jul 16 '19 at 17:58

4 Answers

18

You could try pysmiles. Starting from the SMILES string, you can create a NetworkX graph and generate the desired objects with code along these lines:

from pysmiles import read_smiles
import networkx as nx
    
smiles = 'C12=C3C4=C5C6=C1C7=C8C9=C1C%10=C%11C(=C29)C3=C2C3=C4C4=C5C5=C9C6=C7C6=C7C8=C1C1=C8C%10=C%10C%11=C2C2=C3C3=C4C4=C5C5=C%11C%12=C(C6=C95)C7=C1C1=C%12C5=C%11C4=C3C3=C5C(=C81)C%10=C23'
mol = read_smiles(smiles)
    
# atom vector (C only)
print(mol.nodes(data='element'))
# adjacency matrix (to_numpy_matrix was removed in NetworkX 3.0; use to_numpy_array)
print(nx.to_numpy_array(mol))

If a rough visualization is good enough, you can also plot the molecule with

import matplotlib.pyplot as plt
# label each node with its element symbol
elements = nx.get_node_attributes(mol, name="element")
nx.draw(mol, with_labels=True, labels=elements, pos=nx.spring_layout(mol))
plt.gca().set_aspect('equal')
plt.show()

Fullerenes are fun to plot :)

[Image: fullerene graph drawn with NetworkX]

Davide Fiocco
  • 1
    Thank you! It seems that this adjacency matrix does not distinguish between single, double, and triple bonds. Is there a way to enforce that? – Blade Jul 16 '19 at 22:14
  • Ah, see your answer https://stackoverflow.com/a/57066525/4240413 :) – Davide Fiocco Dec 16 '22 at 17:49
7

To complete Davide's answer https://stackoverflow.com/a/57063988/4240413, you can include the bond order in the adjacency matrix using:

nx.to_numpy_array(mol, weight='order')

or, according to the NetworkX documentation, using

nx.adjacency_matrix(mol, weight='order').todense()
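
The effect of the weight argument can be seen on a small hand-built graph. This is only a sketch: the graph below is typed in by hand for the heavy atoms of acetic acid, CC(=O)O, and merely imitates the 'element' node attribute and 'order' edge attribute used above. It is not produced by pysmiles.

```python
import networkx as nx

# Hand-built stand-in for a parsed molecule (assumption: heavy atoms of CC(=O)O)
G = nx.Graph()
G.add_node(0, element='C')
G.add_node(1, element='C')
G.add_node(2, element='O')
G.add_node(3, element='O')
G.add_edge(0, 1, order=1)  # C-C single bond
G.add_edge(1, 2, order=2)  # C=O double bond
G.add_edge(1, 3, order=1)  # C-O single bond

# Atom vector, in node insertion order
elements = [G.nodes[n]['element'] for n in G.nodes]

# weight=None marks every bond with 1; weight='order' stores the bond order
unweighted = nx.to_numpy_array(G, weight=None)
weighted = nx.to_numpy_array(G, weight='order')

print(elements)
print(unweighted)
print(weighted)
```

Rows and columns follow the node order of the graph, so entry (i, j) of the matrix refers to the same atoms as positions i and j of the atom vector.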
Davide Fiocco
Blade
2

You can use RDKit's Chem.GetAdjacencyMatrix():

from rdkit import Chem
import networkx as nx

smiles = 'CC(C)(C)c1ccc2occ(CC(=O)Nc3ccccc3F)c2c1'
mol = Chem.MolFromSmiles(smiles)

# Get adjacency matrix; useBO=True stores bond orders (1.5 for aromatic bonds) instead of 0/1
adjacency_matrix = Chem.GetAdjacencyMatrix(mol, useBO=True)

# Convert adjacency matrix to NetworkX graph
G = nx.from_numpy_array(adjacency_matrix)
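
The adjacency matrix on its own carries no element information, while the question also asks for an atom vector. A minimal sketch of how one could be collected alongside the matrix, assuming the atom order returned by GetAtoms() matches the row and column order of the matrix (the variable names here are illustrative):

```python
from rdkit import Chem
import numpy as np

smiles = 'CC(C)(C)c1ccc2occ(CC(=O)Nc3ccccc3F)c2c1'
mol = Chem.MolFromSmiles(smiles)
if mol is None:
    # MolFromSmiles returns None when the string cannot be parsed
    raise ValueError(f'could not parse SMILES: {smiles}')

# Atom vector: element symbols and atomic numbers (heavy atoms only,
# since hydrogens are implicit unless Chem.AddHs is called)
atom_symbols = [atom.GetSymbol() for atom in mol.GetAtoms()]
atomic_numbers = np.array([atom.GetAtomicNum() for atom in mol.GetAtoms()])

# Adjacency matrix with bond orders as entries
adjacency = Chem.GetAdjacencyMatrix(mol, useBO=True)

print(atom_symbols)
print(adjacency.shape)
```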
Davide Fiocco
0

NetworkX is a nice solution. If you are looking for something more customized, you can build the graph yourself. For an example going in the other direction (SMILES from graph), see this post: SMILES from graph
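
As a rough illustration of building the graph yourself, here is a sketch in plain Python. The helper build_adjacency is hypothetical, and the atom and bond lists are written by hand for acetaldehyde (CC=O, heavy atoms only); parsing the SMILES string itself is still left to a library.

```python
def build_adjacency(num_atoms, bonds):
    """Return a symmetric bond-order matrix as a list of lists.

    bonds is an iterable of (i, j, order) tuples with 0-based atom indices.
    """
    matrix = [[0.0] * num_atoms for _ in range(num_atoms)]
    for i, j, order in bonds:
        if not (0 <= i < num_atoms and 0 <= j < num_atoms):
            raise IndexError(f'bond ({i}, {j}) refers to a missing atom')
        matrix[i][j] = order
        matrix[j][i] = order  # undirected graph: mirror the entry
    return matrix

# Hand-written example data (assumption: acetaldehyde, CC=O)
atoms = ['C', 'C', 'O']
bonds = [(0, 1, 1.0), (1, 2, 2.0)]

adjacency = build_adjacency(len(atoms), bonds)
for symbol, row in zip(atoms, adjacency):
    print(symbol, row)
```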

JoshuaBox
  • 1
    The OP actually wants the opposite of what you are suggesting. The OP wants SMILES **to** graph. He/she already has SMILES. – Tshilidzi Mudau Jan 27 '21 at 06:50
  • Yes true, have clarified that my answer is in the other direction. The link in the post mentioned leads you to SMILES -> graph though so imo the downvote is harsh (https://github.com/dakoner/keras-molecules/blob/dbbb790e74e406faa70b13e8be8104d9e938eba2/convert_rdkit_to_networkx.py#L65-L67) – JoshuaBox Apr 01 '21 at 12:18