
My idea would be to create a VAE or a GAN capable of generating new drugs, using graphs as representations for my molecules. Now I’m asking the real question:

I started the project with a simple Pandas dataframe made up of SMILES strings and various features, like this one:

  • CC(=O)Nc1ccc(O)cc1, weight = 151.16, …

  • CC(=O)Oc1ccccc1C(=O)O, weight = 180, …

Is it possible to convert the strings into a graph data format? If so, could you give me some suggestions on how to do that?

Thank you all!

  • This should help: https://stackoverflow.com/a/57063988/4551984 – quest Dec 23 '21 at 08:09
  • If the above comment doesn't help, you can also try at [Matter Modeling Stack Exchange](https://mattermodeling.stackexchange.com/) where there's an entire tag for SMILES (and also tags for machine-learning, and other related topics). – Nike Dec 23 '21 at 21:47
  • I agree with user1271772. You will find some prior questions about SMILES on StackOverflow, but you may have better luck with Matter Modeling or Chemistry for this topic, since they are more geared towards chemistry/molecular questions. – Tyberius Dec 23 '21 at 22:16
  • Check out https://github.com/JamesBremner/chemgraph for C++ code to visualize SMILES as a graph. – ravenspoint Sep 05 '22 at 19:24

2 Answers


Yes. Use DGL-LifeSci: it has a few functions for converting SMILES to graphs, depending on the kind of graph you want:

https://github.com/awslabs/dgl-lifesci/blob/master/python/dgllife/utils/mol_to_graph.py

DeepChem also has similar functionality in its built-in featurizers: https://github.com/deepchem/deepchem/blob/master/deepchem/feat/molecule_featurizers/mol_graph_conv_featurizer.py

Going straight from SMILES to a graph can sometimes be confusing. Wherever you see a function that takes a mol (e.g. mol_to_graph), you can first convert your SMILES string to a mol with the MolFromSmiles function in rdkit.Chem:

from rdkit import Chem

mol = Chem.MolFromSmiles('Cc1ccccc1')  # returns None if the SMILES cannot be parsed
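Once you have a mol, a plain graph is just its atoms (nodes) and bonds (edges). A minimal RDKit-only sketch, where the helper name `smiles_to_graph` and the choice of bond order as the edge label are illustrative assumptions, not part of any library:

```python
from rdkit import Chem

def smiles_to_graph(smiles):
    """Hypothetical helper: return (nodes, edges) for a SMILES string.

    nodes: atomic symbols, indexed by RDKit atom index
    edges: (begin_idx, end_idx, bond_order) tuples, one per bond
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for SMILES it cannot parse
        raise ValueError(f"Invalid SMILES: {smiles}")
    nodes = [atom.GetSymbol() for atom in mol.GetAtoms()]
    edges = [
        (bond.GetBeginAtomIdx(), bond.GetEndAtomIdx(), bond.GetBondTypeAsDouble())
        for bond in mol.GetBonds()
    ]
    return nodes, edges

nodes, edges = smiles_to_graph("Cc1ccccc1")  # toluene
print(nodes)
print(edges)  # aromatic bonds have order 1.5
```

From these lists you can build an adjacency matrix or load the graph into whatever graph library your VAE/GAN uses; richer node and edge features are what the DGL-LifeSci and DeepChem featurizers add on top.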
mrw

Complementing the answer from @mrw: DeepChem should be able to do exactly what you ask, i.e. "Is it possible to convert the strings in a graph data format?"

Check out the tutorial in the documentation on feature engineering in DeepChem in general: https://deepchem.readthedocs.io/en/latest/get_started/tutorials.html#feature-engineering

and molecular graph convolution specifically: https://deepchem.readthedocs.io/en/latest/api_reference/featurizers.html#molgraphconvfeaturizer

import deepchem as dc

# df is your pandas DataFrame with a "smiles" column and a target column "Y"
featurizer = dc.feat.MolGraphConvFeaturizer()
feats = featurizer.featurize(df.smiles)  # one GraphData object per SMILES string

dataset = dc.data.DiskDataset.from_numpy(feats,
                                         df.Y)  # the target/outcome column from your dataframe
dvik