
I am working on a bioinformatics assignment.

I have converted SMILES strings to fingerprints (fp) for a group of molecules, and the data frame looks like this: [image: the data frame]

The type of the fingerprint is: [image: fingerprint type]

I then saved the data frame to a CSV file with `to_csv`.

After reading the file back with `pd.read_csv`, each fingerprint has become a string and looks like this: [image: fingerprint shown as a string]

I replaced each `\n` with a space, but the type is still a string.

I have tried several approaches:

  1. `ast.literal_eval(fp_df['fp'])`, from the most upvoted answer here, raises `ValueError: malformed node or string:`
  2. `list(fp_df['fp'])` does not change the type; the elements are still strings
  3. `[n.strip() for n in x]` does not work either

I have tried other methods as well.

May I ask for your help on how to deal with it? Thanks in advance.

Annie
  • [Please do not upload images of code/data/errors when asking a question.](//meta.stackoverflow.com/q/285551) – matszwecja May 18 '22 at 12:39
  • I don't quite understand. Let's say you have a string `a = "[0 0 0 1 0]"`, you want to convert it to a list `a = [0, 0, 0, 1, 0]`, is that right? – ImranD May 18 '22 at 12:39
  • The original is an array `[0 0 0 1 0]`, but after saving with `to_csv` and reading with `read_csv`, the array becomes a string containing an array, `'[0 0 0 1 0]'`, and I want to convert it back into the array `[0 0 0 1 0]`. – Annie May 18 '22 at 12:47
  • `ast.literal_eval` doesn't work because it expects the items to be comma-separated. `[n.strip() for n in x]` doesn't work because it iterates over the string character-by-character, so your result will include the `"["` and `"]"` chars and the space chars will be stripped to `""` (empty string) but still added to the result – Anentropic May 18 '22 at 12:51
  • @matszwecja: thank you for the reminder. Will pay attention to it next time. – Annie May 18 '22 at 13:10
  • @Anentropic, I had thought that it just converts the array into a string containing an array, but in fact it converts each element of the array into a string, separated by spaces. Am I right? – Annie May 18 '22 at 13:18
  • @Annie the problem is you don't have an array to start with, you have a string. So we need to modify the content of the string until it looks like a string we can successfully convert into an array (that contains only what we want it to) – Anentropic May 18 '22 at 13:24
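
The two failures described in the comments above can be reproduced with a minimal sketch. It assumes a single cell holds a string such as `"[0 0 0 1 0]"`; the variable names are illustrative and not taken from the question.

```python
import ast

value = "[0 0 0 1 0]"  # assumed example of one cell as read back from the CSV

# literal_eval only accepts valid Python literal syntax, and a list literal
# needs commas between its items, so this string is rejected.
try:
    ast.literal_eval(value)
    parsed_ok = True
except (ValueError, SyntaxError):
    parsed_ok = False

# Iterating over a string yields one character at a time, so the brackets
# are kept and every space becomes an empty string after strip().
chars = [n.strip() for n in value]

print(parsed_ok)
print(chars)
```

Note that passing the whole column, as in `ast.literal_eval(fp_df['fp'])`, hands the function a pandas Series rather than a string, which is a likely reason for the `malformed node or string` message in the question.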

1 Answer


If your value looks like:

"[1 0 0 1 1 0]"

Then you can turn it into a list by:

values = value.lstrip("[").rstrip("]").split()

But the values will still be numeric strings, so you could cast to int by:

values = [int(n) for n in value.lstrip("[").rstrip("]").split()]
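
As a sketch of how this could be wrapped for reuse, the helper below (the name `parse_fingerprint` is hypothetical) applies the same strip/split/int steps. Because `split()` without arguments splits on any whitespace, line breaks inside the string should not need to be replaced beforehand.

```python
def parse_fingerprint(value: str) -> list[int]:
    """Turn a string such as "[1 0 0\n 1 1 0]" into a list of ints."""
    # strip("[]") removes the surrounding brackets; split() with no argument
    # splits on spaces and newlines alike.
    return [int(n) for n in value.strip().strip("[]").split()]


# Assumed example input with an embedded line break
values = parse_fingerprint("[1 0 0\n 1 1 0]")
print(values)
```

With pandas, this could probably be applied to the whole column via `fp_df['fp'].apply(parse_fingerprint)`, assuming the column is named `fp` as in the question.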
Anentropic
  • Another way of doing the same thing: `values = list(map(int, value[1:-1].split()))` – ImranD May 18 '22 at 12:45
  • @ImranD indeed maybe the slice `value[1:-1]` is nicer, more concise – Anentropic May 18 '22 at 12:53
  • Thank you for your great help! I am wondering whether there is another way to save the file that avoids this extra processing, i.e. saving an array as an array in CSV. Each fp is a 2048x1 vector, so the conversion takes time. – Annie May 18 '22 at 13:04
  • @Annie easiest is probably to use JSON, so when writing the value to a column in the csv file you can use `json.dumps(value)` (and `import json` at top of your python script) – Anentropic May 18 '22 at 13:06
  • if you do that you can then use `json.loads(value)` when reading the column, instead of needing this strip/split/map – Anentropic May 18 '22 at 13:08
  • @ImranD, that is concise. May I ask where the parameters in `[1:-1]` come from? It seems `[1:-1]`, `[1:-2]`, and `[1:-3]` all work, while `[1:0]`, `[2:0]`, etc. don't. When googling, most results explain `input()`, and I can't find an explanation for these parameters. – Annie May 18 '22 at 13:39
  • see https://www.pythontutorial.net/python-basics/python-list-slice/ – Anentropic May 18 '22 at 13:59
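
The two suggestions in these comments, JSON encoding and the `value[1:-1]` slice, can be sketched together. The example below uses only the standard library, with an in-memory buffer standing in for a real file; the column names and sample fingerprint are assumptions made for illustration.

```python
import csv
import io
import json

# Assumed example; a NumPy array would need .tolist() before json.dumps
fingerprint = [0, 0, 0, 1, 0]

# Write: store the fingerprint as a JSON string in a single CSV cell
buffer = io.StringIO()  # stands in for a real file
writer = csv.writer(buffer)
writer.writerow(["name", "fp"])
writer.writerow(["mol_1", json.dumps(fingerprint)])

# Read: json.loads restores the list without any strip/split step
buffer.seek(0)
rows = list(csv.DictReader(buffer))
restored = json.loads(rows[0]["fp"])

# Slicing: value[start:stop] keeps the characters from start up to,
# but not including, stop; negative indices count from the end.
value = "[0 0 0 1 0]"
inner = value[1:-1]    # drops only the two brackets
too_short = value[1:-2]  # raises no error, but also drops the last digit
empty = value[1:0]     # stop lies before start, so nothing is selected

print(restored, repr(inner), repr(too_short), repr(empty))
```

So `[1:-2]` and `[1:-3]` only appear to work: they run without error but silently cut digits off the end of the fingerprint. When reading with pandas, `pd.read_csv(path, converters={'fp': json.loads})` should apply the JSON decoding while loading, where `path` is a placeholder for the actual file.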