My team is migrating from ClickHouse to Azure Data Explorer (ADX). We are currently having difficulty querying our data from ADX: the queried values are correct, but each value is returned as a string rather than as an array of floats.

Here is an example string:

mydummystring='[1.0,2.0,3.0][4.0,5.0,6.0][6.0,7.0,8.0]'

To convert this string to a NumPy array, I found this workaround based on a list comprehension (inspired by this SO post):

import numpy as np
mynumpyarray = np.array([np.array(x) for x in eval('['+mydummystring.replace('][', '],[')+']')])

Is there a better (safer?) way to achieve this conversion? I know that it would be better to read the data correctly in the first place, but for now I am looking for a robust way to convert the output string to actual numbers.

Sheldon

3 Answers

1

You can convert '[1.0,2.0,3.0][4.0,5.0,6.0][6.0,7.0,8.0]' to '[[1.0,2.0,3.0],[4.0,5.0,6.0],[6.0,7.0,8.0]]' with str.replace, then use ast.literal_eval.

import ast
import numpy as np
mydummystring = '[1.0,2.0,3.0][4.0,5.0,6.0][6.0,7.0,8.0]'
# wrap the rows in an outer list so the string is a valid nested literal
mydummystring = '[' + mydummystring.replace('][', '],[') + ']'
mydummystring = ast.literal_eval(mydummystring)
arr = np.array(mydummystring)
print(arr)

Or use json.loads:

import json
import numpy as np
mydummystring = '[1.0,2.0,3.0][4.0,5.0,6.0][6.0,7.0,8.0]'
# wrap the rows in an outer list so the string is valid JSON
mydummystring = '[' + mydummystring.replace('][', '],[') + ']'
mydummystring = json.loads(mydummystring)
arr = np.array(mydummystring)
print(arr)

[[1. 2. 3.]
 [4. 5. 6.]
 [6. 7. 8.]]
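
Since the question asks for a robust conversion, the json.loads approach can be wrapped in a small function that also rejects rows of unequal length. This is only a sketch: the name `parse_adx_array` and the rectangularity check are my own assumptions, not something ADX or NumPy provides.

```python
import json

import numpy as np


def parse_adx_array(text: str) -> np.ndarray:
    """Parse '[a,b][c,d]' into a 2-D float array (hypothetical helper)."""
    rows = json.loads('[' + text.replace('][', '],[') + ']')
    # Reject ragged input instead of letting NumPy deal with uneven rows.
    if len({len(row) for row in rows}) > 1:
        raise ValueError('rows have different lengths')
    return np.array(rows, dtype=float)


arr = parse_adx_array('[1.0,2.0,3.0][4.0,5.0,6.0][6.0,7.0,8.0]')
print(arr.shape)  # (3, 3)
```

Malformed input fails inside json.loads with json.JSONDecodeError, which is a subclass of ValueError, so a caller can catch both cases with a single `except ValueError`.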
I'mahdi
  • Thanks for your reply! I would have never thought of using json's `loads` method to evaluate the input string. – Sheldon Mar 19 '23 at 20:21
1

You can use ast.literal_eval, which only parses Python literal structures and does not run arbitrary code.

from ast import literal_eval
import numpy as np
s = '[1.0,2.0,3.0][4.0,5.0,6.0][6.0,7.0,8.0]'
np_arr = np.array([np.array(x) for x in literal_eval('['+s.replace('][', '],[')+']')])

Note that a list comprehension is not necessary to create the NumPy array.

np.array(literal_eval('['+s.replace('][', '],[')+']'))
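
To illustrate the difference from eval, here is a minimal sketch with a made-up payload (harmless here, chosen only for demonstration). literal_eval refuses anything that is not a plain literal, while the bracketed number string still parses.

```python
from ast import literal_eval

# Hypothetical hostile input: eval() would execute this call.
payload = "__import__('os').getcwd()"

try:
    literal_eval(payload)
    rejected = False
except ValueError:
    # literal_eval refuses function calls and other non-literal nodes.
    rejected = True

print(rejected)  # True

# The well-formed data string is still accepted.
s = '[1.0,2.0,3.0][4.0,5.0,6.0][6.0,7.0,8.0]'
rows = literal_eval('[' + s.replace('][', '],[') + ']')
print(rows[0])  # [1.0, 2.0, 3.0]
```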
Unmitigated
  • Thanks for your reply @Unmitigated. I will review the difference between `eval` and ast's `literal_eval`. Anyways, I imagine that skipping the list comprehension will make the conversion more efficient. – Sheldon Mar 19 '23 at 20:28
1

Without any extra library, but with some string preprocessing. np.fromstring returns a one-dimensional array, so find the shape, flatten the string, and then reshape.

import numpy as np
s = '[1.0,2.0,3.0][4.0,5.0,6.0][6.0,7.0,8.0]'
# one row per '['; -1 lets reshape infer the column count
shape = s.count('['), -1
# flat array format
s = s.strip('][').replace('][', ',')
a = np.fromstring(s, sep=',', dtype=float).reshape(shape)
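
For input that is not square, one way to get the shape is to count the rows and let reshape infer the column count. The sketch below does that and adds a bracket-count sanity check. The function name `parse_flat` and the check itself are assumptions for illustration.

```python
import numpy as np


def parse_flat(text: str) -> np.ndarray:
    """Hypothetical helper: parse '[a,b,c][d,e,f]' via a flat 1-D array."""
    n_rows = text.count('[')
    # Sanity check: n_rows blocks should be joined by n_rows - 1 separators.
    if n_rows == 0 or text.count('][') != n_rows - 1:
        raise ValueError('unexpected bracket layout')
    flat = np.fromstring(text.strip('][').replace('][', ','), sep=',', dtype=float)
    # -1 lets NumPy infer the number of columns from the element count.
    return flat.reshape(n_rows, -1)


a = parse_flat('[1.0,2.0,3.0][4.0,5.0,6.0]')
print(a.shape)  # (2, 3)
```

If the total number of values is not divisible by the number of rows, reshape raises a ValueError. Rows of unequal length that happen to add up to a divisible total are not detected by this approach.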
cards
  • Nice tip, @cards. I may end up counting the number of `']['` to ensure that I am not missing any data. – Sheldon Mar 19 '23 at 20:30