
I used BeautifulSoup to scrape a 2D data array from a website as a string. I believe the table format is related to JSON; however, when I apply pandas.read_json() to the string it raises a ValueError. I tried converting "null" to "0" and removing the "\n" characters from the string, to no avail.

```python
data_str = """[[{label:'column 1',type:'number'},{label:'column2',type:'number'},{label:'column 3',type:'number'}],
[205, null,  89748],
[206, null,  66813],
[235,   75,   null],
[236,  138,   null]]"""
```
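A quick check with the standard `json` module (used here instead of pandas, on the assumption that `read_json` fails for the same reason) shows where the error comes from: only the header line is invalid JSON, because its keys are bare identifiers and its strings are single-quoted. The data rows on their own parse fine, with `null` becoming `None`.

```python
import json

data_str = """[[{label:'column 1',type:'number'},{label:'column2',type:'number'},{label:'column 3',type:'number'}],
[205, null,  89748],
[206, null,  66813],
[235,   75,   null],
[236,  138,   null]]"""

# The full string is rejected: JSON requires double-quoted keys and strings,
# but the header uses bare keys (label, type) and single-quoted values.
try:
    json.loads(data_str)
    whole_is_json = True
except json.JSONDecodeError as exc:  # JSONDecodeError subclasses ValueError
    whole_is_json = False
    print("not JSON:", exc.msg)

# The data rows alone are valid JSON once the outer "[" is restored.
header_line, rows_text = data_str.split("\n", 1)
rows = json.loads("[" + rows_text)
print(rows[0])  # [205, None, 89748]
```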

I can convert the string to a pandas DataFrame by splitting off the first row of the table, which holds the column names, from the data entries, but this seems rather clumsy (see below).

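```python
import ast
import re

import numpy as np
import pandas as pd

# Separate the header line (column definitions) from the data rows
col_names, data_str = data_str.split('\n', 1)
# Extract the label of each column from the header objects
col_names = re.findall(r"label:'(.*?)'", col_names)
data_str = data_str.replace('\n', '')
# Replace the JavaScript null with a number (note: missing values become 0)
data_str = data_str.replace('null', '0.')

# Restore the opening bracket removed by the split, then evaluate as a Python literal
data_arr = np.array(ast.literal_eval('[' + data_str))
data_df = pd.DataFrame(data_arr, columns=col_names)
```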

Is there a more pythonic way to convert the string to a pandas DataFrame?
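One possible tidier variant, sketched under the assumption that the header always sits on the first line and every later line is a JSON-compatible row (as in the sample): keep the regex for the labels, but parse the rows with `json.loads`, which understands `null` directly, so there is no need for `replace` or `ast.literal_eval`, and missing values stay missing instead of becoming 0. The helper name `parse_table` is hypothetical.

```python
import json
import re

data_str = """[[{label:'column 1',type:'number'},{label:'column2',type:'number'},{label:'column 3',type:'number'}],
[205, null,  89748],
[206, null,  66813],
[235,   75,   null],
[236,  138,   null]]"""


def parse_table(text):
    """Split a JS-style table string into (column_names, rows).

    Assumes the first line holds the header objects and every later
    line is a JSON-compatible row.
    """
    header_line, rows_text = text.split("\n", 1)
    # Pull each label value out of the header objects
    columns = re.findall(r"label:'(.*?)'", header_line)
    # Re-open the outer list cut off by the split; JSON null -> None
    rows = json.loads("[" + rows_text)
    return columns, rows


columns, rows = parse_table(data_str)
print(columns)  # ['column 1', 'column2', 'column 3']
print(rows[2])  # [235, 75, None]
```

From there `pd.DataFrame(rows, columns=columns)` builds the frame; pandas should turn the `None` entries into `NaN` in float columns.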

Paul
    This is an aside, and not directly related to the question as asked, but I would think that there was likely a better way to pull this information from whatever site into a dataframe. – Henry Ecker Jun 25 '21 at 15:40
  • @HenryEcker thanks for your comment, this is the first time I've tried something like this, so you are probably right. Could you be so kind as to provide a hint on how to proceed? [This](https://physics.nist.gov/cgi-bin/ASD/lines1.pl?composition=Cr%3A100&mytext%5B%5D=Cr&myperc%5B%5D=100&spectra=Cr0-2&low_w=200&limits_type=0&upp_w=600&show_av=2&unit=1&resolution=1000&temp=1&eden=1e17&maxcharge=2&min_rel_int=0.01&libs=1) is the site I am trying to get the data from. – Paul Jun 25 '21 at 15:47

1 Answer


No, it's not valid JSON but a JavaScript object literal stored as a raw string. You need to install another module, such as demjson, to parse it; see the answer here for more details.
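If installing demjson is not an option, a stdlib-only workaround for simple literals like this one is to rewrite them into JSON with two regex substitutions (quote the bare keys, swap single quotes for double quotes) and then call `json.loads`. This is a best-effort sketch, not a JavaScript parser: the helper `js_literal_to_json` is hypothetical and assumes the strings contain no escapes, quotes, commas or colons.

```python
import json
import re

data_str = """[[{label:'column 1',type:'number'},{label:'column2',type:'number'},{label:'column 3',type:'number'}],
[205, null,  89748],
[206, null,  66813],
[235,   75,   null],
[236,  138,   null]]"""


def js_literal_to_json(text):
    """Best-effort rewrite of a simple JavaScript literal into JSON text.

    Hypothetical helper: handles only bare identifier keys and
    single-quoted strings without escapes, quotes, commas or colons.
    """
    # 'column 1' -> "column 1"
    text = re.sub(r"'([^'\\]*)'", r'"\1"', text)
    # {label: and ,type:  ->  {"label": and ,"type":
    text = re.sub(r"([{,]\s*)([A-Za-z_$][\w$]*)\s*:", r'\1"\2":', text)
    return text


table = json.loads(js_literal_to_json(data_str))
columns = [col["label"] for col in table[0]]  # header row holds the labels
rows = table[1:]                              # remaining rows hold the data
print(columns)  # ['column 1', 'column2', 'column 3']
print(rows[0])  # [205, None, 89748]
```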

SCKU