0

I have a Dataframe with columns that look like this:

df=pd.DataFrame()
df['symbol'] = ['A','B','C']
df['json_list'] = ['[{name:S&P500, perc:25, ticker:SPY, weight:1}]',
          '[{name:S&P500, perc:25, ticker:SPY, weight:0.5}, {name:NASDAQ, perc:26, ticker:NASDAQ, weight:0.5}]',
          '[{name:S&P500, perc:25, ticker:SPY, weight:1}]']
df['date'] = ['2022-01-01', '2022-01-02', '2022-01-02']
df:
    symbol  json_list                                         date
0   A       [{name:S&P500, perc:25, ticker:SPY, weight:1}]    2022-01-01
1   B       [{name:S&P500, perc:25, ticker:SPY, weight:0.5... 2022-01-02
2   C       [{name:S&P500, perc:25, ticker:SPY, weight:1}]    2022-01-02

The values in the json_list column are of <class 'str'>.

How can I convert the json_list column items to dicts so I can access them based on key:value pairs?

Thank you in advance.

MathMan 99
  • 665
  • 1
  • 7
  • 19
  • 1
    This is very similar to https://stackoverflow.com/questions/20680272/parsing-a-json-string-which-was-loaded-from-a-csv-using-pandas e.g., apply json.loads. – Grantus May 20 '22 at 00:06
  • 2
    Except those aren't JSON strings. You could do some string parsing and then use `ast.literal_eval`, but it'd be quite ugly. – BrokenBenchmark May 20 '22 at 00:07
  • @BrokenBenchmark, you're right, these aren't jsons. I got this error when I applied `json.loads`: `JSONDecodeError: Expecting property name enclosed in double quotes:` – MathMan 99 May 20 '22 at 00:12
  • name should be 'name' if json – BENY May 20 '22 at 00:38
  • @BENY Please note these are string items that I want to convert to dicts. I don't know if that is possible. – MathMan 99 May 20 '22 at 00:41
  • Maybe this is easier to solve https://stackoverflow.com/questions/72307099/python-filter-dataframe-into-two-separate-dataframes/72308202#72308202 <- The original problem – MathMan 99 May 20 '22 at 00:42

2 Answers2

1

UPDATED to reflect the fact that the json strings in the question are not singleton lists, but can contain multiple dict-like elements.

This will put a list of dict object in a new column of your dataframe:

def foo(x):
    src = x['json_list']
    rawList = src[1:-1].split('{')[1:]
    rawDictList = [x.split('}')[0] for x in rawList]
    dictList = [dict(x.strip().split(':') for x in y.split(',')) for y in rawDictList]
    for dct in dictList:
        for k in dct:
            try:
                dct[k] = int(dct[k])
            except ValueError:
                try:
                    dct[k] = float(dct[k])
                except ValueError:
                    pass
    return dictList
df['list_of_dict_object'] = df.apply(foo, axis = 1)

Original answer:

This will put a dict in a new column of your dataframe that should give you something close to what you want, except for numeric typing:

df['dict_object'] = df.apply(lambda x: dict(x.strip().split(':') for x in x['json_list'][2:-2].split(',')), axis = 1)

To get float or int where string values are convertible, you can do this:

def foo(x):
    d = dict(x.strip().split(':') for x in x['json_list'][2:-2].split(','))
    for k in d:
        try:
            d[k] = int(d[k])
        except ValueError:
            try:
                d[k] = float(d[k])
            except ValueError:
                pass
    return d
df['dict_object'] = df.apply(foo, axis = 1)
constantstranger
  • 9,176
  • 2
  • 5
  • 19
0

The "json" is almost valid yaml. If you add a space after the colons, you can parse it using pyyaml.

df.json_list.apply(lambda data: yaml.safe_load(data.replace(':', ': ')))
Håken Lid
  • 22,318
  • 9
  • 52
  • 67