
I have used a few different methods to get my list of dictionaries into a single dictionary, and they all run without throwing errors, but no matter the method, my values go missing!

I have this JSON:

[{"Query Fields Used":null,"History Connection Name":null,"History Query Run Count":"138,036"},
{"Query Fields Used":"[\"shifts.date_date\", \"hifts.sum_booked_shifts\"]","History Connection Name":"sandy","History Query Run Count":"59,069"},
{"Query Fields Used":"[\"daily_supply_v2.sum_booked_shifts\", \"daily_supply_v2.date_date\"]","History Connection Name":"low","History Query Run Count":"50,259"},
{"Query Fields Used":"[\"daily_supply_v2.date_date\", \"daily_supply_v2.sum_booked_shifts\"]","History Connection Name":"sandy","History Query Run Count":"50,235"},
{"Query Fields Used":"[\"orders.count_loads\", \"orders.complete_timestamp_date\"]","History Connection Name":"sandy","History Query Run Count":"29,658"},
{"Query Fields Used":"[\"hitch_stack.date_date\", \"hitch_stack.hitch_count\"]","History Connection Name":"low","History Query Run Count":"24,928"}]

I imported this with pandas.read_json. The Query Fields Used values came in as single strings, so I removed the special characters with re.sub and then split on commas to turn each one into a list.

import re

# Pull the column values into a list
xx = []
for i in looker['Query Fields Used']:
    xx.append(i)

# Strip everything except letters, digits, underscores, dots and commas
yy = []
for i in xx:
    if i is None:
        yy.append("None")
    else:
        x = re.sub(r'[^a-zA-Z0-9_.,]', '', i)
        yy.append(x)

# Split each cleaned string on commas to get a list of field names
y = []
for i in yy:
    if i is None:
        y.append("None")
    else:
        y.append(i.split(","))

looker['Query Fields Used'] = y

Then I ran

d = [{key: val} for key, val in zip(looker['History Connection Name'], looker['Query Fields Used'])]

That gave me the right keys, but the values stopped populating. I also tried building a list of dictionaries and merging them with funcy's join, which gave exactly the same result.

I am looking for a dictionary whose values contain all of the values from the corresponding lists, but I am only getting a few. What am I doing wrong here?

  • What is `looker`? Please provide a [mre] that we can run and reproduce your problem – Pranav Hosangadi Apr 29 '22 at 15:20
  • Looker is just what I named my pandas DataFrame – Basalty Apr 29 '22 at 15:20
  • You shouldn't have had to remove special characters or split. Use `json.loads(json_string)` to convert JSON to a Python object. If it's valid JSON, it'll be converted to a valid Python object. You shouldn't have to do any of this work. – ddejohn Apr 29 '22 at 15:20
  • Does this answer your question? [Parsing a JSON string which was loaded from a CSV using Pandas](https://stackoverflow.com/questions/20680272/parsing-a-json-string-which-was-loaded-from-a-csv-using-pandas) Of course, you're going to have to handle `None` values, so I recommend `looker["Query Fields Used"].apply(lambda x: json.loads(x) if x else None)` – Pranav Hosangadi Apr 29 '22 at 15:23
  • Can you edit the OP to include your raw JSON string? – ddejohn Apr 29 '22 at 15:28
  • That is directly copied from the json file – Basalty Apr 29 '22 at 15:31
  • @Pranav that did not give me a dictionary, just another series and no "History Connection Name" – Basalty Apr 29 '22 at 15:32
  • You can then add that series as a column to your dataframe or overwrite an existing column and create a dictionary, it should be pretty straightforward. – Pranav Hosangadi Apr 29 '22 at 15:35
  • Following that, it added a *few* more values but still not all of them. – Basalty Apr 29 '22 at 15:37
  • To be clear I did as you instructed and used `look = looker["Query Fields Used"].apply(lambda x: json.loads(x) if x else None)`, `looker['Query Fields Used'] = look`, `ddd = dict(zip(looker['History Connection Name'],looker['Query Fields Used']))` – Basalty Apr 29 '22 at 15:42

1 Answer


It looks like you want History Connection Name to be the key in your dictionary and Query Fields Used to be the values. If so, the reason your dictionary is "losing" records is that a Python dictionary's keys must be unique, which is not the case for the History Connection Name values:

{0: None, 1: 'sandy', 2: 'low', 3: 'sandy', 4: 'sandy', 5: 'low'}
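
As a quick illustration, independent of your data: when you build a dict from pairs with repeated keys, later entries silently overwrite earlier ones, which is exactly why records appear to go missing.

# Repeated keys collapse to a single entry, keeping only the last value
pairs = [("sandy", ["a"]), ("low", ["b"]), ("sandy", ["c"])]
print(dict(pairs))  # {'sandy': ['c'], 'low': ['b']}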

You can make unique keys for your records (e.g., using the numeric indices) and remake the dictionary accordingly.
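
For instance, a minimal sketch of that approach, assuming your dataframe is still named `looker`:

# Key each record by its row position so nothing collides
d = {
    idx: {"History Connection Name": name, "Query Fields Used": fields}
    for idx, (name, fields) in enumerate(
        zip(looker["History Connection Name"], looker["Query Fields Used"])
    )
}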

Or you can merge the values under their respective keys. Here are two ways to do that:

from collections import defaultdict

import pandas as pd

# Given data
data_json = r"""[{"Query Fields Used": null, "History Connection Name": null, "History Query Run Count": "138,036"},
 {"Query Fields Used"      : "[\"shifts.date_date\", \"hifts.sum_booked_shifts\"]", "History Connection Name": "sandy",
  "History Query Run Count": "59,069"},
 {"Query Fields Used"      : "[\"daily_supply_v2.sum_booked_shifts\", \"daily_supply_v2.date_date\"]",
  "History Connection Name": "low", "History Query Run Count": "50,259"},
 {"Query Fields Used"      : "[\"daily_supply_v2.date_date\", \"daily_supply_v2.sum_booked_shifts\"]",
  "History Connection Name": "sandy", "History Query Run Count": "50,235"},
 {"Query Fields Used"      : "[\"orders.count_loads\", \"orders.complete_timestamp_date\"]",
  "History Connection Name": "sandy", "History Query Run Count": "29,658"},
 {"Query Fields Used"      : "[\"hitch_stack.date_date\", \"hitch_stack.hitch_count\"]",
  "History Connection Name": "low", "History Query Run Count": "24,928"}]"""

# 1 - Iteration and using pd.read_json, like in your example
data = pd.read_json(data_json, orient="columns").to_dict()
merged: dict = defaultdict(list)
for name, fields in zip(data["History Connection Name"].values(), data["Query Fields Used"].values()):
    merged[str(name)].append(fields)

# 2 - Keeping the records in a dataframe
data = pd.read_json(data_json, orient="columns")
# With as_index=False the grouping key comes back as a column, so only the fields need aggregating
data = data.groupby("History Connection Name", as_index=False).agg({"Query Fields Used": " ".join})

# Outputs of the two merges respectively
{'None' : [None],
 'sandy': ['["shifts.date_date", "hifts.sum_booked_shifts"]',
           '["daily_supply_v2.date_date", "daily_supply_v2.sum_booked_shifts"]',
           '["orders.count_loads", "orders.complete_timestamp_date"]'],
 'low'  : ['["daily_supply_v2.sum_booked_shifts", "daily_supply_v2.date_date"]',
           '["hitch_stack.date_date", "hitch_stack.hitch_count"]']}

| History Connection Name   | Query Fields Used                                                                                                                                                           |
|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| low                       | ["daily_supply_v2.sum_booked_shifts", "daily_supply_v2.date_date"] ["hitch_stack.date_date", "hitch_stack.hitch_count"]                                                     |
| sandy                     | ["shifts.date_date", "hifts.sum_booked_shifts"] ["daily_supply_v2.date_date", "daily_supply_v2.sum_booked_shifts"] ["orders.count_loads", "orders.complete_timestamp_date"] |
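
If you ultimately want a plain dictionary rather than a dataframe out of the second approach, one possible follow-up, assuming the merged `data` dataframe from above:

# Turn the grouped dataframe into a dict keyed by connection name
merged_dict = dict(zip(data["History Connection Name"], data["Query Fields Used"]))
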
Ray