Function to find top list of items for a given list in a JSON input

Question

I have a DataFrame like this:

| json_col                                           |
| ---------------------------------------------------|
| {"category":"a","items":["a","b","c","d","e","f"]} |
| {"category":"b","items":["u","v","w","x","y"]}     |
| {"category":"c","items":["p","q"]}                 |
| {"category":"d","items":["m"]}                     |

I converted it to a single string of dicts:

x = pd.Series(', '.join(df_list['json_col'].to_list()), name='text')

The result looks like this:

'{"category":"a","items":["a","b","c","d","e","f"]},
{"category":"b","items":["u","v","w","x","y"]},
{"category":"c","items":["p","q"]},
{"category":"d","items":["m"]}'

(EDIT: This was my original input when I posted the question, but it has been pointed out to me that this is not the right way to use JSON, so I am providing the DataFrame above.)

I need to write a Python function that takes an item as input and returns the top 3 items from the list it belongs to (excluding itself). Items are listed in priority order, so the top 3 are the first 3 items.

def item_list(above_json_input, item = "a"):
    return list

For example, the result list should follow these rules:

If the item is "a", then look in category a, where item a is present, and return the top 3 items in sequence - ["b","c","d"]
If the item is "w", then look in category b, where item w is present, and return - ["u","v","x"]
If the item is "q", then look in category c, where item q is present, and return - ["p"], because there are fewer than 3 items other than q
If the item is "m", then look in category d, where item m is present, and return an empty list [], because there are no other items in that list.

The same goes for an item that doesn't exist, like item = "r", which is not in any category. We can throw an error or return an empty list.
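
For reference, the rules above can be sketched as a small function over records that have already been parsed into Python dicts. This is only a minimal sketch, not code from the thread: the name `top_items`, the parameter `n`, and the `records` list are assumptions made for illustration, and returning an empty list for an unknown item is one of the two options mentioned above.

```python
def top_items(records, item, n=3):
    """Return up to n items that share a list with `item`, excluding `item`."""
    for record in records:
        items = record["items"]
        if item in items:
            # Build a new list, so the input records are never modified.
            return [other for other in items if other != item][:n]
    # Unknown item: return an empty list (raising an error is the alternative).
    return []


# Hypothetical parsed form of the four rows shown in the question.
records = [
    {"category": "a", "items": ["a", "b", "c", "d", "e", "f"]},
    {"category": "b", "items": ["u", "v", "w", "x", "y"]},
    {"category": "c", "items": ["p", "q"]},
    {"category": "d", "items": ["m"]},
]

print(top_items(records, "a"))  # ['b', 'c', 'd']
print(top_items(records, "w"))  # ['u', 'v', 'x']
print(top_items(records, "q"))  # ['p']
print(top_items(records, "m"))  # []
print(top_items(records, "r"))  # []
```

Slicing with `[:n]` already covers lists with fewer than `n` remaining items, so no separate length check is needed.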

I am not sure how to read the JSON and get the list of top items. Is this even possible?

It is a string of dicts, and each dict has a list value that I need to search. – trojan horse, Jul 21 '22 at 00:29
Break this up into two parts: dealing with JSON and then applying the rules. The first thing is starting off with valid JSON -- if it's supposed to be a list of categories, it's missing the surrounding `[`/`]`. Then use the `json` package in the stdlib to parse the string. For the second part, please ask a more specific question. What have you tried, and what specific error are you blocked on? – Kache, Jul 21 '22 at 00:48
Actually, the JSONs I shared are the records in each row of the dataframe: a column named "json" with 4 rows, each row holding one category and the items I shared. Can we use it with the dataframe directly? – trojan horse, Jul 21 '22 at 01:08
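
To make the parsing step in the comments above concrete, here is a minimal sketch using only the standard `json` module. The `rows` list is a stand-in (an assumption for illustration) for what `df_list['json_col'].to_list()` would give, one JSON string per row. It shows that each row can be parsed on its own, and that the joined string becomes valid JSON once it is wrapped in `[` and `]`.

```python
import json

# Stand-in for df_list['json_col'].to_list(): one JSON string per row.
rows = [
    '{"category":"a","items":["a","b","c","d","e","f"]}',
    '{"category":"b","items":["u","v","w","x","y"]}',
    '{"category":"c","items":["p","q"]}',
    '{"category":"d","items":["m"]}',
]

# Option 1: parse each row separately, without joining anything.
records = [json.loads(row) for row in rows]

# Option 2: the comma-joined string is valid JSON once wrapped in [ ].
joined = ", ".join(rows)
records_from_joined = json.loads("[" + joined + "]")

print(records == records_from_joined)  # True
print(records[2]["items"])  # ['p', 'q']
```

With a real DataFrame, something like `df_list['json_col'].map(json.loads)` should give the same dicts as a Series, assuming every cell holds a valid JSON string.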

Jonathan Ciapetti · Accepted Answer · 2022-07-21T01:01:47.450

I fixed your JSON, as it was badly formatted. For input "c", ['a', 'b', 'd'] and ['p', 'q'] are printed:

import json

data_string = """{
        "data" : [
                {"category":"a","items":["a","b","c","d","e","f"]},
                {"category":"b","items":["u","v","w","x","y"]},
                {"category":"c","items":["p","q"]},
                {"category":"d","items":["m"]}
        ]
}"""

data = json.loads(data_string)["data"]

user_input = input("Pick a letter: ")

found = False
for values in data:
        if user_input in (values["category"], *values["items"]):
                found = True
                temp = [item for item in values["items"] if item != user_input]
                print(temp[:3])

if not found:
        print([])

edited Jul 21 '22 at 01:01

answered Jul 21 '22 at 00:49

Jonathan Ciapetti


The JSON is similar to what I shared, but only because I created it from a dataframe by combining all rows. Can it be done on a dataframe where each row is a JSON object, like row 1 = {"category":"a","items":["a","b","c","d","e","f"]}, row 2 = {"category":"b","items":["u","v","w","x","y"]}, and so on? I want to read that column, check those records, and return the list. Did I make a mistake by joining all the rows into a single string of JSONs? – trojan horse Jul 21 '22 at 01:12
I have also added the original input as a DataFrame above. – trojan horse Jul 21 '22 at 01:15
Can you check the new inputs? – trojan horse Jul 21 '22 at 01:20
Well, I read "JSON input" in the question, so I treated it as JSON, and [here](https://json.org/example.html) you can see examples of JSON. If you get your data from a Pandas DataFrame, sure, you can still get the same results, but it would be a different question. I'm not being harsh; I just think it would not be ok with the rules of SO, and others can answer that better than me. Btw, it would just be a matter of using the DataFrame data instead of the JSON one. – Jonathan Ciapetti Jul 21 '22 at 01:29
@trojanhorse Also, when you edit your question after people have already given answers, I think it would be better to state explicitly that you edited it; otherwise those answers (like mine) will seem odd to the mods. – Jonathan Ciapetti Jul 21 '22 at 01:35
Apologies, I didn't mean to confuse. I was trying it as JSON based on some other SO answer and got stuck, and then you pointed out that the JSON format was wrong. To be clear, I edited the question, and I accept your answer because it works. I just need the same for a dataframe. Can you suggest how to read the JSON rows and get the data part? I am getting an error for that. – trojan horse Jul 21 '22 at 02:41
Thank you, it's ok, no big deal. I read [here](https://stackoverflow.com/questions/20037430/reading-multiple-json-records-into-a-pandas-dataframe) that you can do it this way: 1) delete the ',' at the end of each row in `data_string`, 2) use `data = pd.read_json(data_string, lines=True)` . From that point forward, it's just you and the DataFrame, see how other answers implement the algorithm with the DataFrame as input. – Jonathan Ciapetti Jul 21 '22 at 03:01
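
The two steps in the last comment can be sketched as follows. This is a hedged example rather than the commenter's own code: the string is wrapped in `StringIO` on the assumption that a recent pandas version is in use, where passing a literal JSON string straight to `pd.read_json` is deprecated.

```python
from io import StringIO

import pandas as pd

# One JSON object per line, with the trailing commas removed (JSON Lines).
data_string = """{"category":"a","items":["a","b","c","d","e","f"]}
{"category":"b","items":["u","v","w","x","y"]}
{"category":"c","items":["p","q"]}
{"category":"d","items":["m"]}"""

# StringIO presents the text to read_json as a file-like object.
data = pd.read_json(StringIO(data_string), lines=True)

print(data.shape)  # (4, 2)
print(data.loc[1, "items"])  # ['u', 'v', 'w', 'x', 'y']
```

Each row of `data` then has a `category` value and an `items` list, so the lookup can run over the `items` column directly.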

omar · Answer 2 · 2022-07-21T15:46:30.100

You could try this on your dataframe:

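import pandas as pd

# Each cell holds an already-parsed dict; if the column holds JSON strings
# instead, parse them with json.loads first.
df = pd.DataFrame({'jsonCol': [
    {"category": "a", "items": ["a", "b", "c", "d", "e", "f"]},
    {"category": "b", "items": ["u", "v", "w", "x", "y"]},
    {"category": "c", "items": ["p", "q"]},
    {"category": "d", "items": ["m"]},
]})
h = df['jsonCol']


def search(inm):
    for item in h:
        if inm in item['items']:
            # Filter into a new list instead of pop(), so the data is not
            # mutated between calls and lists of exactly 3 items also work
            others = [x for x in item['items'] if x != inm]
            return others[:3]
    return []

print(search('r'))  # item not found, so this prints []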

edit:

h = [{"category":"a","items":["a","b","c","d","e","f"]},{"category":"b","items":["u","v","w","x","y"]},{"category":"c","items":["p","q"]},{"category":"d","items":["m"]}]

def search(inm):
    for item in h:
        if inm in item['items']:
            # Filter into a new list instead of pop(), so h is not mutated
            # between calls and lists of exactly 3 items are handled too
            others = [x for x in item['items'] if x != inm]
            return others[:3]
    return []

print(search('b'))  # answer ['a', 'c', 'd']

this is not giving the expected result. If I pass "b" the answer should be a,c,d and not c,d,e — trojan horse, Jul 21 '22 at 03:11
