I have a JSON file with key-value pair data. My JSON file looks like this:

{
    "professors": [
        {
            "first_name": "Richard", 
            "last_name": "Saykally", 
            "helpfullness": "3.3", 
            "url": "http://www.ratemyprofessors.com/ShowRatings.jsp?tid=111119", 
            "reviews": [
                {
                    "attendance": "N/A", 
                    "class": "CHEM 1A", 
                    "textbook_use": "It's a must have", 
                    "review_text": "Tests were incredibly difficult (averages in the 40s) and lectures were essentially useless. I attended both lectures every day and still was unable to grasp most concepts on the midterms. Scope out a good GSI to get help and ride the curve."
                }, 
                {
                    "attendance": "N/A", 
                    "class": "CHEMISTRY1A", 
                    "textbook_use": "Essential to passing", 
                    "review_text": "Saykally really isn't as bad as everyone made him out to  be. If you go to his lectures he spends about half the time blowing things up, but if you actually read the texts before his lectures and pay attention to what he's writing/saying, you'd do okay. He posts practice tests that were representative of actual tests and curves the class nicely!"
                }]
        },
        {
        "first_name": "Laura", 
        "last_name": "Stoker", 
        "helpfullness": "4.1", 
        "url": "http://www.ratemyprofessors.com/ShowRatings.jsp?tid=536606", 
        "reviews": [
            {
                "attendance": "N/A", 
                "class": "PS3", 
                "textbook_use": "You need it sometimes", 
                "review_text": "Stoker is by far the best professor.  If you put in the effort, take good notes, and ask questions, you will be fine in the class. As far as her lecture, she does go a bit fast, but her lecture is in the form of an outline. As long as you take good notes, you will have everything you need for exams. She is funny and super nice if you speak with her"
            }, 
            {
                "attendance": "Mandatory", 
                "class": "164A", 
                "textbook_use": "Barely cracked it open", 
                "review_text": "AMAZING professor.  She has a good way of keeping lectures interesting.  Yes, she can be a little everywhere and really quick with her lecture, but the GSI's are useful to make sure you understand the material.  Oh, and did I mention she's hilarious!"
            }]
    }]
}

I'm trying to do several things. First, I want to find the most mentioned ['class'] value under reviews, getting each class name and the number of times it was mentioned. Then I'd like to write the output in the format below. The professors array should hold just the info of the professors for that class; for instance, for CHEM 1A and CHEMISTRY1A it's Richard Saykally.

{
    "courses": [
        {
            "course_name": # class name
            "course_mentioned_times": # the number of times the class was mentioned
            "professors": [ # the professors who teach this class, taken from the JSON file shown above
                {
                    "first_name": "professor first name",
                    "last_name": "professor last name"
                }
            ]
        }
    ]
}

So I'd like to sort the entries from the most mentioned class to the least. So far, all I've been able to figure out is:

import json

if __name__ == "__main__":
        open_json = open('result.json')
        load_as_json = json.load(open_json)['professors']
        outer_arr = []
        outer_dict = {}
        for items in load_as_json:

            output_dictionary = {}
            all_classes = items['reviews']
            for classes in all_classes:
                arr_info = []
                output_dictionary['class'] = classes['class']
                output_dictionary['first_name'] = items['first_name']
                output_dictionary['last_name'] = items['last_name']
                #output_dictionary['department'] = items['department']
                output_dictionary['reviews'] = classes['review_text']
                with open('output_info.json','wb') as outfile:
                    json.dump(output_dictionary,outfile,indent=4)
Benji
  • Possible duplicate of http://stackoverflow.com/questions/18871217/how-to-custom-sort-a-list-of-dict-to-use-in-json-dumps – Alex Hall Apr 21 '16 at 22:38
  • Your question title mentions formatting, but it sounds like it's about ordering the data in JSON files. Is that correct? You also need to be clearer (more explicit) about what your inputs and desired outputs are. – martineau Apr 21 '16 at 22:42
  • Benji, Stack Overflow is a question-and-answer site. Readers such as yourself ask questions and other readers attempt to answer them. Your post has a lot of information in it, but it is missing the one thing that makes Stack Overflow work: a question. Do you have a specific programming question? – Robᵩ Apr 21 '16 at 23:03
  • I apologize about that @Rob. Yes, I am having issues printing the output in the format I desire. I don't know how I should approach it, such as where to initialize a new dictionary or a new array. My output keeps duplicating. I'm having issues with initializing the dictionary and array in my script, like which for loop they need to be under. Hence I provided the code above. – Benji Apr 22 '16 at 02:00
  • What is your question? – Robᵩ Apr 22 '16 at 05:03
  • @Robᵩ Hey, the script in the answer you wrote below is exactly what I wanted. Now my question is how do I have just one professor name instead of multiple names. The professor "Richard" shows up multiple times in my professors array; instead I'd like to get rid of duplicates and print each name only once under the professors array. – Benji Apr 22 '16 at 07:05
  • @Robᵩ Here is what I don't know how to do in Python and would like to learn. I now have multiple classes, such as CHEM 1 and CHEM 123, and they are repeated multiple times throughout my output file. What I'd like to do is print only the most repeated CHEM class and remove the others from my key-value data. That is, keep just the highest-count class of each kind: if we have MATH 1234, MATH 532 and also CHEM 1234, CHEM 532, I want just one MATH class and one CHEM class, picking each by which has the most mentions. – Benji Apr 22 '16 at 08:12
  • @Robᵩ Wish I could wrap everything in one comment. Anyhow, the second thing: the script you gave me is exactly what I wanted. However, under the professors array we have multiple professors with the same name. So for my output I want to get rid of all the CHEM classes except the one with the most mentions, do the same for all the other classes, and print each professor's name just once instead of replicas. – Benji Apr 22 '16 at 08:14

1 Answer


I think this program does what you want:

import json


with open('result.json') as open_json:
    load_as_json = json.load(open_json)

courses = {}
for professor in load_as_json['professors']:
    for review in professor['reviews']:
        course = courses.setdefault(review['class'], {})
        course.setdefault('course_name', review['class'])
        course.setdefault('course_mentioned_times', 0)
        course['course_mentioned_times'] += 1
        course.setdefault('professors', [])
        prof_name = {
            'first_name': professor['first_name'],
            'last_name': professor['last_name'],
        }
        if prof_name not in course['professors']:
            course['professors'].append(prof_name)

courses = {
    'courses': sorted(courses.values(),
                      key=lambda x: x['course_mentioned_times'],
                      reverse=True)
}
with open('output_info.json', 'w') as outfile:
    json.dump(courses, outfile, indent=4)

Result, using the example input in the question:

{
    "courses": [
        {
            "professors": [ 
                {
                    "first_name": "Laura",
                    "last_name": "Stoker"
                }
            ], 
            "course_name": "PS3", 
            "course_mentioned_times": 1
        }, 
        {
            "professors": [
                {
                    "first_name": "Laura", 
                    "last_name": "Stoker"
                }
            ],
            "course_name": "164A", 
            "course_mentioned_times": 1
        },
        {
            "professors": [
                {
                    "first_name": "Richard", 
                    "last_name": "Saykally"
                }
            ], 
            "course_name": "CHEM 1A", 
            "course_mentioned_times": 1
        }, 
        {
            "professors": [
                {
                    "first_name": "Richard", 
                    "last_name": "Saykally"
                }
            ], 
            "course_name": "CHEMISTRY1A", 
            "course_mentioned_times": 1
        }
    ]
}
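
A follow-up in the comments asks how to keep only the most mentioned course for each subject (one CHEM course, one MATH course, and so on). Here is a minimal sketch of one way to do that, not a tested solution for your full data. It assumes the subject is the leading run of letters in the course name; `course_prefix` and `top_course_per_prefix` are hypothetical helper names introduced only for this illustration. Under that assumption "CHEM 1A" and "CHEMISTRY1A" count as different subjects, a name with no leading letters such as "164A" is treated as its own subject, and ties keep the course seen first.

```python
import re


def course_prefix(course_name):
    """Return the leading letters of a course name, uppercased ('CHEM 1A' -> 'CHEM').

    Assumption: the subject is the leading run of letters. Names that do
    not start with a letter (e.g. '164A') fall back to the whole name.
    """
    name = course_name.strip()
    match = re.match(r'[A-Za-z]+', name)
    return match.group(0).upper() if match else name


def top_course_per_prefix(course_list):
    """Keep only the most mentioned course for each subject prefix."""
    best = {}
    for course in course_list:
        prefix = course_prefix(course['course_name'])
        current = best.get(prefix)
        # Strict '>' means the first course seen wins a tie.
        if (current is None or
                course['course_mentioned_times'] > current['course_mentioned_times']):
            best[prefix] = course
    # Same ordering as the program above: most mentioned first.
    return sorted(best.values(),
                  key=lambda c: c['course_mentioned_times'],
                  reverse=True)
```

To use it with the program above, you could filter the sorted list just before the `json.dump` call, for example `courses['courses'] = top_course_per_prefix(courses['courses'])`.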
Robᵩ
  • For now my output looks like this, but I have the professor name duplicated: "courses": [ { "professors": [ { "first_name": "Richard", "last_name": "Saykally" }, { "first_name": "Richard", "last_name": "Saykally" }, I'd like each professor name printed only once, not replicas. That is, only one Richard Saykally under the professors array for a specific class: multiple professors are fine, but without duplicates of the same name. – Benji Apr 22 '16 at 02:09
  • You're a life saver. One last question. I've formatted my course names so that the letters and numbers are separated. I'd like to compare the letters and print out only the most mentioned course. For instance, there is CHEM 1A and CHEM 214: I compare the letter parts, CHEM and CHEM, and they're the same, so I just append the most mentioned course out of those two to my dictionary. – Benji Apr 22 '16 at 18:38