Make a pandas dataframe with unequal rows and columns

Question

I have a JSON file that I took the keys from to use as rows for my data frame, next I took all the values from all the keys and put them into a flattened list. I want to use that list of values as the columns. However there's eight values and 5 keys

JSON:

{
"student1": [
"view_grades",
"view_classes"
],
"student2": [
"view_grades",
"view_classes"
],
"teacher": [
"view_grades",
"change_grades",
"add_grades",
"delete_grades",
"view_classes"
],
"principle": [
"view_grades",
"view_classes",
"change_classes",
"add_classes",
"delete_classes"
]
}

convert.py

def json_to_csv():
    with open('C:/Users/Elitebook/Documents/GitHub/permissions.json') as json_file:
        #convert to python dict
        py_dict = json.load(json_file)
        #first get a list of all the values(permissions) from the dict, flatten the list and return only unique values
        permissions = sorted(set([key for value in py_dict.itervalues() for key in value]))


        #create a dataframe from the python dictionary
        pd.DataFrame.from_dict(py_dict, orient='index', columns=permissions)

I'm getting the AssertionError: 8 columns passed, passed data had 5 columns error, I want to have it so I can have the 8 columns and 5 rows. Then I can put what I want in the value fields for the dataframe

Jeremy Bare · Answer 1 · 2018-07-03T03:57:56.807

So based on your description I believe your columns and rows are

columns = [
"view_grades",
"view_classes",
"change_grades",
"add_grades",
"delete_grades",
"change_classes",
"add_classes",
"delete_classes"]

rows = [
"student1",
"student2",
"teacher",
"principle"]

what you want to do is set the rows as the index

df = pd.DataFrame(index=rows, columns=permissions)

print(df)
+-----------+-------------+--------------+---------------+------------+---------------+----------------+-------------+----------------+
|           | view_grades | view_classes | change_grades | add_grades | delete_grades | change_classes | add_classes | delete_classes |
+-----------+-------------+--------------+---------------+------------+---------------+----------------+-------------+----------------+
| student1  | NaN         | NaN          | NaN           | NaN        | NaN           | NaN            | NaN         | NaN            |
| student2  | NaN         | NaN          | NaN           | NaN        | NaN           | NaN            | NaN         | NaN            |
| teacher   | NaN         | NaN          | NaN           | NaN        | NaN           | NaN            | NaN         | NaN            |
| principle | NaN         | NaN          | NaN           | NaN        | NaN           | NaN            | NaN         | NaN            |
+-----------+-------------+--------------+---------------+------------+---------------+----------------+-------------+----------------+

Hey thanks for the reply, I tried setting orient='index' and that got the proper rows. I set columns to permission but I was getting that error — Amon, Jul 03 '18 at 05:17
Im stupid nevermind I just returned all the keys and set that to the index. — Amon, Jul 03 '18 at 07:57

Ashish Acharya · Accepted Answer · 2018-07-03T08:35:09.923

Here's what you can do:

from collections import defaultdict


def json_to_csv():
    with open('C:/Users/Elitebook/Documents/GitHub/permissions.json') as json_file:
        # convert to python dict
        py_dict = json.load(json_file)

        # first get a list of all the values(permissions) from the dict, flatten the list and return only unique values
        # this is not necessary anymore since the code below automatically gets a list of unique permissions
        # but if you still want to to it this way it's quite possible
        # permissions = sorted(set([key for value in py_dict.itervalues() for key in value]))

        # create a dictionary of dictionaries in which to put values and populate it
        final = defaultdict(dict)

        # loop through the outer dictionary {'principle': ...}
        for k, v in py_dict.items():
            # loop through the inner list ['add_classes', 'change_classes' ...]
            for i in v:
                # create a key final['principle']['add_classes'] in the final dictionary
                # and set its value to True
                final[k][i] = True

        # This is what final looks like
        # defaultdict(<class 'dict'>,
        #     {'principle': {'add_classes': True,
        #                    'change_classes': True,
        #                    'delete_classes': True,
        #                    'view_classes': True,
        #                    'view_grades': True},
        #      'student1': {'view_classes': True, 'view_grades': True},
        #      'student2': {'view_classes': True, 'view_grades': True},
        #      'teacher': {'add_grades': True,
        #                  'change_grades': True,
        #                  'delete_grades': True,
        #                  'view_classes': True,
        #                  'view_grades': True}})

        # now create the dataframe
        # fillna basically replaces whatever is not available (eg. can student1 add_grades?) by False.
        df = pd.DataFrame(final).fillna(False)

Output:

                student1  student2  teacher  principle
add_classes        False     False    False       True
add_grades         False     False     True      False
change_classes     False     False    False       True
change_grades      False     False     True      False
delete_classes     False     False    False       True
delete_grades      False     False     True      False
view_classes        True      True     True       True
view_grades         True      True     True       True

If you want it the other way around, just transpose the DataFrame:

df.T

Output:

           add_classes  add_grades  change_classes     ...       delete_grades  view_classes  view_grades
student1         False       False           False     ...               False          True         True
student2         False       False           False     ...               False          True         True
teacher          False        True           False     ...                True          True         True
principle         True       False            True     ...               False          True         True

what is `a` supposed to be referring to? This looks like a good solution it's just annoying that I can't simply set the column names using the `column` parameter — Amon, Jul 03 '18 at 07:20
What you did with setting the values to True or False depending on whether that value exists for that key is exactly what I needed. I'm not sure how exactly you code achieved that though, how does that work? — Amon, Jul 03 '18 at 08:05
@Amon, I added some clarifications about the code. Basically you don't need to set the column names using the `columns` parameter. But if you want to do it, you can simply add it in. — Ashish Acharya, Jul 03 '18 at 08:33
thanks a lot! I'm just confused as to why we need to create an outer dict? — Amon, Jul 03 '18 at 08:38
There are probably other ways of doing it, but I found this to be pretty short and simple. :) — Ashish Acharya, Jul 03 '18 at 08:41
Also using my way I'm not sure I could still set the proper values to True or False like you did? — Amon, Jul 03 '18 at 08:41
Yes your way works very well don't get me wrong it just confuses me a bit — Amon, Jul 03 '18 at 08:42
I'm understanding more now, could you explain how `defaultdict` works? If a certain key isn't found it will replace that key with a default one? — Amon, Jul 03 '18 at 09:00
Yeah, that's about right. This answer explains it well: https://stackoverflow.com/a/5900634/4737952 — Ashish Acharya, Jul 03 '18 at 09:02
Okay, and in the context of this code what is it doing? what is the default that `dict` creates? — Amon, Jul 03 '18 at 09:04
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/174227/discussion-between-ashish-acharya-and-amon). — Ashish Acharya, Jul 03 '18 at 09:04

Make a pandas dataframe with unequal rows and columns

2 Answers2