
I've been scouring past answers for the past couple of hours and I'm still stuck. I have two classes:

    class Profile:
        def __init__(self, department_id, profile_id, items=None):
            self.department_id = department_id
            self.profile_id = profile_id
            self.items = items

    class Profile_Item:
        def __init__(self, profile_item, profile_value):
            self.profile_item = profile_item
            self.profile_value = profile_value

I am trying to put a list of these Profile objects into a DataFrame that I want to use for various calculations and to pivot the data. The problem is that my items attribute is a list of Profile_Item objects (a list of dictionaries once converted). I wrote an as_dict method on Profile that builds a dictionary I can pass to the DataFrame:

    def as_dict(self):
        z = []
        for x in self.items:
            z.append({
                'profile_item': x.profile_item,
                'profile_value': x.profile_value
            })
        
        return {
            'department_id': self.department_id,
            'profile_id': self.profile_id,
            'profile_items': z
        }

When I create the DataFrame with this method, I get stuck on the last column:

    df = pd.DataFrame([b.as_dict() for b in profile_list])
    df.head()


+---+---------------+--------------+--------------------------------------------+
|   | department_id |  profile_id  |               profile_items                |
+---+---------------+--------------+--------------------------------------------+
| 0 |     101500014 | 2.101250e+08 | [{'profile_item': 2.0, 'profile_value': 45 |
| 1 |     101500015 | 2.100991e+09 | [{'profile_item': 2.0, 'profile_value': 76 |
| 2 |     101500025 | 2.100760e+08 | [{'profile_item': 2.0, 'profile_value': 89 |
| 3 |     101500034 | 2.100000e+11 | [{'profile_item': 2.0, 'profile_value': 99 |
+---+---------------+--------------+--------------------------------------------+

It's a list of dictionaries with the same keys (profile_item and profile_value) but different values. Basically, every department has a value per item. I'd like the DataFrame to have the columns department_id, profile_id, profile_item, and profile_value. The post "Python Dataframe contains a list of dictionaries, need to create new dataframe with dictionary items" almost got me there, but it only repeats a single column (department_id) next to the expanded dictionaries:

    df2 = pd.DataFrame(dict(
        department_id=df.department_id.values.repeat(df['profile_items'].str.len()),
    )).join(pd.DataFrame(df['profile_items'].sum()))

    df2.head()


+---+---------------+--------------+---------------+
|   | department_id | profile_item | profile_value |
+---+---------------+--------------+---------------+
| 1 |     101500014 |          9.0 |           120 |
| 2 |     101500014 |         10.0 |         39009 |
| 3 |     101500014 |         11.0 |       1403903 |
+---+---------------+--------------+---------------+
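The repeat-and-join idea from the linked post can be extended to repeat every ID column instead of only department_id. A minimal sketch, assuming the DataFrame has the three columns shown earlier; the two sample rows are a hypothetical stand-in built from values in the tables:

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame built from as_dict().
df = pd.DataFrame({
    "department_id": [101500014, 101500015],
    "profile_id": [210125000, 2100991000],
    "profile_items": [
        [{"profile_item": 2.0, "profile_value": 45},
         {"profile_item": 9.0, "profile_value": 120}],
        [{"profile_item": 2.0, "profile_value": 76}],
    ],
})

# Number of dicts in each row; every ID column is repeated that many times.
counts = df["profile_items"].map(len).to_numpy()
id_cols = ["department_id", "profile_id"]
repeated = pd.DataFrame({col: df[col].to_numpy().repeat(counts) for col in id_cols})

# Flatten all lists into one list of dicts: one row per dict.
items = pd.DataFrame([d for lst in df["profile_items"] for d in lst])

# Both frames have a fresh 0..n-1 index, so join lines the rows up by position.
df2 = repeated.join(items)
print(df2)
```

Adding another column to id_cols is all it takes to carry it along as well.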

I need more than one ID column repeated, and I'm not very familiar with pandas. Can I somehow fix this data before it even gets to pandas, via a function or a comprehension?
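Flattening before pandas is possible: a nested list comprehension can emit one flat dictionary per (profile, item) pair, copying the IDs onto each one. A sketch that reuses the two classes from the question with a small hypothetical profile_list (values taken from the desired-output table); treating items=None as an empty list is an assumption:

```python
import pandas as pd


class Profile:
    def __init__(self, department_id, profile_id, items=None):
        self.department_id = department_id
        self.profile_id = profile_id
        self.items = items


class Profile_Item:
    def __init__(self, profile_item, profile_value):
        self.profile_item = profile_item
        self.profile_value = profile_value


# Hypothetical sample objects.
profile_list = [
    Profile(101500014, 210125000, [Profile_Item(2, 45), Profile_Item(9, 120)]),
    Profile(101500015, 2100991000, [Profile_Item(2, 76)]),
]

# One flat dict per (profile, item) pair; the IDs are copied onto every row.
rows = [
    {
        "department_id": p.department_id,
        "profile_id": p.profile_id,
        "profile_item": item.profile_item,
        "profile_value": item.profile_value,
    }
    for p in profile_list
    for item in (p.items or [])  # items defaults to None: treat it as empty
]

df = pd.DataFrame(rows)
print(df)
```

With this approach a profile without items produces no rows at all; if those profiles should still appear, they would need a separate placeholder row.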

What I want is to split each list of dictionaries into individual rows and join the resulting columns to department_id and profile_id:


+---------------+------------+--------------+---------------+
| department_id | profile_id | profile_item | profile_value |
+---------------+------------+--------------+---------------+
|     101500014 |  210125000 |            2 |            45 |
|     101500014 |  210125000 |            9 |           120 |
|     101500014 |  210125000 |           10 |         39009 |
|     101500014 |  210125000 |           11 |       1321312 |
|     101500014 |  210125000 |           15 |      12313213 |
|     101500015 | 2100991000 |            2 |            76 |
|     101500015 | 2100991000 |            9 |          4234 |
|     101500015 | 2100991000 |           10 |            34 |
|     101500015 | 2100991000 |           11 |            34 |
|     101500015 | 2100991000 |           15 |             3 |
|     101500015 | 2100991000 |           17 |            43 |
|     101500015 | 2100991000 |           18 |             4 |
|     101500015 | 2100991000 |           21 |            32 |
|     101500015 | 2100991000 |           22 |         23423 |
|     101500015 | 2100991000 |           35 |             2 |
+---------------+------------+--------------+---------------+
  • Does profile_items consist of a list of dictionaries or just a list with a single dictionary? It's not an ideal example if it's cut off at the end. – Andreas Jul 28 '20 at 20:04
  • It's a list of dictionaries. Two keys, profile_item and profile_value, but multiple items and values per department. –  Jul 28 '20 at 20:24
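Because as_dict() already produces nested records, pandas.json_normalize can expand them in one call: record_path names the nested list to turn into rows, and meta lists the top-level keys to copy onto each row. A sketch, assuming pandas 1.0 or later (where json_normalize is available as pd.json_normalize) and a hypothetical records list mirroring the as_dict() output:

```python
import pandas as pd

# Hypothetical output of [b.as_dict() for b in profile_list].
records = [
    {"department_id": 101500014, "profile_id": 210125000,
     "profile_items": [{"profile_item": 2, "profile_value": 45},
                       {"profile_item": 9, "profile_value": 120}]},
    {"department_id": 101500015, "profile_id": 2100991000,
     "profile_items": [{"profile_item": 2, "profile_value": 76}]},
]

# record_path: nested list expanded into rows.
# meta: top-level keys repeated on every expanded row.
flat = pd.json_normalize(
    records,
    record_path="profile_items",
    meta=["department_id", "profile_id"],
)

# Meta columns may come back as object dtype; cast them back to integers
# and put the ID columns first to match the target layout.
flat = flat.astype({"department_id": "int64", "profile_id": "int64"})
flat = flat[["department_id", "profile_id", "profile_item", "profile_value"]]
print(flat)
```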

1 Answer


I created an example from your data:

    import pandas as pd

    # create example
    lst = [{'profile_item': 2.0, 'profile_value': 45}, {'profile_item': 2.0, 'profile_value': 76}, {'profile_item': 2.0, 'profile_value': 89}]
    df = pd.DataFrame(data=[[lst]], columns=["lst"])

The code for your question:

  1. Get a single row for each item in the list (.explode).
  2. Split each dictionary into 2 columns (result_type="expand").

Code:

    # Step 1: one row per dict; the other columns are repeated
    df = df.explode("lst")
    # Step 2: pull both keys out of each dict into two new columns
    df[['profile_item', 'profile_value']] = df.apply(
        lambda x: [x["lst"].get('profile_item'), x["lst"].get('profile_value')],
        axis=1, result_type="expand")
Andreas
  • Thanks. This splits the list of dictionaries, but I also need to join the result back into the original DataFrame. I updated my original post in case I wasn't explaining my ultimate goal properly. –  Jul 28 '20 at 21:53
  • Have you tried the code? That should already be handled by .explode – Andreas Jul 28 '20 at 21:58
  • My mistake! I had created a new DataFrame on the line with the lambda function. It definitely works. –  Jul 28 '20 at 22:25
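As the last comments point out, explode repeats the other columns of each row, so department_id and profile_id come along with no extra join. The sketch below applies the answer's two steps to a frame shaped like the question's. It swaps the row-wise apply for building a DataFrame straight from the list of dicts, which is a different technique from the answer's. The sample values are hypothetical stand-ins taken from the tables, and explode needs pandas 0.25 or later:

```python
import pandas as pd

# Hypothetical frame shaped like the question's df.
df = pd.DataFrame({
    "department_id": [101500014, 101500015],
    "profile_id": [210125000, 2100991000],
    "profile_items": [
        [{"profile_item": 2, "profile_value": 45},
         {"profile_item": 9, "profile_value": 120}],
        [{"profile_item": 2, "profile_value": 76}],
    ],
})

# Step 1: one row per dict; the ID columns are repeated for each row.
# explode keeps duplicate index labels, so reset to a unique 0..n-1 index.
exploded = df.explode("profile_items").reset_index(drop=True)

# Step 2: build the two new columns from the list of dicts in one go.
items = pd.DataFrame(exploded["profile_items"].tolist())

# Both frames share the same unique index, so the join is row-for-row.
result = exploded.drop(columns="profile_items").join(items)
print(result)
```

The reset_index call matters here: joining on duplicated index labels would pair every duplicate with every other one and multiply the rows.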