I've been scouring past answers for a couple of hours and I'm still stuck. I have two classes:
```python
class Profile:
    def __init__(self, department_id, profile_id, items=None):
        self.department_id = department_id
        self.profile_id = profile_id
        self.items = items  # list of Profile_Item objects


class Profile_Item:
    def __init__(self, profile_item, profile_value):
        self.profile_item = profile_item
        self.profile_value = profile_value
```
I am trying to load a list of these Profile objects (profile_list) into a DataFrame that I want to use for various calculations and to pivot the data. The problem is that the items attribute is a list of Profile_Item objects, which turns into a list of dictionaries. I added a method to Profile that builds a dictionary I can pass to the DataFrame:
```python
# Method defined inside class Profile
def as_dict(self):
    # Convert each Profile_Item object into a plain dict
    items = [
        {'profile_item': x.profile_item, 'profile_value': x.profile_value}
        for x in self.items
    ]
    return {
        'department_id': self.department_id,
        'profile_id': self.profile_id,
        'profile_items': items,
    }
```
When I build the DataFrame with this method, I get stuck on the last column:
```python
import pandas as pd

df = pd.DataFrame([b.as_dict() for b in profile_list])
df.head()
```
+---+---------------+--------------+--------------------------------------------+
| | department_id | profile_id | profile_items |
+---+---------------+--------------+--------------------------------------------+
| 0 | 101500014 | 2.101250e+08 | [{'profile_item': 2.0, 'profile_value': 45 |
| 1 | 101500015 | 2.100991e+09 | [{'profile_item': 2.0, 'profile_value': 76 |
| 2 | 101500025 | 2.100760e+08 | [{'profile_item': 2.0, 'profile_value': 89 |
| 3 | 101500034 | 2.100000e+11 | [{'profile_item': 2.0, 'profile_value': 99 |
+---+---------------+--------------+--------------------------------------------+
Each cell in that column is a list of dictionaries with the same keys (profile_item and profile_value), just different values: every department has one value per item. I'd like the DataFrame to have the columns department_id, profile_id, profile_item, and profile_value. The post "Python Dataframe contains a list of dictionaries, need to create new dataframe with dictionary items" almost got me there, but its approach only repeats one column (department_id) next to the expanded dictionaries:
```python
df2 = pd.DataFrame(dict(
    department_id=df.department_id.values.repeat(df['profile_items'].str.len()),
)).join(pd.DataFrame(df['profile_items'].sum()))
df2.head()
```
+---+---------------+--------------+---------------+
| | department_id | profile_item | profile_value |
+---+---------------+--------------+---------------+
| 1 | 101500014 | 9.0 | 120 |
| 2 | 101500014 | 10.0 | 39009 |
| 3 | 101500014 | 11.0 | 1403903 |
+---+---------------+--------------+---------------+
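A minimal sketch of the same repeat-and-join idea extended to every scalar column, assuming the DataFrame has the shape produced by as_dict(). Instead of building a new frame from department_id alone, it repeats each non-list column once per dictionary and then joins the flattened dictionaries by position. The sample `records` are hypothetical, built from values in the desired table below; `map(len)` is used here in place of `.str.len()` to count list lengths.

```python
import pandas as pd

# Hypothetical sample shaped like the as_dict() output
records = [
    {'department_id': 101500014, 'profile_id': 210125000,
     'profile_items': [{'profile_item': 2, 'profile_value': 45},
                       {'profile_item': 9, 'profile_value': 120}]},
    {'department_id': 101500015, 'profile_id': 2100991000,
     'profile_items': [{'profile_item': 2, 'profile_value': 76},
                       {'profile_item': 9, 'profile_value': 4234},
                       {'profile_item': 10, 'profile_value': 34}]},
]
df = pd.DataFrame(records)

# Number of dictionaries in each row's list
counts = df['profile_items'].map(len)

# Repeat every column except the list column, once per dictionary,
# then renumber the index 0..n-1 so it lines up with `items`
base = (df.drop(columns='profile_items')
          .loc[df.index.repeat(counts.to_numpy())]
          .reset_index(drop=True))

# Concatenate all lists in row order and turn the dicts into columns
items = pd.DataFrame([d for lst in df['profile_items'] for d in lst])

# Both frames have a 0..n-1 index, so join aligns them row by row
df2 = base.join(items)
print(df2)
```

Because `drop(columns='profile_items')` keeps every other column, adding more scalar columns later needs no code change. A row whose items is None (rather than an empty list) would make `len` fail, so such rows would need to be filtered or replaced with `[]` first.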
I have multiple columns to repeat, and I'm not very familiar with pandas. Can I fix the data before it even gets to pandas, with a function or a comprehension?
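One way to flatten before pandas is a nested list comprehension that emits one flat dict per (profile, item) pair, so the DataFrame is already in long format when it is created. This is a sketch; the `profile_list` contents are hypothetical, built from values in the desired table.

```python
import pandas as pd


class Profile:
    def __init__(self, department_id, profile_id, items=None):
        self.department_id = department_id
        self.profile_id = profile_id
        self.items = items


class Profile_Item:
    def __init__(self, profile_item, profile_value):
        self.profile_item = profile_item
        self.profile_value = profile_value


# Hypothetical sample data
profile_list = [
    Profile(101500014, 210125000, [Profile_Item(2, 45), Profile_Item(9, 120)]),
    Profile(101500015, 2100991000, [Profile_Item(2, 76), Profile_Item(9, 4234)]),
]

# One flat dict per (profile, item) pair; `or []` skips profiles whose items is None
rows = [
    {
        'department_id': p.department_id,
        'profile_id': p.profile_id,
        'profile_item': it.profile_item,
        'profile_value': it.profile_value,
    }
    for p in profile_list
    for it in (p.items or [])
]

df = pd.DataFrame(rows)
print(df)
```

A profile with no items produces no rows at all with this approach; if such profiles must still appear, they would need a separate row with missing item values.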
What I want is to split each list of dictionaries into individual rows, with each row carrying its department_id and profile_id:
+---------------+------------+--------------+---------------+
| department_id | profile_id | profile_item | profile_value |
+---------------+------------+--------------+---------------+
| 101500014 | 210125000 | 2 | 45 |
| 101500014 | 210125000 | 9 | 120 |
| 101500014 | 210125000 | 10 | 39009 |
| 101500014 | 210125000 | 11 | 1321312 |
| 101500014 | 210125000 | 15 | 12313213 |
| 101500015 | 2100991000 | 2 | 76 |
| 101500015 | 2100991000 | 9 | 4234 |
| 101500015 | 2100991000 | 10 | 34 |
| 101500015 | 2100991000 | 11 | 34 |
| 101500015 | 2100991000 | 15 | 3 |
| 101500015 | 2100991000 | 17 | 43 |
| 101500015 | 2100991000 | 18 | 4 |
| 101500015 | 2100991000 | 21 | 32 |
| 101500015 | 2100991000 | 22 | 23423 |
| 101500015 | 2100991000 | 35 | 2 |
+---------------+------------+--------------+---------------+
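If the as_dict() output is kept, `pd.json_normalize` (top-level function in pandas 1.0 and later) can produce this long table directly: `record_path` names the list to expand into rows and `meta` lists the fields to repeat on each row. This is a sketch with hypothetical sample `records` taken from the desired table above.

```python
import pandas as pd

# Hypothetical sample shaped like the as_dict() output
records = [
    {'department_id': 101500014, 'profile_id': 210125000,
     'profile_items': [{'profile_item': 2, 'profile_value': 45},
                       {'profile_item': 9, 'profile_value': 120}]},
    {'department_id': 101500015, 'profile_id': 2100991000,
     'profile_items': [{'profile_item': 2, 'profile_value': 76},
                       {'profile_item': 9, 'profile_value': 4234}]},
]

# One row per dict in profile_items, with the meta fields repeated on each row
long_df = pd.json_normalize(
    records,
    record_path='profile_items',
    meta=['department_id', 'profile_id'],
)

# json_normalize puts record columns first and meta columns last; reorder them.
# Meta columns may come back with object dtype; cast with astype() if needed.
long_df = long_df[['department_id', 'profile_id', 'profile_item', 'profile_value']]
print(long_df)
```

From this long format, `DataFrame.pivot` (which accepts a list for `index` in pandas 1.1 and later) can turn profile_item values into columns for the pivoting step.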