Populating a pandas dataframe from an odd dictionary

Question

I have a dictionary as follows:

{'header_1': ['body_1', 'body_3', 'body_2'],
 'header_2': ['body_6', 'body_4', 'body_5'],
 'header_4': ['body_7', 'body_8'],
 'header_3': ['body_9'],
 'header_9': ['body_10'],
 'header_10': []}

I would like to come up with a dataframe like this:

+----+----------+--------+
| ID | header   | body   |
+----+----------+--------+
| 1  | header_1 | body_1 |
+----+----------+--------+
| 2  | header_1 | body_3 |
+----+----------+--------+
| 3  | header_1 | body_2 |
+----+----------+--------+
| 4  | header_2 | body_6 |
+----+----------+--------+
| 5  | header_2 | body_4 |
+----+----------+--------+
| 6  | header_2 | body_5 |
+----+----------+--------+
| 7  | header_4 | body_7 |
+----+----------+--------+

Keys with an empty list (such as header_10 in the dict above) should receive a body value of None. I have tried several variations of df.loc, such as:

for header_name, body_list in all_unique.items():
    for body_name in body_list:
        metadata.loc[metadata.index[-1]] = [header_name, body_name]

To no avail. Surely there must be a quick way in Pandas to append rows and autoincrement the index? Something similar to the SQL INSERT INTO statement only using pythonic code?

What if you just transformed your dictionary into something pandas could handle before hand? — Mad Physicist, Jan 04 '19 at 15:11
That would be inefficient don't you think? It would introduce additional code... — user32882, Jan 04 '19 at 15:14
More inefficient than attempting to reallocate the entire dataframe at every step? Because that's what appending to it will do. — Mad Physicist, Jan 04 '19 at 15:17
For comparison, you have the dictionary, which is a data structure that is specifically designed to be mutated efficiently. More code does not mean less efficient code. — Mad Physicist, Jan 04 '19 at 15:18
@W-B I would post that answer again. It is precisely what I needed — user32882, Jan 04 '19 at 15:19
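To make the trade-off in these comments concrete: the question's loop targets metadata.index[-1], which is the last existing label rather than a new one, so it cannot append. Assigning to the label len(metadata) does append with an auto-incrementing index, but it grows the frame on every assignment, whereas collecting plain tuples first builds the frame once. The following is only a sketch: the names all_unique and metadata follow the question, the sample data is shortened for illustration, and built_once is a hypothetical name.

```python
import pandas as pd

# Shortened sample data (illustrative only).
all_unique = {'header_1': ['body_1', 'body_3'], 'header_10': []}

# Row by row: .loc with a new label enlarges the frame on every assignment.
metadata = pd.DataFrame(columns=['header', 'body'])
for header_name, body_list in all_unique.items():
    for body_name in body_list or [None]:
        metadata.loc[len(metadata)] = [header_name, body_name]

# Build once: gather plain tuples, then construct the frame in a single call.
rows = [(h, b) for h, bodies in all_unique.items() for b in (bodies or [None])]
built_once = pd.DataFrame(rows, columns=['header', 'body'])

print(metadata)
print(built_once)
```

Both frames hold the same rows; the second form keeps the mutation in ordinary Python lists and hands pandas the finished data.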

jezrael · Accepted Answer · 2019-01-04T15:28:19.030

Use a dict comprehension to replace empty lists with [None], then flatten the dict into a list of tuples:

import pandas as pd

d = {'header_1': ['body_1', 'body_3', 'body_2'],
     'header_2': ['body_6', 'body_4', 'body_5'],
     'header_4': ['body_7', 'body_8'],
     'header_3': ['body_9'],
     'header_9': ['body_10'],
     'header_10': []}

# Replace each empty list with [None] so that its key still produces one row.
d = {k: v if bool(v) else [None] for k, v in d.items()}
# Flatten into (key, value) pairs, one pair per output row.
data = [(k, y) for k, v in d.items() for y in v]
df = pd.DataFrame(data, columns=['a', 'b'])
print(df)
            a        b
0    header_1   body_1
1    header_1   body_3
2    header_1   body_2
3    header_2   body_6
4    header_2   body_4
5    header_2   body_5
6    header_4   body_7
7    header_4   body_8
8    header_3   body_9
9    header_9  body_10
10  header_10     None

Another solution:

data = []
for k, v in d.items():
    if bool(v):
        # One row per list element.
        for y in v:
            data.append((k, y))
    else:
        # Empty list: keep the key with a None body.
        data.append((k, None))

df = pd.DataFrame(data, columns=['a', 'b'])
print(df)
            a        b
0    header_1   body_1
1    header_1   body_3
2    header_1   body_2
3    header_2   body_6
4    header_2   body_4
5    header_2   body_5
6    header_4   body_7
7    header_4   body_8
8    header_3   body_9
9    header_9  body_10
10  header_10     None
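
For the exact layout shown in the question (columns header and body, with an ID that counts from 1), the same flattening idea can be adapted. This is a minimal sketch rather than part of the original answer; the column names and the 1-based ID are assumptions taken from the question's example table, and rows is a hypothetical name.

```python
import pandas as pd

d = {'header_1': ['body_1', 'body_3', 'body_2'],
     'header_2': ['body_6', 'body_4', 'body_5'],
     'header_4': ['body_7', 'body_8'],
     'header_3': ['body_9'],
     'header_9': ['body_10'],
     'header_10': []}

# One (header, body) pair per row; an empty list contributes a single
# (header, None) pair so that its key is not lost.
rows = [(k, y) for k, v in d.items() for y in (v or [None])]

df = pd.DataFrame(rows, columns=['header', 'body'])
# Shift the default 0-based index so that IDs start at 1, as in the question.
df.index = df.index + 1
df.index.name = 'ID'
print(df)
```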

Very nice/creative answer. You can also use `if v` instead of `if bool(v)`. — RoadRunner, Jan 04 '19 at 15:46

BENY · Answer 2 · 2019-01-04T15:24:13.950


This is another unnesting problem.

Borrowing jezrael's setup for your d:

d = {k: v if bool(v) else [None] for k, v in d.items()}

First, convert your dict into a dataframe:

df = pd.Series(d).reset_index()
df.columns
Out[204]: Index(['index', 0], dtype='object')

Then apply the unnesting function defined below:

yourdf = unnesting(df, [0])
yourdf
Out[208]: 
         0      index
0   body_1   header_1
0   body_3   header_1
0   body_2   header_1
1   body_6   header_2
1   body_4   header_2
1   body_5   header_2
2   body_7   header_4
2   body_8   header_4
3   body_9   header_3
4  body_10   header_9
5     None  header_10

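import numpy as np
import pandas as pd

def unnesting(df, explode):
    # Repeat each row's index label once per element of its list.
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    # drop(explode, 1) passed the axis positionally, which pandas 2.0 no longer accepts.
    return df1.join(df.drop(columns=explode), how='left')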
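
In pandas 0.25 and later, the built-in Series.explode covers this case without a helper function. The following is a hedged sketch rather than part of the original answer: it starts from the original dict (no [None] substitution), and the column names header and body are borrowed from the question. Note that explode represents an empty list as NaN rather than None.

```python
import pandas as pd

d = {'header_1': ['body_1', 'body_3', 'body_2'],
     'header_2': ['body_6', 'body_4', 'body_5'],
     'header_4': ['body_7', 'body_8'],
     'header_3': ['body_9'],
     'header_9': ['body_10'],
     'header_10': []}

df = (pd.Series(d)                  # keys become the index, lists the values
        .explode()                  # one row per list element; [] becomes NaN
        .rename_axis('header')      # name the index so it becomes a column
        .reset_index(name='body'))  # fresh 0-based index, values in 'body'
print(df)
```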

edited Jan 04 '19 at 15:24

answered Jan 04 '19 at 15:16


@jezrael ok let me delete my answer – BENY Jan 04 '19 at 15:18
I think this answer is still valuable as it shows an alternate approach. But I do prefer @jezrael's method since it requires fewer lines of code – user32882 Jan 04 '19 at 15:27
@user32882 no worries, his answer is better for sure – BENY Jan 04 '19 at 15:28

score 2 · Answer 3 · answered Jan 04 '19 at 15:19

If the dataset is large, this solution will be slow, but it should still work.

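import pandas as pd

frames = []
for key, vals in data.items():
    # Create a temp df with the data from a single key.
    frames.append(pd.DataFrame({'header': [key] * len(vals), 'body': vals}))

# DataFrame.append was removed in pandas 2.0, so combine the pieces with pd.concat.
df = pd.concat(frames, ignore_index=True)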
