Flatten of dict of lists into a dataframe

Question

I have a dict of lists, say: data = {'a': [80, 130], 'b': [64], 'c': [58,80]}. How do I flatten it and convert it into a dataframe like the one below:

score 4 · Answer 1 · answered Aug 02 '18 at 12:51

One option to flatten the dictionary is

flattened_data = {
    k + str(i): x
    for k, v in data.items()
    for i, x in enumerate(v)
}

resulting in

{'a0': 80, 'a1': 130, 'b0': 64, 'c0': 58, 'c1': 80}

If you insist on 1-based indexing, you can use enumerate(v, 1) instead of enumerate(v). If you want to omit the index in cases where the list has only a single entry, you should use a for loop instead of the dictionary comprehension.
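The for-loop variant mentioned above could look roughly like this. It is a minimal sketch, assuming the same `data` dict as in the question and 1-based suffixes; a key whose list has exactly one entry is kept without a suffix.

```python
data = {'a': [80, 130], 'b': [64], 'c': [58, 80]}

flattened_data = {}
for k, v in data.items():
    if len(v) == 1:
        # Single-entry list: keep the bare key, no index suffix.
        flattened_data[k] = v[0]
    else:
        # Multi-entry list: append a 1-based index to the key.
        for i, x in enumerate(v, 1):
            flattened_data[k + str(i)] = x

print(flattened_data)
# {'a1': 80, 'a2': 130, 'b': 64, 'c1': 58, 'c2': 80}
```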

Sven Marnach


Thanks for the answer. If the values are non-integers, say float, I get a TypeError: **'float' object is not iterable**. What should I do if the values are floats? @Sven Marnach – RemyM Aug 02 '18 at 13:09

This is the first thing I thought of as well. An f-string variant using the start parameter on `enumerate`: **`{f"{k}{i}": v for k, vs in data.items() for i, v in enumerate(vs, 1)}`** – piRSquared Aug 02 '18 at 13:19
Or, to capture the `'b'` when only a single value is present: **`{f"{k}{'' if len(vs) == 1 else i}": v for k, vs in data.items() for i, v in enumerate(vs, 1)}`** – piRSquared Aug 02 '18 at 13:23

score 2 · Answer 2 · answered Aug 02 '18 at 12:55

Using pd.DataFrame constructor and GroupBy + cumcount:

import pandas as pd

data = {'a': [80, 130], 'b': [64], 'c': [58,80]}

df = pd.DataFrame([[k, w] for k, v in data.items() for w in v],
                  columns=['Index', '0'])

df['Index'] = df['Index'] + (df.groupby('Index').cumcount() + 1).astype(str)

print(df)

  Index    0
0    a1   80
1    a2  130
2    b1   64
3    c1   58
4    c2   80

jezrael · Accepted Answer · 2018-08-02T13:21:38.657

Use a nested list comprehension with if-else if you do not want to number single-element lists:

df = pd.DataFrame([('{}{}'.format(k, i), v1) 
                   if len(v) > 1
                   else (k, v1) 
                   for k, v in data.items() 
                   for i, v1 in enumerate(v, 1)], columns=['Index','Data'])
print (df)
  Index  Data
0    a1    80
1    a2   130
2     b    64
3    c1    58
4    c2    80

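EDIT: If some values are scalar floats (such as NaN) rather than lists, use a loop that checks the type:

import numpy as np

data = {'a': [80, 130], 'b': np.nan, 'c': [58,80], 'd':[34]}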

out = []
for k, v in data.items():
    if isinstance(v, float):
        out.append([k, v])
    else:
        for i, x in enumerate(v, 1):
            if len(v) == 1:
                out.append([k, x])
            else:
                out.append(['{}{}'.format(k, i), x])
print (out)
[['a1', 80], ['a2', 130], ['b', nan], ['c1', 58], ['c2', 80], ['d', 34]]


df = pd.DataFrame(out, columns=['Index','Data'])
print (df)
  Index   Data
0    a1   80.0
1    a2  130.0
2     b    NaN
3    c1   58.0
4    c2   80.0
5     d   34.0

Thanks for the answer. If the values are non-integers, say float, I get a TypeError: **'float' object is not iterable**. What should I do if the values are floats? – RemyM Aug 02 '18 at 13:10
@RemyM - Not easy, because it seems some floats are mixed with lists. – jezrael Aug 02 '18 at 13:22

Scott Boston · Answer 4 · 2018-08-02T13:02:55.227

2

Another way is to use from_dict with the orient parameter set to 'index', then stack, and finally flatten the two index levels into one label using map and format:

import pandas as pd

df = pd.DataFrame.from_dict(data, orient='index')
df_out = df.rename(columns=lambda x: x+1).stack()
df_out.index = df_out.index.map('{0[0]}{0[1]}'.format)
print(df_out)

Output:

a1     80.0
a2    130.0
b1     64.0
c1     58.0
c2     80.0
dtype: float64
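The values come out as float64 because from_dict pads the shorter lists with NaN before stacking. If integers and a two-column frame are wanted, something along these lines might work. It is only a sketch, not part of the original answer: the explicit `dropna()` is a precaution in case `stack()` keeps the padding in a given pandas version, and the column names `Index` and `Data` are borrowed from the accepted answer.

```python
import pandas as pd

data = {'a': [80, 130], 'b': [64], 'c': [58, 80]}

df = pd.DataFrame.from_dict(data, orient='index')

# Shift column labels to 1-based, stack to long form, drop the NaN padding
# explicitly, then restore the integer dtype.
stacked = df.rename(columns=lambda x: x + 1).stack().dropna().astype(int)

# Join the (key, position) pairs of the MultiIndex into labels like 'a1'.
stacked.index = stacked.index.map('{0[0]}{0[1]}'.format)

# Turn the Series into a two-column dataframe.
df_out = stacked.rename_axis('Index').reset_index(name='Data')
print(df_out)
```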

edited Aug 02 '18 at 13:02

answered Aug 02 '18 at 12:57

score 2 · Answer 5 · answered Aug 02 '18 at 13:33

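Using itertools and the private pandas helper ParserBase._maybe_dedup_names (an internal API, so it may not exist in every pandas version):

import itertools
import pandas as pd

# Wrap each key in a list so product() pairs the whole key with each value.
x = (itertools.product([k], v) for k, v in data.items())
z = [item for pairs in x for item in pairs]
df = pd.DataFrame(z).set_index(0)
df.index = pd.io.parsers.ParserBase({'names':df.index})._maybe_dedup_names(df.index)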

    1
a   80
a.1 130
b   64
c   58
c.1 80
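Since `_maybe_dedup_names` is private, here is a sketch of the same `a`, `a.1` naming scheme built from public pandas calls only. This is a swapped-in technique (groupby + cumcount), not what the answer above does, and the column names `key` and `value` are placeholders chosen for illustration.

```python
import pandas as pd

data = {'a': [80, 130], 'b': [64], 'c': [58, 80]}

df = pd.DataFrame([(k, x) for k, v in data.items() for x in v],
                  columns=['key', 'value'])

# Position of each row within its key group: 0 for the first, 1 for the second, ...
n = df.groupby('key').cumcount()

# Keep the bare key for the first occurrence, append '.<n>' for repeats.
labels = df['key'].where(n == 0, df['key'] + '.' + n.astype(str))

df.index = labels.tolist()
df = df[['value']]
print(df)
```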

piRSquared · Answer 6 · 2018-08-02T13:53:54.740

I was having fun with variations on Sven Marnach's answer

`defaultdict` and `count`

from collections import defaultdict
from itertools import count

c = defaultdict(lambda:count(1))

{f"{k}{['', next(c[k])][len(V) > 1]}": v for k, V in data.items() for v in V}

{'a1': 80, 'a2': 130, 'b': 64, 'c1': 58, 'c2': 80}

`enumerate`

{f"{k}{['', i][len(V) > 1]}": v for k, V in data.items() for i, v in enumerate(V, 1)}

{'a1': 80, 'a2': 130, 'b': 64, 'c1': 58, 'c2': 80}
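The comprehensions above stop at a flat dict, while the question asks for a dataframe. One possible last step is sketched below, assuming the column names `Index` and `Data` used in the accepted answer; `flat` is just a placeholder name for the dict produced above.

```python
import pandas as pd

# Result of either comprehension above.
flat = {'a1': 80, 'a2': 130, 'b': 64, 'c1': 58, 'c2': 80}

# Each (key, value) pair becomes one row.
df = pd.DataFrame(list(flat.items()), columns=['Index', 'Data'])
print(df)
```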

score 0 · Answer 7 · answered Aug 02 '18 at 12:54

IMO, you should first get the list of dict keys (roots) and the list of dict values (leaves).

Like so: [a,b,c] and [[80,130],[64],[58,80]]

Then just iterate over the two lists in parallel with a loop to get

[a1,a2,b,c1,c2] and [80,130,64,58,80] (this should take only a few lines of code)

Then load it into a dataframe.

If you need more precise code you can ask :)
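The steps described in this answer could be sketched as follows. The variable names `roots`, `leaves`, `labels` and `values` are made up for illustration, and the rule of leaving single-entry keys unnumbered is taken from the example output above.

```python
data = {'a': [80, 130], 'b': [64], 'c': [58, 80]}

roots = list(data.keys())     # ['a', 'b', 'c']
leaves = list(data.values())  # [[80, 130], [64], [58, 80]]

labels, values = [], []
# Walk both lists in parallel; number a root only if it has several leaves.
for root, leaf in zip(roots, leaves):
    for i, x in enumerate(leaf, 1):
        labels.append(root if len(leaf) == 1 else f"{root}{i}")
        values.append(x)

print(labels)  # ['a1', 'a2', 'b', 'c1', 'c2']
print(values)  # [80, 130, 64, 58, 80]
```

The two lists can then be loaded into a dataframe, for example with `pd.DataFrame({'Index': labels, 'Data': values})`.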
