List of LISTS of tuples to Pandas dataframe?

Question

I have a list of lists of tuples, where every tuple has the same length. I need to convert it to a Pandas dataframe so that the number of columns equals the tuple length and each tuple becomes one row, with its items spread across the columns.

I have consulted other questions on this topic (e.g., Convert a list of lists of tuples to pandas dataframe, List of list of tuples to pandas dataframe, split list of tuples in lists of list of tuples) unsuccessfully.

The closest I get is with list comprehension from a different question on Stack Overflow:

import pandas as pd

tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]

# Trying list comprehension from previous stack question:
pd.DataFrame([[y for y in x] for x in tupList])

But this yields the unintended result:

    0                                 1
0   (commentID, commentText, date)    (123456, blahblahblah, 2019)
1   (45678, hello world, 2018)        (0, text, 2017)

When the expected result is as follows:

      0            1                 2
0     commentID    commentText       date
1     123456       blahblahblah      2019
2     45678        hello world       2018
3     0            text              2017

In sum: I need as many columns as there are items in each tuple (3 in the example), with each tuple forming one row across those columns.

Thanks!

score 9 · Accepted Answer · answered Aug 15 '19 at 13:00

Just flatten your list into a single list of tuples (your initial list contains sublists of tuples):

In [1251]: tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]

In [1252]: pd.DataFrame([t for lst in tupList for t in lst])
Out[1252]: 
           0             1     2
0  commentID   commentText  date
1     123456  blahblahblah  2019
2      45678   hello world  2018
3          0          text  2017
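
As a minimal sketch, the same flattening can be wrapped in a small helper. The name `flatten_to_frame` is hypothetical (it is not part of pandas), and the sketch assumes every tuple has the same length. It also regroups the same tuples unevenly to show that the number of tuples per sublist does not matter:

```python
import pandas as pd

def flatten_to_frame(nested):
    """Return a DataFrame with one row per tuple found in the sublists."""
    # The nested comprehension walks each sublist, then each tuple inside it
    rows = [t for lst in nested for t in lst]
    return pd.DataFrame(rows)

tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')],
           [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]

# Same tuples regrouped unevenly (1 + 3) to show the grouping does not matter
uneven = [tupList[0][:1], tupList[0][1:] + tupList[1]]

df = flatten_to_frame(tupList)
df_uneven = flatten_to_frame(uneven)

print(df.shape)  # (4, 3)
print(df.values.tolist() == df_uneven.values.tolist())  # True
```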

score 3 · Answer 2 · answered Aug 15 '19 at 13:01

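import pandas as pd

tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]
# sum() with an empty list as the start value concatenates the sublists into one flat list
print(pd.DataFrame(sum(tupList, [])))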

Output

           0             1     2
0  commentID   commentText  date
1     123456  blahblahblah  2019
2      45678   hello world  2018
3          0          text  2017
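
A short sketch of why this works: `sum(tupList, [])` starts from the empty list and adds each sublist with `+`, which for lists means concatenation. The example below only checks that the result matches the flattening comprehension. One caveat worth hedging: each `+` builds a new list, so this approach may get slow when there are many sublists.

```python
import pandas as pd

tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')],
           [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]

# Equivalent to [] + tupList[0] + tupList[1]
flat_sum = sum(tupList, [])

# The comprehension-based flattening, for comparison
flat_comp = [t for lst in tupList for t in lst]

print(flat_sum == flat_comp)  # True
df = pd.DataFrame(flat_sum)
```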

answered Aug 15 '19 at 13:01 by ComplicatedPhenomenon

Oeh I like this one a lot, so clever. Hope OP accepts this one, +1 – Erfan Aug 15 '19 at 13:03

score 2 · Answer 3 · answered Aug 15 '19 at 13:00

A shorter way to write this:

from itertools import chain
import pandas as pd

tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]

# chain.from_iterable yields the tuples of every sublist in order
new_list = list(chain.from_iterable(tupList))
df = pd.DataFrame.from_records(new_list)

Edit

You can also build the flattened list directly inside the `from_records` call instead of assigning it to `new_list` first.
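
A minimal sketch of that one-step variant, assuming the same `tupList` as above. The flattened list is materialized with `list(...)` and passed straight to `from_records`:

```python
from itertools import chain
import pandas as pd

tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')],
           [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]

# Flatten and build the frame in a single expression
df = pd.DataFrame.from_records(list(chain.from_iterable(tupList)))

print(df.shape)  # (4, 3)
```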

score 0 · Answer 4 · answered Aug 15 '19 at 13:04

You can do it like this :D

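import pandas as pd

tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')], [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]

# Build a frame whose cells are tuples (one column per position within a sublist)
df = pd.DataFrame([[y for y in x] for x in tupList])
# Expand each column of tuples into its own frame; the first tuple of each sublist
# gets the even row labels (0, 2, ...), the second gets the odd ones (1, 3, ...).
# Note: this assumes exactly two tuples per sublist.
df_1 = df[0].apply(pd.Series).assign(index=range(0, df.shape[0]*2, 2)).set_index("index")
df_2 = df[1].apply(pd.Series).assign(index=range(1, df.shape[0]*2, 2)).set_index("index")

# Stack both frames and restore the original tuple order
pd.concat([df_1, df_2], axis=0).sort_index()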

answered Aug 15 '19 at 13:04 by ivallesp

`apply(pd.Series)` is one of the worst things you can do with pandas honestly. It's terribly slow. – Erfan Aug 15 '19 at 13:07
How do you know? – Erfan Aug 15 '19 at 13:08
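
For context on the `apply(pd.Series)` remark, here is a hedged sketch of a commonly used alternative: converting the column of tuples to a plain list and passing it to the `DataFrame` constructor. It only checks that both routes give the same values for this example; it does not measure speed, and the variable names `via_apply` and `via_tolist` are illustrative.

```python
import pandas as pd

tupList = [[('commentID', 'commentText', 'date'), ('123456', 'blahblahblah', '2019')],
           [('45678', 'hello world', '2018'), ('0', 'text', '2017')]]

# Frame whose cells are tuples, as in the answer above
df = pd.DataFrame([[y for y in x] for x in tupList])

# Route 1: expand each tuple by constructing a Series per row
via_apply = df[0].apply(pd.Series)

# Route 2: hand the list of tuples to the constructor in one call
via_tolist = pd.DataFrame(df[0].tolist(), index=df.index)

print(via_apply.values.tolist() == via_tolist.values.tolist())
```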
