
Problem Definition

I have multiple lists of tuples that I want to convert into a single pandas DataFrame, but so far I haven't found a performant way of doing it.

Current Output:

                           column_a                 column_b
2005-01-02 22:15:00        (1, 1)                   (True, 0, 0)
2005-01-02 22:30:00        (0, 0)                   (True, 0, 0)
2005-01-02 22:45:00        (0, 0)                   (True, 0, 0)
2005-01-02 23:00:00        (0, 0)                   (True, 0, 0)
2005-01-02 23:15:00        (0, 0)                   (True, 0, 0)

Desired output:

                           column_a      column_b     column_c     column_d     column_e
2005-01-02 22:15:00        1             1            True         0            0
2005-01-02 22:30:00        0             0            True         0            0
2005-01-02 22:45:00        0             0            True         0            0
2005-01-02 23:00:00        0             0            True         0            0
2005-01-02 23:15:00        0             0            True         0            0

Solution I've tried but I am not satisfied with

I tried converting column_a and column_b to pandas DataFrames and then joining them, but this was far too slow to scale to the larger number of lists of tuples that I expect to have.

Next solution I am going to try

I will try converting each list of tuples into a list of lists, appending it to the other lists of lists (each converted from a list of tuples), and then building a pandas DataFrame from the result.
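A rough sketch of that list-based idea, assuming all lists have the same length and line up row by row. The names `list_a`, `list_b`, and the two-row sample index are made up for illustration; this is untimed, so it may or may not be faster on real data.

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data mirroring the first two rows of the question
index = pd.to_datetime(["2005-01-02 22:15:00", "2005-01-02 22:30:00"])
list_a = [(1, 1), (0, 0)]
list_b = [(True, 0, 0), (True, 0, 0)]

# Pair the lists up row by row, then flatten each row's tuples into one flat list
rows = [list(chain.from_iterable(parts)) for parts in zip(list_a, list_b)]

# Build the DataFrame once, from plain Python lists
flat = pd.DataFrame(
    rows,
    index=index,
    columns=["column_a", "column_b", "column_c", "column_d", "column_e"],
)
print(flat)
```

Building the frame in a single constructor call avoids creating and joining one intermediate DataFrame per list.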

joaoavf

2 Answers


If every column is a column of tuples, you could create an individual DataFrame from each one and concatenate them at the end:

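import pandas as pd

# Expand each column of tuples into its own DataFrame (one column per tuple element)
df_list = []
for c in df.columns:
    df_list.append(pd.DataFrame(df[c].tolist()))

# Concatenate side by side; axis is passed by keyword because recent pandas
# versions no longer accept it positionally
ndf = pd.concat(df_list, axis=1, ignore_index=True).add_prefix('col_')
ndf.index = df.index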

ndf
                     col_0  col_1  col_2  col_3  col_4
2005-01-02 22:15:00      1      1   True      0      0
2005-01-02 22:30:00      0      0   True      0      0
2005-01-02 22:45:00      0      0   True      0      0
2005-01-02 23:00:00      0      0   True      0      0
2005-01-02 23:15:00      0      0   True      0      0
cs95
    @joaoavf Would you like an explanation? What I'm doing is converting each column of tuples into a list of tuples, which I then pass to the DataFrame constructor. The constructor parses this list of tuples into separate columns. It's a neat trick that gives the same result as apply + pd.Series but is much faster. Learned this trick from jezrael. – cs95 Nov 26 '17 at 22:49
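To make the comparison in the comment concrete, here is a small sketch of both routes on a single column of tuples. The two-row sample frame and the names `fast` and `slow` are made up for illustration, and no timings are claimed here.

```python
import pandas as pd

# Hypothetical two-row frame shaped like the one in the question
df = pd.DataFrame({
    "column_a": [(1, 1), (0, 0)],
    "column_b": [(True, 0, 0), (True, 0, 0)],
})

# Constructor route: hand the whole list of tuples to pd.DataFrame in one call
fast = pd.DataFrame(df["column_b"].tolist(), index=df.index)

# apply route: build one pd.Series per row, then let pandas stitch them together
slow = df["column_b"].apply(pd.Series)

# Both produce the same values; only the amount of per-row work differs
print(fast.values.tolist() == slow.values.tolist())
```

The apply route constructs a separate Series object for every row, which is where the extra cost the comment mentions would come from.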

Something like this? It uses the approach from "Making a flat list out of list of lists in Python":

import pandas as pd

data = dict(column_a = [(1,2),(3,4)], column_b = [(True,1,2),(True,3,4)])

df = pd.DataFrame(data)

# Flatten the tuples in each row into one Series, then rename the positional columns 0..4
df = (df.apply(lambda x: pd.Series(item for sublist in x for item in sublist), axis=1)
      .rename(columns=dict(zip(range(5), ["column_{}".format(i) for i in "abcde"]))))

print(df)

Returns:

   column_a  column_b column_c  column_d  column_e
0         1         2     True         1         2
1         3         4     True         3         4
Anton vBR