I have two lists:

l1 = ['0a',22,44]
l2 = ['0b',25,55,66]

Now I join them so that each list becomes a column of a data frame:

import pandas as p
df1 = p.DataFrame(zip(l1,l2))
df1

I got a data frame with 3 rows and 2 columns (the value 66 from l2 is missing). This resembles the rule for ndarray, which says: "all columns must have the same number of rows if ndarray is passed into dataframe". But I'm not using an ndarray!

If, however, I join the lists as rows of a data frame, then 66 is kept:

df2 = p.DataFrame([l1,l2])

Is there any way to pass lists into a data frame as columns while keeping all the values of the lists?

Klausos Klausos
  • http://stackoverflow.com/a/19736406/4080476 – Brian Pendleton Sep 07 '15 at 17:49
  • possible duplicate of [Creating dataframe from a dictionary where entries have different lengths](http://stackoverflow.com/questions/19736080/creating-dataframe-from-a-dictionary-where-entries-have-different-lengths) – Brian Pendleton Sep 07 '15 at 17:49

2 Answers

The zip function returns a list truncated to the length of the shortest argument sequence (in Python 3 it returns an iterator instead, with the same truncation). So the result is:

In [1]: zip(l1,l2)
Out[1]: [('0a', '0b'), (22, 25), (44, 55)]

To keep the value 66, use izip_longest from itertools (renamed zip_longest in Python 3):

In [2]: import itertools

In [3]: p.DataFrame(list(itertools.izip_longest(l1, l2)))
Out[3]:
      0   1
0    0a  0b
1    22  25
2    44  55
3  None  66
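
For Python 3, where izip_longest is named zip_longest, the same approach might look like the sketch below. The variable name rows is only illustrative, and the alias p follows the question. The shorter list is padded with None, which is the default fill value.

```python
import itertools

import pandas as p

l1 = ['0a', 22, 44]
l2 = ['0b', 25, 55, 66]

# zip_longest pads the shorter list with None instead of truncating
rows = list(itertools.zip_longest(l1, l2))

# Each tuple becomes one row, so each list ends up as one column
df = p.DataFrame(rows)
print(df)
```

If a different placeholder is preferred, zip_longest also takes a fillvalue keyword argument.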

Alternatively, use map with None as the function. Note that map changed in Python 3.x, so this only works in Python 2.x:

In [4]: p.DataFrame(map(None, l1, l2))
Out[4]:
      0   1
0    0a  0b
1    22  25
2    44  55
3  None  66
Alex Lisovoy

The problem is actually with your zip statement, which stops at the end of the shortest list:

>>> zip(l1,l2)
[('0a', '0b'), (22, 25), (44, 55)]

You can create a column for each of your lists and then concatenate them to create your data frame. Here, I use a dictionary comprehension to map each column name to its values. concat requires pandas objects (Series or DataFrame), so I first create a DataFrame from each list of values.

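import pandas as pd

# Map each column name (first element) to its values (remaining elements)
series = {col_name: values
          for col_name, values in zip([l1[0], l2[0]],
                                      [l1[1:], l2[1:]])}

# .items() works in both Python 2 and 3 (iteritems() is Python 2 only)
df = pd.concat([pd.DataFrame(s, columns=[col]) for col, s in series.items()], axis=1)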
>>> df
   0b  0a
0  25  22
1  55  44
2  66 NaN

Also, it appeared that the first element in each list was actually the title to the Series, so I took the liberty of using the first element as the series name.
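
As a possible shortcut, pd.concat also accepts Series directly, so the intermediate DataFrame step may not be needed. Here is a minimal sketch, assuming as above that the first element of each list is the column name (the variable names s1 and s2 are illustrative):

```python
import pandas as pd

l1 = ['0a', 22, 44]
l2 = ['0b', 25, 55, 66]

# Use the first element as the Series name and the rest as its values
s1 = pd.Series(l1[1:], name=l1[0])
s2 = pd.Series(l2[1:], name=l2[0])

# axis=1 aligns on the index; the shorter Series is padded with NaN
df = pd.concat([s1, s2], axis=1)
print(df)
```

Because the Series are passed in a list, the column order here follows the list order rather than dictionary ordering.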

Alexander