Two ways to join lists in dataframe: as rows and columns

Question

I have two lists:

l1 = ['0a',22,44]
l2 = ['0b',25,55,66]

Now I join them so that each list becomes a column of a data frame:

import pandas as p
df1 = p.DataFrame(zip(l1,l2))
df1

I get a data frame with 3 rows and 2 columns (the value 66 from l2 is missing). This resembles the rule for ndarray, which says: "all columns must have the same number of rows if ndarray is passed into dataframe". But I'm not using an ndarray!

If, however, I join the lists as rows of a data frame, 66 is kept:

df2 = p.DataFrame([l1,l2])

Is there any way to pass the lists into a data frame as columns while keeping all of their values?

possible duplicate of [Creating dataframe from a dictionary where entries have different lengths](http://stackoverflow.com/questions/19736080/creating-dataframe-from-a-dictionary-where-entries-have-different-lengths) — Brian Pendleton, Sep 07 '15 at 17:49

score 3 · Accepted Answer · answered Sep 07 '15 at 18:00

In Python 2, zip returns a list that is truncated to the length of the shortest argument sequence, so the result is:

In [1]: zip(l1,l2)
Out[1]: [('0a', '0b'), (22, 25), (44, 55)]

To keep the value 66, use izip_longest from itertools (it is called zip_longest in Python 3):

In [3]: p.DataFrame(list(itertools.izip_longest(l1, l2)))
Out[3]:
      0   1
0    0a  0b
1    22  25
2    44  55
3  None  66
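
For Python 3, here is a minimal sketch of the same comparison using only the standard library. The names `truncated` and `padded` are illustrative and not part of the original answer:

```python
from itertools import zip_longest

l1 = ['0a', 22, 44]
l2 = ['0b', 25, 55, 66]

# zip stops at the shortest input, so the trailing 66 is dropped
truncated = list(zip(l1, l2))

# zip_longest pads the shorter input with fillvalue (None by default)
padded = list(zip_longest(l1, l2))

print(truncated)
print(padded)
```

Passing `padded` to the DataFrame constructor should give the same four-row frame as the output above.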

Or you can use map with None (map changed in Python 3.x, so this only works in Python 2.x):

In [4]: p.DataFrame(map(None, l1, l2))
Out[4]:
      0   1
0    0a  0b
1    22  25
2    44  55
3  None  66

Alex Lisovoy


When I run 'pip install itertools' from Command Prompt on Windows, it says 'Could not find a version that satisfies the requirement itertools No matching distribution found for itertools'. I have Python 2.7. Do you know how to tackle this error? – Klausos Klausos Sep 08 '15 at 08:17
itertools is included in the Python stdlib, so just import it. – Alex Lisovoy Sep 08 '15 at 09:27
It says No module named stdlib (import stdlib as s) – Klausos Klausos Sep 08 '15 at 10:52
`import itertools` instead of stdlib – Alex Lisovoy Sep 08 '15 at 11:14
Ok, thanks. Just one final thing. It says TypeError: 'list' object is not callable. – Klausos Klausos Sep 08 '15 at 12:01
That's strange. Can you provide traceback and data? – Alex Lisovoy Sep 08 '15 at 12:51
Indeed it works if I run it from PyCharm IDE, but it fails when I do the same steps in command line ('list' object is not callable). Anyway I'll accept the answer, since the problem with 'list' is different from the question. – Klausos Klausos Sep 08 '15 at 13:14

score 1 · Answer 2 · answered Sep 07 '15 at 18:00

The problem is actually with your zip statement:

>>> zip(l1,l2)
[('0a', '0b'), (22, 25), (44, 55)]

You can create a Series for each of your lists and then concatenate them to create your data frame. Here, I use a dictionary comprehension to create the series. concat requires an NDFrame object, so I first create a DataFrame from each of the Series.

series = {col_name: values 
          for col_name, values in zip([l1[0], l2[0]], 
                                      [l1[1:], l2[1:]])}

# Assumes `import pandas as pd`. On Python 3, use series.items() instead of series.iteritems().
df = pd.concat([pd.DataFrame(s, columns=[col]) for col, s in series.iteritems()], axis=1)
>>> df
   0b  0a
0  25  22
1  55  44
2  66 NaN

Also, the first element of each list appears to be the name of the Series, so I took the liberty of using it as the column name.
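
A shorter variant of the same idea, sketched for Python 3 and a recent pandas: the DataFrame constructor accepts a dict of Series and aligns them on their index, so the explicit concat should not be needed. As in the answer, this assumes the first element of each list is the column name; `columns` is an illustrative name.

```python
import pandas as pd

l1 = ['0a', 22, 44]
l2 = ['0b', 25, 55, 66]

# Assumption: the first element of each list is the column name,
# and the remaining elements are the column values
columns = {lst[0]: pd.Series(lst[1:]) for lst in (l1, l2)}

# Series of different lengths are aligned on their index;
# positions missing from the shorter Series become NaN
df = pd.DataFrame(columns)
print(df)
```

Because of the NaN, the shorter column is likely to be shown as floats rather than integers.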
