Take multiple lists into dataframe

Question

How do I take multiple lists and put them as different columns in a pandas DataFrame? I tried this solution but had some trouble.

Attempt 1:

I have three lists and zip them together with res = zip(lst1,lst2,lst3), then use res.
This yields just one column.

Attempt 2:

percentile_list = pd.DataFrame({'lst1Title' : [lst1],
                                'lst2Title' : [lst2],
                                'lst3Title' : [lst3] },
                                columns=['lst1Title','lst2Title', 'lst3Title'])

This yields either one row by 3 columns (as written above) or, if I transpose it, 3 rows by 1 column.

How do I get a 100 row (length of each independent list) by 3 column (three lists) pandas dataframe?

maxymoo · Accepted Answer · 2018-08-14T23:47:47.897

I think you're almost there: try removing the extra square brackets around the lists. You also don't need to specify the column names when you create a DataFrame from a dict like this:

import pandas as pd
lst1 = range(100)
lst2 = range(100)
lst3 = range(100)
percentile_list = pd.DataFrame(
    {'lst1Title': lst1,
     'lst2Title': lst2,
     'lst3Title': lst3
    })

percentile_list
    lst1Title  lst2Title  lst3Title
0          0         0         0
1          1         1         1
2          2         2         2
3          3         3         3
4          4         4         4
5          5         5         5
6          6         6         6
...
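
To see why the extra brackets matter, here is a minimal sketch. The short three-element lists and the names `wrapped` and `flat` are illustrative assumptions, not part of the question. Wrapping each list in another list hands pandas a one-element list per column, so the whole inner list lands in a single cell:

```python
import pandas as pd

lst1 = [1, 2, 3]
lst2 = [4, 5, 6]
lst3 = [7, 8, 9]

# Each value is a one-element list (whose element is the inner list),
# so every column has exactly one row holding the whole list.
wrapped = pd.DataFrame({'lst1Title': [lst1],
                        'lst2Title': [lst2],
                        'lst3Title': [lst3]})

# Each value is the list itself, so every element gets its own row.
flat = pd.DataFrame({'lst1Title': lst1,
                     'lst2Title': lst2,
                     'lst3Title': lst3})

print(wrapped.shape)
print(flat.shape)
```

The first shape is one row by three columns, the second is three rows by three columns, which matches the behaviour described in the question.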

If you need a more performant solution, you can use np.column_stack rather than zip as in your first attempt. This gives around a 2x speedup on the example here, but in my opinion it comes at some cost to readability:

import numpy as np
percentile_list = pd.DataFrame(np.column_stack([lst1, lst2, lst3]), 
                               columns=['lst1Title', 'lst2Title', 'lst3Title'])

Is np.column_stack a view, or does it copy the data? (If it copies, it seems like this could be much more efficient: O(1), not O(n).) — user48956, Oct 16 '17 at 17:27
@maxymoo can column names be automatically set to the list name? — joe5, Jan 08 '19 at 22:10
np.column_stack does not work well if the lists have different data types — user6386155, May 17 '19 at 19:21
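
A small sketch that may help with the two points raised in the comments above. The arrays and the names `stacked_df` and `dict_df` are illustrative assumptions. It suggests that np.column_stack builds a new array with one common dtype (so mixed inputs get converted) and that the result does not share memory with its inputs:

```python
import numpy as np
import pandas as pd

ints = np.array([1, 2, 3])
floats = np.array([0.5, 1.5, 2.5])

# column_stack builds one new 2-D array with a single dtype,
# so the integer column is upcast to float here.
stacked = np.column_stack([ints, floats])
stacked_df = pd.DataFrame(stacked, columns=['a', 'b'])

# The dict constructor keeps each column's own dtype.
dict_df = pd.DataFrame({'a': ints, 'b': floats})

print(stacked_df.dtypes)
print(dict_df.dtypes)

# Check whether the stacked array shares memory with an input array.
print(np.shares_memory(ints, stacked))
```

If the columns have genuinely different types, the dict-based constructor is probably the safer choice, since it does not force everything into one array first.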

score 89 · Answer 2 · edited Jun 16 '17 at 09:47

Adding to Aditya Guru's answer here: there is no need to use map. You can simply do:

pd.DataFrame(list(zip(lst1, lst2, lst3)))

This will set the column names to 0, 1, 2. To set your own column names, pass the keyword argument columns to the constructor above.

pd.DataFrame(list(zip(lst1, lst2, lst3)),
              columns=['lst1_title','lst2_title', 'lst3_title'])

answered Jun 16 '17 at 09:22 by Abhinav Gupta · edited Jun 16 '17 at 09:47 by legoscia

In Python 3.8 and pandas 1.0, we don't need the list function, since DataFrame accepts an iterable and zip() returns an iterable object. So `pd.DataFrame(zip(lst1, lst2, lst3))` should also work. – Sarfraaz Ahmed Apr 16 '20 at 11:26
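
One caveat with the zip approach, sketched below with made-up lists of unequal length (the names `short`, `longer`, `truncated` and `padded` are assumptions for illustration): zip stops at the shortest input, so extra elements are silently dropped. If that is not what you want, itertools.zip_longest from the standard library pads the shorter inputs instead:

```python
from itertools import zip_longest

import pandas as pd

short = [1, 2]
longer = [10, 20, 30]

# zip stops at the shortest input, so the third element of `longer` is dropped.
truncated = pd.DataFrame(list(zip(short, longer)), columns=['short', 'longer'])

# zip_longest pads the shorter input with None, which pandas treats as missing.
padded = pd.DataFrame(list(zip_longest(short, longer)), columns=['short', 'longer'])

print(len(truncated))
print(len(padded))
```

With the lists from the question (all of length 100) this makes no difference, but it is worth checking lengths before zipping real data.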

score 20 · Answer 3 · answered Jul 07 '18 at 08:18

Adding one more scalable solution.

lists = [lst1, lst2, lst3, lst4]
df = pd.concat([pd.Series(x) for x in lists], axis=1)

answered Jul 07 '18 at 08:18 by oopsi

can you explain this one a bit? – ZakS Jul 18 '18 at 08:21
You join (concat) the Series side by side (axis=1), so each Series built from one of the lists becomes a column of the DataFrame – yona bendelac Aug 14 '18 at 13:51
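
A minimal sketch of the concat approach with column names attached. The sample lists and the `names` labels are hypothetical; the `keys` argument of pd.concat supplies the column labels when concatenating along axis=1:

```python
import pandas as pd

# Sample data; the last list is shorter on purpose.
lists = [[1, 2, 3], [4, 5, 6], [7, 8]]
names = ['lst1Title', 'lst2Title', 'lst3Title']  # hypothetical column names

# axis=1 places the Series next to each other as columns;
# keys supplies the column labels.
df = pd.concat([pd.Series(x) for x in lists], axis=1, keys=names)

print(df)
```

Because the Series are aligned on their index, a shorter list is padded with NaN rather than raising an error, which is one reason to pick this approach when the lists may differ in length.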

score 18 · Answer 4 · answered Jul 09 '20 at 05:47

There are several ways to create a dataframe from multiple lists.

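import pandas as pd

list1 = [1, 2, 3, 4]
list2 = [5, 6, 7, 8]
list3 = [9, 10, 11, 12]

# From a dict: keys become column names
pd.DataFrame({'list1': list1, 'list2': list2, 'list3': list3})
# From zipped rows, with explicit column names
pd.DataFrame(data=zip(list1, list2, list3), columns=['list1', 'list2', 'list3'])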

score 14 · Answer 5 · answered Feb 19 '17 at 18:44

Just adding that, using the first approach, it can be done as:

pd.DataFrame(list(map(list, zip(lst1,lst2,lst3))))

answered Feb 19 '17 at 18:44 by Aditya Guru

score 11 · Answer 6 · answered Jan 16 '19 at 14:55

Adding to the above answers, we can also build the DataFrame on the fly, one column at a time:

import pandas as pd
df = pd.DataFrame()
list1 = list(range(10))
list2 = list(range(10,20))
df['list1'] = list1
df['list2'] = list2
print(df)

Hope it helps!
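
A short sketch of what to expect when the lists assigned this way differ in length. The data and the variable `mismatch_error` are assumptions for illustration. Assigning a plain list whose length does not match the existing rows raises a ValueError, while wrapping it in a pd.Series aligns on the index and fills the gaps with NaN:

```python
import pandas as pd

df = pd.DataFrame()
df['list1'] = list(range(10))

mismatch_error = None
try:
    # 3 values for a 10-row frame: pandas refuses the assignment.
    df['list2'] = [1, 2, 3]
except ValueError as exc:
    mismatch_error = exc

# A Series is aligned on the index, so the missing rows become NaN.
df['list2'] = pd.Series([1, 2, 3])

print(mismatch_error)
print(df)
```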

answered Jan 16 '19 at 14:55 by Wickkiey

score 5 · Answer 7 · edited Nov 27 '19 at 14:30

@oopsi used pd.concat() but didn't include the column names. You could do the following, which, unlike the first solution in the accepted answer, gives you explicit control over the column order (it avoids relying on dict ordering, which was not guaranteed before Python 3.7):

import pandas as pd
lst1 = range(100)
lst2 = range(100)
lst3 = range(100)

s1 = pd.Series(lst1, name='lst1Title')
s2 = pd.Series(lst2, name='lst2Title')
s3 = pd.Series(lst3, name='lst3Title')
percentile_list = pd.concat([s1, s2, s3], axis=1)

percentile_list
Out[2]: 
    lst1Title  lst2Title  lst3Title
0           0          0          0
1           1          1          1
2           2          2          2
3           3          3          3
4           4          4          4
5           5          5          5
6           6          6          6
7           7          7          7
8           8          8          8
...

`dict`s are not unordered in current Python 3 (insertion order is guaranteed since 3.7) – jtlz2 Oct 13 '22 at 19:31

score 4 · Answer 8 · edited Nov 09 '22 at 17:45

You can simply use the following code:

train_data['labels'] = train_data[["LABEL1","LABEL2","LABEL3","LABEL4","LABEL5","LABEL6","LABEL7"]].values.tolist()
train_df = pd.DataFrame(train_data, columns=['text','labels'])

answered Jun 04 '20 at 19:28 by Shaina Raza · edited Nov 09 '22 at 17:45 by jtlz2

score 1 · Answer 9 · answered Oct 13 '22 at 19:41

I just did it like this (python 3.9):

import pandas as pd
my_dict = dict(x=x, y=y, z=z)  # Set column ordering here
my_df = pd.DataFrame.from_dict(my_dict)

This seems to be reasonably straightforward (albeit in 2022) unless I am missing something obvious...

In Python 2 one could have used a collections.OrderedDict().
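
A quick sketch of the ordering behaviour, assuming Python 3.7+ and a reasonably recent pandas. The lists `x`, `y`, `z` hold made-up values, and `reordered` is an illustrative name. The columns follow the insertion order of the dict, and they can be rearranged afterwards by selecting them in the order you want:

```python
import pandas as pd

x = [1, 2, 3]
y = [4, 5, 6]
z = [7, 8, 9]

# Columns follow the order in which the keys were inserted into the dict.
df = pd.DataFrame(dict(z=z, x=x, y=y))
print(list(df.columns))

# Selecting with a list of names returns the columns in that order.
reordered = df[['x', 'y', 'z']]
print(list(reordered.columns))
```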
