DataFrame from list of lists

Question

I have a list of lists:

u=[[1, 1], [2, 1, 1, 1], [2, 2, 1, 1, 1, 1, 2, 2], [2, 2, 2, 2, 2, 3, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2], [2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 3, 2, 3, 3, 3, 2, 2, 3, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2]]

I want to create a pandas DataFrame with one row per inner list of u (so len(u) rows) and one column per distinct number that appears in the inner lists.

Each element of the DataFrame should be the number of times that number occurs in the corresponding list. For example, from u above, I want to get the following table:

In the table above, column 1 gives the number of ones in each list and column 2 gives the number of twos. In cell (1,1), the value 2 was obtained by counting the ones in the first list, [1,1]. In cell (2,1), the value 3 was obtained by counting the ones in the list [2,1,1,1], while in cell (2,2) the value 1 was obtained by counting the twos in the same list. The same procedure was repeated for every list.

I know that I can count how often an element occurs in a list with count, for example [1,1,1,2].count(1) == 3. What I want to know is how to use pandas to get the DataFrame described above. Is this possible?

score 4 · Accepted Answer · answered Feb 19 '17 at 20:37

You can use Counter to turn each inner list into a dictionary of counts, then pass the list of dictionaries to pd.DataFrame:

import pandas as pd
from collections import Counter

# u is the list of lists from the question; each Counter becomes one row
df = pd.DataFrame([Counter(u_) for u_ in u]).fillna(0)

Note that the value 4 never occurs in u, so the result has no column 4. You can add it to the dictionaries by hand, or add the column to the DataFrame afterwards, e.g. df[4] = 0.
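A minimal sketch of one way to add the missing column and tidy the result in a single chain. It assumes the desired columns are 1 through 4 (the 4 is taken from the note above), that integer counts are wanted, and that rows should be numbered from 1 as in the question's cell references. The name wanted_columns is introduced here for illustration only, and u is shortened to the first four lists of the question's data.

```python
from collections import Counter

import pandas as pd

# First four inner lists from the question (shortened for readability)
u = [[1, 1], [2, 1, 1, 1], [2, 2, 1, 1, 1, 1, 2, 2],
     [2, 2, 2, 2, 2, 3, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2]]

wanted_columns = [1, 2, 3, 4]  # assumption: columns the final table should have

df = (
    pd.DataFrame([Counter(row) for row in u])
    .reindex(columns=wanted_columns)  # columns not present in the data are added as NaN
    .fillna(0)                        # a missing count means the value never occurred
    .astype(int)                      # counts are whole numbers
)
df.index = range(1, len(u) + 1)       # number rows from 1 instead of 0
print(df)
```

Using reindex rather than df[4] = 0 also fixes the column order, which can be convenient when the set of expected values is known in advance.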

score 1 · Answer 2 · answered Feb 19 '17 at 20:40

collections.Counter is useful for this.

First create a Counter instance from each inner list, then use these to instantiate the DataFrame:

u=[[1, 1], [2, 1, 1, 1], [2, 2, 1, 1, 1, 1, 2, 2], [2, 2, 2, 2, 2, 3, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2], [2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 3, 2, 3, 3, 3, 2, 2, 3, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2]]
from collections import Counter
import pandas as pd
df = pd.DataFrame([Counter(e) for e in u]).fillna(0)
df[4] = 0.0
print(df)

Output

   1     2    3    4
0  2   0.0  0.0  0.0
1  3   1.0  0.0  0.0
2  4   4.0  0.0  0.0
3  5  10.0  1.0  0.0
4  6  20.0  6.0  0.0

This works because Counter is a subclass of dict, so pandas treats each Counter as a row that maps column labels to counts.
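The dict behaviour can be checked directly. The following is a small sketch using two of the inner lists from the question; the variable names c and rows are chosen here for illustration.

```python
from collections import Counter

import pandas as pd

c = Counter([2, 1, 1, 1])
print(isinstance(c, dict))  # True: Counter subclasses dict
print(dict(c))              # plain-dict view of the counts

# A list of Counters is handled like a list of dicts: one row per Counter,
# one column per key seen in any of them.
rows = [Counter([1, 1]), c]
df = pd.DataFrame(rows)

# The first list contains no 2, so that cell is NaN until fillna(0) is applied.
print(df.isna().sum().sum())
print(df.fillna(0))
```

This is why the answers call fillna(0): a key that is absent from one Counter shows up as a missing value in that row, not as a zero.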
