
I have a list of lists:

u=[[1, 1], [2, 1, 1, 1], [2, 2, 1, 1, 1, 1, 2, 2], [2, 2, 2, 2, 2, 3, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2], [2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 3, 2, 3, 3, 3, 2, 2, 3, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2]]

I want to create a pandas DataFrame with one row per inner list (so len(u) rows) and with columns given by the distinct numbers that appear inside the inner lists.

Each cell of this DataFrame should hold the frequency with which the column's number occurs in the row's list. For example, from the list above, I want to get the following DataFrame:

In the table above, column 1 gives the number of ones in each list, while column 2 gives the number of twos. In cell (1,1), the value 2 was obtained by counting the ones in the first list, [1,1]. In cell (2,1), the value 3 was obtained by counting the ones in the list [2,1,1,1], while in cell (2,2) the value 1 was obtained by counting the twos in the same list. The same procedure was repeated throughout.

I know that I can count the occurrences of an element in a list with count, for example [1,1,1,2].count(1) == 3. What I want to know is how to use pandas to get the DataFrame described above. Is it possible to do this?

Mafeni Alpha

2 Answers


You can use Counter to turn each inner list into a dictionary of counts, then pass those dictionaries to pd.DataFrame:

import pandas as pd
from collections import Counter

# One Counter per inner list; numbers missing from a list become NaN, so fill them with 0
df = pd.DataFrame([Counter(u_) for u_ in u]).fillna(0)

Note that 4 never appears in u, so there is no column for it. You can add that key to the dictionaries manually, or just add the column to the DataFrame afterwards, i.e. df[4] = 0
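As a rough sketch of the "add the missing column" step, reindex can be used instead of assigning df[4] by hand. The column range 1 to 4 and the 1-based row labels are assumptions taken from the question's description, and the cast to int is optional tidying, not part of the original answer:

```python
from collections import Counter

import pandas as pd

u = [[1, 1], [2, 1, 1, 1], [2, 2, 1, 1, 1, 1, 2, 2],
     [2, 2, 2, 2, 2, 3, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2],
     [2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 3, 2, 3, 3,
      3, 2, 2, 3, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2]]

# One Counter per inner list, one DataFrame row per Counter.
counts = pd.DataFrame([Counter(row) for row in u])

# Assumed column range 1..4: reindex adds any missing column (here 4),
# then NaN gaps are filled with 0 and the counts are cast to integers.
table = counts.reindex(columns=range(1, 5)).fillna(0).astype(int)

# Assumed row labels 1..len(u) instead of the default 0-based index.
table.index = range(1, len(u) + 1)

print(table)
```

With this layout, table.loc[2, 1] reads as "the number of ones in the second list".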

titipata

collections.Counter is useful for this.

First create Counter instances from the inner lists, then use them to instantiate the DataFrame:

u=[[1, 1], [2, 1, 1, 1], [2, 2, 1, 1, 1, 1, 2, 2], [2, 2, 2, 2, 2, 3, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2], [2, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 2, 3, 2, 3, 3, 3, 2, 2, 3, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2]]
from collections import Counter
import pandas as pd
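# One row per inner list; numbers absent from a list become NaN, so fill them with 0
df = pd.DataFrame([Counter(e) for e in u]).fillna(0)
# 4 never occurs in u, so add its column explicitly
df[4] = 0.0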
print(df)

Output

   1     2    3    4
0  2   0.0  0.0  0.0
1  3   1.0  0.0  0.0
2  4   4.0  0.0  0.0
3  5  10.0  1.0  0.0
4  6  20.0  6.0  0.0

This is possible because Counter is a subclass of dict, so pandas treats each Counter as one row that maps column labels to values.
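A small illustration of that point, using two of the inner lists from the question (the variable names here are only for illustration). It also shows why fillna(0) is needed: a key missing from one Counter becomes NaN in that row.

```python
from collections import Counter

import pandas as pd

c = Counter([2, 1, 1, 1])

# Counter is a dict subclass, so it can be used wherever a dict is expected.
print(isinstance(c, dict))
print(dict(c))

# Unlike a plain dict, a Counter returns 0 for keys it has never seen.
print(c[4])

# In a DataFrame built from several Counters, a key absent from one of them
# shows up as NaN in that row (here: no 2 in the first list).
frame = pd.DataFrame([Counter([1, 1]), c])
print(frame)
```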

Sebastian Wozny