Make a matrix from dataframe

Question

I have a data frame that looks like this:

I want to create a matrix that will count the number of times each time per 'ID', 'col2' and 'col3' says a fruit value:

[Please don't post pictures of text](https://meta.stackoverflow.com/q/285551/4518341). Instead, copy the text itself, [edit] it into your post, and use the formatting tools like [code formatting](/editing-help#code) or table formatting. It also wouldn't hurt to provide code that creates this df; see [How to make good reproducible pandas examples](/q/20109391/4518341). — wjandrea, Apr 26 '22 at 19:50

score 2 · Accepted Answer · 2022-04-26T20:00:33.430

One way (vectorized):

df = df.set_index('ID')
new_df = pd.DataFrame(np.sum(df.to_numpy()[:, None] == np.unique(df.to_numpy())[:, None], axis=2), index=df.index, columns=np.unique(df.to_numpy()))

Output:

>>> new_df
     Apple  Orange  Pear
ID                      
001      0       2     0
002      1       0     1
003      1       0     1

If you want to operate on only a subset of the columns:

subset = ['col2', 'col3']
new_df = pd.DataFrame(np.sum(df[subset].to_numpy()[:, None] == np.unique(df[subset].to_numpy())[:, None], axis=2), index=df.index, columns=np.unique(df[subset].to_numpy()))

what if the data frame contains more columns, but I only want to look at col 2 and col 3 — user, Apr 26 '22 at 19:55

score 1 · Answer 2 · answered Apr 26 '22 at 20:59

You can do:

df.set_index('ID', inplace=True)
dicts = [df.loc[idx].value_counts().to_dict() for idx in df.index]
df2 = pd.DataFrame(dicts, index=df.index).fillna(0).astype(int)

Output df2:

     Orange   Pear   Apple
ID          
001       2      0       0
002       0      1       1
003       0      1       1

Make a matrix from dataframe

2 Answers2