2

Currently I have a dictionary like this: {'axin': ['GAUAUG', 'AAAUGC', 'AAAUGC', 'GUCAUG', 'UUUGAU', 'UUUGAU'], 'hedgehog-protein': ['ACUGGC', 'GUCAUG', 'UUUGAU'], 'pangolin': ['GAACAC', 'GAAAGA', 'UUUGAU', 'AAUACU']}.

I want to create a dataframe with each key as a column, and each value as the row name (or vice-versa, doesn't really matter), whilst also counting the occurrences of each value per key. For example the first few lines of output would be this:

        axin hedgehog-protein pangolin 
GAUAUG  1    0                0        
AAAUGC  2    0                0        
GUCAUG  1    1                1
UUUGAU  2    1                1        
ACUGGC  0    1                0

The most similar question I found was Create a pandas dataframe of counts however they had different types of values for each key, and so weren't common across keys. Also, as the length of my value lists are different for each key, ValueError is a problem I keep running into.

sian
  • 77
  • 7

1 Answers1

4

Use Counter in dictionary comprehension, then replace missing values and convert to integers:

d = {'axin': ['GAUAUG', 'AAAUGC', 'AAAUGC', 'GUCAUG', 'UUUGAU', 'UUUGAU'], 
     'hedgehog-protein': ['ACUGGC', 'GUCAUG', 'UUUGAU'],  
     'pangolin': ['GAACAC', 'GAAAGA', 'UUUGAU', 'AAUACU']}

from collections import Counter

df = pd.DataFrame({k: Counter(v) for k, v in d.items()}).fillna(0).astype(int)
print (df)
        axin  hedgehog-protein  pangolin
AAAUGC     2                 0         0
AAUACU     0                 0         1
ACUGGC     0                 1         0
GAAAGA     0                 0         1
GAACAC     0                 0         1
GAUAUG     1                 0         0
GUCAUG     1                 1         0
UUUGAU     2                 1         1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252