
I have a dictionary like this:

{device1: (news1, news2, ...), device2: (news2, news4, ...), ...}

How can I convert it into a 2-D 0-1 matrix in Python that looks like this?

         news1 news2 news3 news4
device1    1     1     0      0
device2    0     1     0      1
device3    1     0     0      1
Spencer
  • Do you just want to print the output in the given format, or do you want it in a list (or a list of lists)? What exactly do you mean by converting to a 2-D matrix? – yeniv May 23 '17 at 03:55
  • @yeniv Well, I want to convert into a binary matrix so that I can do some matrix operations later, like calculating cosine similarities etc. – Spencer May 23 '17 at 04:10

3 Answers


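Here is some code that creates a matrix (2-D array) using the numpy package. It assumes each device's value is already a tuple of 0/1 flags. Note that the rows are looked up through a list of names in a fixed order, because dictionaries do not necessarily keep keys in the order they were inserted (insertion order is only guaranteed from Python 3.7 onward).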

import numpy as np

dataDict = {'device1':(1,1,0,1), 'device2':(0,1,0,1), 'device3':(1,0,0,1)}
orderedNames = ['device1','device2','device3']

dataMatrix = np.array([dataDict[i] for i in orderedNames])

print(dataMatrix)

The output is:

[[1 1 0 1]
 [0 1 0 1]
 [1 0 0 1]]
Robbie
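The answer above starts from tuples that are already 0/1 flags. If the values are tuples of news names, as in the question, the 0-1 rows have to be built first. A minimal sketch, assuming the full list of news items is known up front (the hypothetical `all_news`, which includes `news3` even though no device has seen it); `device_news`, `all_news` and `col_index` are illustrative names, not from the original:

```python
import numpy as np

# Hypothetical data in the question's shape: device -> tuple of news names
device_news = {
    'device1': ('news1', 'news2'),
    'device2': ('news2', 'news4'),
    'device3': ('news1', 'news4'),
}

# Assumed full list of news items; it fixes the column order (news3 has no readers)
all_news = ['news1', 'news2', 'news3', 'news4']
col_index = {name: j for j, name in enumerate(all_news)}

# Fix the row order explicitly rather than relying on dict ordering
devices = sorted(device_news)

# Start from zeros and set a 1 wherever a device has seen a news item
matrix = np.zeros((len(devices), len(all_news)), dtype=int)
for i, device in enumerate(devices):
    for name in device_news[device]:
        matrix[i, col_index[name]] = 1

print(matrix)
# [[1 1 0 0]
#  [0 1 0 1]
#  [1 0 0 1]]
```

If the full news list is not known, `sorted({n for items in device_news.values() for n in items})` builds one from the data, but it will only contain news items that at least one device has seen.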

Adding to this, since I think the previous answers assume your data is structured differently and don't directly address your issue.

Assuming I understand your data structure correctly and the row/column labels of the matrix don't matter:

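from sklearn.feature_extraction import DictVectorizer

# device -> list of news items (named device_news so the built-in dict is not shadowed)
device_news = {'device1': ['news1', 'news2'],
               'device2': ['news2', 'news4'],
               'device3': ['news1', 'news4']}

restructured = []

for device in device_news:
    data_dict = {}
    for news in device_news[device]:
        data_dict[news] = 1
    # No device has seen news3; adding it with value 0 makes DictVectorizer keep a column for it
    data_dict['news3'] = 0
    restructured.append(data_dict)

# restructured should now look like
'''
[{'news1': 1, 'news2': 1, 'news3': 0},
 {'news2': 1, 'news4': 1, 'news3': 0},
 {'news1': 1, 'news4': 1, 'news3': 0}]
'''

dictvectorizer = DictVectorizer(sparse=False)
features = dictvectorizer.fit_transform(restructured)

print(features)

#output (columns are sorted by feature name; values are floats)
'''
[[1. 1. 0. 0.]
 [0. 1. 0. 1.]
 [1. 0. 0. 1.]]
'''
# get_feature_names_out() needs scikit-learn >= 1.0 (get_feature_names() was removed in 1.2)
print(list(dictvectorizer.get_feature_names_out()))
#output
'''
['news1', 'news2', 'news3', 'news4']
'''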
mgrogger
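An alternative to the `'news3': 0` workaround, swapping in a different scikit-learn tool: `sklearn.preprocessing.MultiLabelBinarizer` takes one iterable of labels per row, and its `classes` parameter fixes the column order and adds columns for labels that no row contains. A minimal sketch, assuming the same hypothetical `device_news` data and a known list of news items:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical data: device -> news items seen
device_news = {'device1': ['news1', 'news2'],
               'device2': ['news2', 'news4'],
               'device3': ['news1', 'news4']}

# Fix the row order explicitly
devices = sorted(device_news)

# classes= sets the column order and keeps a column for news3 even though nobody saw it
mlb = MultiLabelBinarizer(classes=['news1', 'news2', 'news3', 'news4'])
matrix = mlb.fit_transform([device_news[d] for d in devices])

print(list(mlb.classes_))
print(matrix)
```

Unlike `DictVectorizer(sparse=False)`, which returns floats, this yields an integer 0/1 array, and no intermediate list of dictionaries is needed.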

Here is another way to convert dictionaries into a matrix with DictVectorizer; note that the values are kept as they are (here, counts), not reduced to 0/1:

# Load library
from sklearn.feature_extraction import DictVectorizer

# Our data: a list of dictionaries, one per row
data_dict = [{'Red': 2, 'Blue': 4},
             {'Red': 4, 'Blue': 3},
             {'Red': 1, 'Yellow': 2},
             {'Red': 2, 'Yellow': 2}]
# Create DictVectorizer object
dictvectorizer = DictVectorizer(sparse=False)

# Convert dictionary into feature matrix
features = dictvectorizer.fit_transform(data_dict)
print(features)
#output
'''
[[4. 2. 0.]
 [3. 4. 0.]
 [0. 1. 2.]
 [0. 2. 2.]]
'''
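# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() needs >= 1.0
print(list(dictvectorizer.get_feature_names_out()))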
#output
'''
['Blue', 'Red', 'Yellow']
'''
tolgabuyuktanir
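In a comment, the asker says the goal is to compute cosine similarities between devices. Once a 0-1 matrix exists (one row per device), a minimal NumPy sketch normalises each row to unit length and then takes a single matrix product. The matrix values below are a hypothetical example matching the question's table; rows with no news at all have a zero norm and would need special handling:

```python
import numpy as np

# Hypothetical 0-1 matrix: rows are device1..device3, columns are news1..news4
matrix = np.array([[1, 1, 0, 0],
                   [0, 1, 0, 1],
                   [1, 0, 0, 1]])

# Scale every row to unit length (assumes no all-zero rows)
norms = np.linalg.norm(matrix, axis=1, keepdims=True)
unit_rows = matrix / norms

# Dot products of unit vectors are cosines; entry [i, j] compares device i and device j
similarity = unit_rows @ unit_rows.T

print(np.round(similarity, 2))
```

`sklearn.metrics.pairwise.cosine_similarity(matrix)` computes the same pairwise matrix. For a count matrix like the one in the answer above, `(features > 0).astype(int)` turns it into 0/1 values first if presence/absence is what should be compared.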