
I have a dictionary like the one below:

{'A': 0, 'C': 0, 'B': 1, 'E': 3, 'D': 1, 'G': 0, 'F': 0, 'I': 3, 'H': 3, 'J': 1}

Using this dictionary, I want to create a pandas DataFrame like this:

   A  B  C  D  E  F  G  H  I  J
0  1  0  1  0  0  1  1  0  0  0
1  0  1  0  1  0  0  0  0  0  1
2  0  0  0  0  0  0  0  0  0  0
3  0  0  0  0  1  0  0  1  1  0

Each key-value pair in the dictionary maps a column name to a row index, and I want to use these pairs to build the DataFrame above. For example, 'A': 0 means that column A should hold 1 at index 0; similarly, 'E': 3 means that column E should hold 1 at index 3. Every other cell should be 0.

So far I have tried this:

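import pandas as pd

my_dic = {'A': 0, 'C': 0, 'B': 1, 'E': 3, 'D': 1, 'G': 0, 'F': 0, 'I': 3, 'H': 3, 'J': 1}
req_cols = sorted(my_dic)  # assumed: the required columns, in order
df = pd.DataFrame(index=range(max(my_dic.values()) + 1), columns=req_cols)
for u, v in my_dic.items():
    df.at[v, u] = 1  # column u gets a 1 at row v
print(df.fillna(0).astype(int))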

The code above works, but I don't think it's an efficient way to solve this problem. Is there a more efficient approach?

Any help would be appreciated.

Thanks in advance.

Mohamed Thasin ah
  • I think your approach is relatively efficient. You can get some speed-up potentially by building a NumPy array first, but this will make your code more complex, since NumPy uses positional indexing rather than labels. – jpp Sep 11 '18 at 11:28
  • @jpp - Thanks for your comments. I basically want to avoid loops in my code; is there any way to accomplish this? – Mohamed Thasin ah Sep 11 '18 at 11:30
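jpp's comment suggests building a NumPy array first. A minimal sketch of that idea follows, assuming the columns should come out in alphabetical order (sorted(d)); the names cols, rows and arr are illustrative only. The one remaining Python-level iteration reads the dictionary values in column order, and all the 1s are then placed with a single fancy-indexing assignment.

```python
import numpy as np
import pandas as pd

d = {'A': 0, 'C': 0, 'B': 1, 'E': 3, 'D': 1, 'G': 0, 'F': 0, 'I': 3, 'H': 3, 'J': 1}

# NumPy works on positions, not labels, so fix the column order up front.
cols = sorted(d)
rows = np.array([d[c] for c in cols])  # target row for each column position

# All-zero matrix: max(row) + 1 rows, one column per key.
arr = np.zeros((rows.max() + 1, len(cols)), dtype=int)

# Fancy indexing: set arr[rows[j], j] = 1 for every column j in one step.
arr[rows, np.arange(len(cols))] = 1

df = pd.DataFrame(arr, columns=cols)
print(df)
```

Because NumPy indexes by position, the label-to-position mapping lives entirely in cols; reorder that list to reorder the output columns.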

2 Answers


Here's a simple solution, though not necessarily the fastest. A faster version could swap in a quicker one_at_index function; NumPy may offer a faster way to build the columns.

import pandas as pd

d = {'A': 0, 'C': 0, 'B': 1, 'E': 3, 'D': 1, 'G': 0, 'F': 0, 'I': 3, 'H': 3, 'J': 1}

height = max(d.values())  # largest row index used by any key

def one_at_index(index, height):
    # height + 1 entries: zeros everywhere except a 1 at position `index`
    return [0]*index + [1] + [0]*(height - index)

result = pd.DataFrame({key: one_at_index(value, height) for key, value in d.items()})

print(result)

Out:
   A  C  B  E  D  G  F  I  H  J
0  1  1  0  0  0  1  1  0  0  0
1  0  0  1  0  1  0  0  0  0  1
2  0  0  0  0  0  0  0  0  0  0
3  0  0  0  1  0  0  0  1  1  0

If the column order matters to you, just add columns=list("ABCDEFGHIJ"), or an equivalent such as columns=sorted(d), to the pd.DataFrame call.

Denziloe
  • Thanks for your effort. I'm looking for a somewhat more elegant way to solve this problem without adding loops; please let me know if you find one. Thanks anyway. – Mohamed Thasin ah Sep 11 '18 at 11:39
  • I don't see how you could possibly do this without looping (actually a comprehension) over the items in the dictionary. How else could you possibly access the specification data? – Denziloe Sep 11 '18 at 11:51
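Denziloe's point holds in the sense that something must iterate over the dictionary, but that iteration can live inside pandas rather than in your own code. A minimal sketch using pd.crosstab (a technique not used in either answer here), assuming each key maps to exactly one row number:

```python
import pandas as pd

d = {'A': 0, 'C': 0, 'B': 1, 'E': 3, 'D': 1, 'G': 0, 'F': 0, 'I': 3, 'H': 3, 'J': 1}

# Index = column labels, values = row numbers.
s = pd.Series(d)

# Count each (row, column) pair; with unique dict keys every count is 0 or 1.
res = pd.crosstab(s.values, s.index)

# crosstab only creates rows that occur in the data (0, 1 and 3 here),
# so reindex to add the missing row 2 filled with zeros.
res = res.reindex(range(s.max() + 1), fill_value=0)

# Drop the generated axis names ("row_0" / "col_0") for a cleaner print.
res.index.name = None
res.columns.name = None
print(res)
```

crosstab also sorts the column labels, so the result comes out in A to J order without an extra columns= argument.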

The sklearn library offers a solution without an explicit loop.

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

d = {'A': 0, 'C': 0, 'B': 1, 'E': 3, 'D': 1, 'G': 0, 'F': 0, 'I': 3, 'H': 3, 'J': 1}

mlb = MultiLabelBinarizer()

s = pd.DataFrame(list(d.items())).groupby(1)[0].apply(list).rename_axis(None)

res = pd.DataFrame(mlb.fit_transform(s), columns=mlb.classes_, index=s.index)\
        .reindex(range(s.index.max()+1)).fillna(0).astype(int)

print(res)

   A  B  C  D  E  F  G  H  I  J
0  1  0  1  0  0  1  1  0  0  0
1  0  1  0  1  0  0  0  0  0  1
2  0  0  0  0  0  0  0  0  0  0
3  0  0  0  0  1  0  0  1  1  0
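MultiLabelBinarizer expects one collection of labels per sample, which is why the answer first inverts the dictionary so that each row number maps to the list of columns that should hold a 1. That intermediate step needs only pandas and can be inspected on its own; pairs is an illustrative name, not part of the answer above.

```python
import pandas as pd

d = {'A': 0, 'C': 0, 'B': 1, 'E': 3, 'D': 1, 'G': 0, 'F': 0, 'I': 3, 'H': 3, 'J': 1}

# Column 0 holds the keys (column labels), column 1 the values (row numbers).
pairs = pd.DataFrame(list(d.items()))

# Group the labels by row number; each group becomes one list of labels.
s = pairs.groupby(1)[0].apply(list).rename_axis(None)
print(s)
```

Row 2 is absent from s because no key maps to it, which is why the answer reindexes with range(s.index.max()+1) and fills the gap with zeros.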
jpp