
I have a nested dictionary (with more than 70,000 entries):

users_item = {
    "sessionId1": {
        "12345645647": 1.0, 
        "9798654": 5.0 

    },         
    "sessionId2":{
        "3445657657": 1.0

    },
    "sessionId3": {
        "87967976": 5.0, 
        "35325626436": 1.0, 
        "126789435": 1.0, 
        "72139856": 5.0      
    },
    "sessionId4": {
        "4582317": 1.0         
    }
......
}

I want to create a CSV file from my nested dictionary; the result should look like this:

sessionId1 item rating
sessionId1 item rating
sessionId2 item rating
sessionId3 item rating
sessionId3 item rating
sessionId3 item rating
sessionId3 item rating
.......

I found this post: Convert Nested Dictionary to CSV Table

It's similar to my question, but when I try the answers there, none of them work: the pandas library runs out of memory.

How can I make a CSV file with my data?

Paldro
  • @Ev.Kounis, sorry the question was not clear. I have edited my question! – Paldro Jul 19 '16 at 09:20
  • if instead of `item` and `rating` you had the actual values it would be clear from the beginning. – Ma0 Jul 19 '16 at 09:23
  • @ĐứcPhan did you try and adapt the accepted answer from your link (it does not use `panda` but `csv.DictWriter`) ? – Frodon Jul 19 '16 at 09:24
  • even a simple csv writer should work if your nested dict is only 1-depth. That is, loop for the outer dict `key`, `value` and perform another loop in inner dict with `k`, `v`, then write to row for `writer.writerow([key, k, v])` – Anzel Jul 19 '16 at 09:25
  • take a look at this: https://docs.python.org/3/library/csv.html#csv.DictWriter – Ma0 Jul 19 '16 at 09:26
  • I tried both answers. The first one fails with `rows = [a]+[[q]+[user_item[p].get(q, '-') for p in a[1:]] for q in x]` raising `TypeError: 'dict_keys' object is not subscriptable`. The second one fails with `writer = csv.DictWriter(outf, [" "] + user_item.keys())` raising `TypeError: can only concatenate list (not "dict_keys") to list`. – Paldro Jul 19 '16 at 09:30
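
For what it's worth, the second `TypeError` quoted in the last comment above is a Python 3 change: `dict.keys()` returns a view object rather than a list, so it can't be concatenated to a list directly. A minimal sketch of the fix, reusing the variable name from that answer with a tiny stand-in dict:

import csv

# Tiny stand-in for the real nested dict, just to illustrate the fix.
user_item = {"sessionId1": {"12345645647": 1.0}}

# In Python 3, dict.keys() returns a view, so wrap it in list() before
# concatenating it with another list.
fieldnames = [" "] + list(user_item.keys())

with open('output.csv', 'w', newline='') as outf:
    writer = csv.DictWriter(outf, fieldnames)
    writer.writeheader()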

4 Answers


Just loop through the dictionary and use the Python csv writer to write to the csv file.

import csv

with open('output.csv', 'w') as csv_file:
    csvwriter = csv.writer(csv_file, delimiter='\t')
    for session in users_item:
        for item in users_item[session]:
            # one row per (session, item, rating)
            csvwriter.writerow([session, item, users_item[session][item]])
mowcow
  • you should know `writerow` takes exactly one argument but in your code you given 3. – Paldro Jul 19 '16 at 09:44
  • @Đức Phan Sorry, forgot the outer brackets, fixed now. – mowcow Jul 19 '16 at 09:46
  • this is error which your code : `writer.writerow([session, item, user_item[session][item]]) TypeError: 'str' does not support the buffer interface` – Paldro Jul 19 '16 at 09:50
  • Ah I'm so used to using 'wb' when opening files in python 2. For python 3 use 'w' instead when opening the csv file. Code changed again. – mowcow Jul 19 '16 at 09:56
for session, ratings in users_item.items():
    for item, rating in ratings.items():
        print("{} {}".format(session, rating))

Output:

sessionId3 5.0
sessionId3 1.0
sessionId3 5.0
sessionId3 1.0
sessionId1 5.0
sessionId1 1.0
sessionId4 1.0
sessionId2 1.0

Note that a dict (`users_item`) has no guaranteed order, so unless you specify the order of the rows some other way, the output will be in whatever order the dict uses internally.

Edit: This approach has no problems with a file containing 70k entries.

Edit: If you want to write to a CSV file, use the csv module or just pipe the output to a file.
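
A minimal sketch of the same loop writing to a file with the csv module instead of printing, with the session ids sorted so the row order doesn't depend on the dict's internal ordering (the file name here is just an example):

import csv

users_item = {
    "sessionId1": {"12345645647": 1.0, "9798654": 5.0},
    "sessionId2": {"3445657657": 1.0},
}

with open('ratings.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    for session in sorted(users_item):  # sorted for a reproducible row order
        for item, rating in users_item[session].items():
            writer.writerow([session, item, rating])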


Assuming you want each session as a row, the number of columns in every row will be the total number of unique keys across all the session dicts. Based on the data you've given, I'm guessing that number of unique keys is astronomical.

That is why you're running into memory issues with the solution given in this discussion. It's simply too much data to hold in memory at one time.

Your only option, if my assumptions are correct, is to divide and conquer: break the data into smaller chunks, write each chunk to a file in CSV format, and then merge the CSV files at the end.
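
A rough sketch of what that chunking could look like for the flat session/item/rating layout, with each chunk of sessions written to its own part file and the parts concatenated afterwards; the chunk size and file names are only illustrative:

import csv
from itertools import islice

def write_in_chunks(users_item, chunk_size=10000):
    """Write session/item/rating rows, chunk_size sessions per part file."""
    sessions = iter(users_item)
    part = 0
    while True:
        chunk = list(islice(sessions, chunk_size))
        if not chunk:
            break
        with open('ratings_part{}.csv'.format(part), 'w', newline='') as f:
            writer = csv.writer(f, delimiter='\t')
            for session in chunk:
                for item, rating in users_item[session].items():
                    writer.writerow([session, item, rating])
        part += 1

# write_in_chunks(users_item); afterwards the part files can simply be concatenated.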

Autonomy

If you iteratively write the file, there should be no memory issues:

import csv

users_item = {
    "sessionId1": {
        "12345645647": 1.0,
        "9798654": 5.0

    },
    "sessionId2":{
        "3445657657": 1.0

    },
    "sessionId3": {
        "87967976": 5.0,
        "35325626436": 1.0,
        "126789435": 1.0,
        "72139856": 5.0
    },
    "sessionId4": {
        "4582317": 1.0
    }
}

with open('nested_dict.csv', 'w') as output:
    writer = csv.writer(output, delimiter='\t')
    for sessionId in sorted(users_item):
        ratings = users_item[sessionId]
        for item in ratings:
            writer.writerow([sessionId, item, ratings[item]])

Resulting contents of the output file (where » represents a tab character):

sessionId1»  12345645647»  1.0
sessionId1»  9798654»      5.0
sessionId2»  3445657657»   1.0
sessionId3»  126789435»    1.0
sessionId3»  87967976»     5.0
sessionId3»  35325626436»  1.0
sessionId3»  72139856»     5.0
sessionId4»  4582317»      1.0
martineau