
I have a nested dictionary (with more than 70,000 entries):

users_item = {
    "sessionId1": {
        "12345645647": 1.0, 
        "9798654": 5.0 

    },         
    "sessionId2":{
        "3445657657": 1.0

    },
    "sessionId3": {
        "87967976": 5.0, 
        "35325626436": 1.0, 
        "126789435": 1.0, 
        "72139856": 5.0      
    },
    "sessionId4": {
        "4582317": 1.0         
    }
......
}

I want to create a CSV file from my nested dictionary; the result should look like this:

sessionId1 item rating
sessionId1 item rating
sessionId2 item rating
sessionId3 item rating
sessionId3 item rating
sessionId3 item rating
sessionId3 item rating
.......

I found this post: Convert Nested Dictionary to CSV Table

It's similar to my question, but when I try the answers there, none of them work: the pandas library runs out of memory.

How can I make a CSV file with my data?

Paldro
  • @Ev.Kounis, sorry the question was not clear. I have edited my question! – Paldro Jul 19 '16 at 09:20
  • if instead of `item` and `rating` you had the actual values it would be clear from the beginning. – Ma0 Jul 19 '16 at 09:23
  • @ĐứcPhan did you try and adapt the accepted answer from your link (it does not use `panda` but `csv.DictWriter`) ? – Frodon Jul 19 '16 at 09:24
  • even a simple csv writer should work if your nested dict is only 1-depth. That is, loop for the outer dict `key`, `value` and perform another loop in inner dict with `k`, `v`, then write to row for `writer.writerow([key, k, v])` – Anzel Jul 19 '16 at 09:25
  • take a look at this: https://docs.python.org/3/library/csv.html#csv.DictWriter – Ma0 Jul 19 '16 at 09:26
  • I tried both answers. The first one fails with `rows = [a]+[[q]+[user_item[p].get(q, '-') for p in a[1:]] for q in x]` raising `TypeError: 'dict_keys' object is not subscriptable`. The second one fails with `writer = csv.DictWriter(outf, [" "] + user_item.keys())` raising `TypeError: can only concatenate list (not "dict_keys") to list`. – Paldro Jul 19 '16 at 09:30
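
For what it's worth, the second `TypeError` quoted in the last comment above is a Python 3 change: `dict.keys()` returns a view object rather than a list, so it can't be concatenated to a list directly. A minimal sketch of the fix, reusing the variable name from that answer with a tiny stand-in dict:

import csv

# Tiny stand-in for the real nested dict, just to illustrate the fix.
user_item = {"sessionId1": {"12345645647": 1.0}}

# In Python 3, dict.keys() returns a view, so wrap it in list() before
# concatenating it with another list.
fieldnames = [" "] + list(user_item.keys())

with open('output.csv', 'w', newline='') as outf:
    writer = csv.DictWriter(outf, fieldnames)
    writer.writeheader()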

4 Answers


Just loop through the dictionary and use the Python csv writer to write to the csv file.

import csv

with open('output.csv', 'w') as csv_file:
    csvwriter = csv.writer(csv_file, delimiter='\t')
    for session in users_item:
        for item in users_item[session]:
            # one row per (session, item, rating)
            csvwriter.writerow([session, item, users_item[session][item]])
mowcow
  • you should know `writerow` takes exactly one argument but in your code you given 3. – Paldro Jul 19 '16 at 09:44
  • @Đức Phan Sorry, forgot the outer brackets, fixed now. – mowcow Jul 19 '16 at 09:46
  • this is error which your code : `writer.writerow([session, item, user_item[session][item]]) TypeError: 'str' does not support the buffer interface` – Paldro Jul 19 '16 at 09:50
  • Ah I'm so used to using 'wb' when opening files in python 2. For python 3 use 'w' instead when opening the csv file. Code changed again. – mowcow Jul 19 '16 at 09:56
for session, ratings in users_item.items():
    for item, rating in ratings.items():
        print("{} {}".format(session, rating))

Output:

sessionId3 5.0
sessionId3 1.0
sessionId3 5.0
sessionId3 1.0
sessionId1 5.0
sessionId1 1.0
sessionId4 1.0
sessionId2 1.0

Note that a dict (`users_item`) has no guaranteed order, so unless you specify the order of the rows some other way, the output will be in whatever order the dict uses internally.

Edit: This approach has no problems with a file containing 70k entries.

Edit: If you want to write to a CSV file, use the csv module or just pipe the output to a file.
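
A minimal sketch of the same loop writing to a file with the csv module instead of printing, with the session ids sorted so the row order doesn't depend on the dict's internal ordering (the file name here is just an example):

import csv

users_item = {
    "sessionId1": {"12345645647": 1.0, "9798654": 5.0},
    "sessionId2": {"3445657657": 1.0},
}

with open('ratings.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    for session in sorted(users_item):  # sorted for a reproducible row order
        for item, rating in users_item[session].items():
            writer.writerow([session, item, rating])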


Assuming you want each session as a row, the number of columns in every row will be the total number of unique keys across all the session dicts. Based on the data you've given, I'm guessing that number of unique keys is astronomical.

That is why you're running into memory issues with the solution given in this discussion. It's simply too much data to hold in memory at one time.

Your only option, if my assumptions are correct, is to divide and conquer: break the data into smaller chunks, write each chunk to a file in CSV format, and then merge the CSV files at the end.
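
A rough sketch of what that chunking could look like for the flat session/item/rating layout, with each chunk of sessions written to its own part file and the parts concatenated afterwards; the chunk size and file names are only illustrative:

import csv
from itertools import islice

def write_in_chunks(users_item, chunk_size=10000):
    """Write session/item/rating rows, chunk_size sessions per part file."""
    sessions = iter(users_item)
    part = 0
    while True:
        chunk = list(islice(sessions, chunk_size))
        if not chunk:
            break
        with open('ratings_part{}.csv'.format(part), 'w', newline='') as f:
            writer = csv.writer(f, delimiter='\t')
            for session in chunk:
                for item, rating in users_item[session].items():
                    writer.writerow([session, item, rating])
        part += 1

# write_in_chunks(users_item); afterwards the part files can simply be concatenated.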

Autonomy

If you iteratively write the file, there should be no memory issues:

import csv

users_item = {
    "sessionId1": {
        "12345645647": 1.0,
        "9798654": 5.0

    },
    "sessionId2":{
        "3445657657": 1.0

    },
    "sessionId3": {
        "87967976": 5.0,
        "35325626436": 1.0,
        "126789435": 1.0,
        "72139856": 5.0
    },
    "sessionId4": {
        "4582317": 1.0
    }
}

with open('nested_dict.csv', 'w') as output:
    writer = csv.writer(output, delimiter='\t')
    for sessionId in sorted(users_item):
        ratings = users_item[sessionId]
        for item in ratings:
            writer.writerow([sessionId, item, ratings[item]])

Resulting contents of the output file (where » represents a tab character):

sessionId1»  12345645647»  1.0
sessionId1»  9798654»      5.0
sessionId2»  3445657657»   1.0
sessionId3»  126789435»    1.0
sessionId3»  87967976»     5.0
sessionId3»  35325626436»  1.0
sessionId3»  72139856»     5.0
sessionId4»  4582317»      1.0
martineau