
I have a CSV file, and I want to convert it into a dict. The paths in the CSV are not unique.

my_csv.csv:

folder/1/img/file/1.mp3/4.jpg
folder/1/img/file/1.mp3/8.jpg
folder/3/img/file/3.mp3/1.jpg
folder/3/img/file/3.mp3/5.jpg
folder/6/img/file/6.mp3/6.jpg
folder/6/img/file/6.mp3/8.jpg
folder/7/img/file/7.mp3/9.jpg

Expected output:

expected_output = {
  'folder/1/img/file/1.mp3': ['4.jpg','8.jpg'],
  'folder/3/img/file/3.mp3': ['1.jpg','5.jpg'],
  'folder/6/img/file/6.mp3': ['6.jpg','8.jpg'],
  'folder/7/img/file/7.mp3': ['9.jpg']
}

I have tried this, but it only keeps one image per folder.

import csv
import os

my_dict = {}

with open("my_csv.csv", 'r') as file:
    csvreader = csv.reader(file)
    for row in csvreader:
        for rw in row:  
            head_tail = os.path.split(rw)
            img_path = (head_tail[0])
            img_name = (head_tail[1])
            my_dict[img_path]=img_name
print(my_dict)

1 Answer

This is where Python's collections.defaultdict becomes useful.

The problem with your loop is that you need to collect all the values for the same key into a list, but each assignment replaces the value of my_dict[img_path] with the most recently parsed img_name. A defaultdict automatically initializes a default value when a key doesn't exist yet; in this case, you can make that default an empty list. Then just keep .append-ing to that list whenever you encounter the same key.

Demo:

>>> from collections import defaultdict
>>> 
>>> dd = defaultdict(list)
>>> dict(dd)
{}
>>> dd["folder/1/img/file/1.mp3"].append("4.jpg")
>>> dd["folder/1/img/file/1.mp3"].append("8.jpg")
>>> 
>>> dd["folder/1/img/file/6.mp3"].append("6.jpg")
>>> 
>>> dict(dd)
{'folder/1/img/file/1.mp3': ['4.jpg', '8.jpg'], 'folder/1/img/file/6.mp3': ['6.jpg']}
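
For comparison, here is a minimal sketch of the same grouping with a plain dict and dict.setdefault, which is roughly what defaultdict(list) saves you from writing by hand. The pairs list is illustrative data, not something read from your file.

```python
# Illustrative (path, image) pairs; in the real script these come from the file.
pairs = [
    ("folder/1/img/file/1.mp3", "4.jpg"),
    ("folder/1/img/file/1.mp3", "8.jpg"),
    ("folder/6/img/file/6.mp3", "6.jpg"),
]

grouped = {}
for img_path, img_name in pairs:
    # setdefault returns the existing list for img_path, or stores and
    # returns a new empty list the first time the key is seen.
    grouped.setdefault(img_path, []).append(img_name)

print(grouped)
```

Both approaches should give the same grouping; defaultdict just moves the "create the list if missing" step out of the loop body.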

Putting that into your code:

import csv
from collections import defaultdict
from pprint import pprint

my_dict = defaultdict(list)

with open("my_csv.csv", 'r') as file:
    csvreader = csv.reader(file)
    for row in csvreader:
        for rw in row:
            img_path, img_name = rw.rsplit("/", maxsplit=1)
            my_dict[img_path].append(img_name)

pprint(dict(my_dict))

Output:

{'folder/1/img/file/1.mp3': ['4.jpg', '8.jpg'],
 'folder/3/img/file/3.mp3': ['1.jpg', '5.jpg'],
 'folder/6/img/file/6.mp3': ['6.jpg', '8.jpg'],
 'folder/7/img/file/7.mp3': ['9.jpg']}

Note that if you strictly need a regular dict type, you can convert a defaultdict to a dict by calling dict(...) on it. But for most purposes, a defaultdict behaves like a dict.
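
One difference that can matter in practice: merely looking up a missing key on a defaultdict creates that key, while a regular dict raises KeyError. A small sketch of that behavior, using made-up keys ("a", "missing", "other") purely for illustration:

```python
from collections import defaultdict

dd = defaultdict(list)
dd["a"].append(1)

# Reading a key that does not exist inserts it with an empty list.
_ = dd["missing"]
keys_after_lookup = sorted(dd)
print(keys_after_lookup)

# After converting to a regular dict, a missing key raises KeyError instead.
plain = dict(dd)
try:
    plain["other"]
    raised = False
except KeyError:
    raised = True
print(raised)
```

So if other code will probe the result for keys that may not be there, converting with dict(...) once the grouping is done is probably the safer choice.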

Notice that I also changed how img_path and img_name are parsed from each line. Since those lines don't look like valid paths anyway, and you are not really using them in any file I/O operations, there is no point in using os.path. You can simply use str.rsplit.
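
To illustrate, here is a small sketch comparing the two approaches on one of your sample lines. It uses posixpath.split (the POSIX implementation behind os.path.split) so the result does not depend on the operating system; the no_slash value is a made-up edge case.

```python
import posixpath

line = "folder/1/img/file/1.mp3/4.jpg"

# Split once, from the right, on the last "/".
via_rsplit = tuple(line.rsplit("/", maxsplit=1))
via_split = posixpath.split(line)
print(via_rsplit == via_split)

# The two differ when there is no "/" at all: rsplit returns a single
# element (so unpacking into two names would fail), while split returns
# an empty head.
no_slash = "4.jpg"
rsplit_parts = no_slash.rsplit("/", maxsplit=1)
split_parts = posixpath.split(no_slash)
print(rsplit_parts, split_parts)
```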

Lastly, you say that your input file is a CSV and you are using csv.reader, but the contents aren't really CSV-formatted, in the sense that each line is basically a single input string. You can make do with regular iteration over each line:

from collections import defaultdict
from pprint import pprint

my_dict = defaultdict(list)

with open("my_csv.csv", 'r') as file:
    for line in file:
        img_path, img_name = line.rstrip().rsplit("/", maxsplit=1)
        my_dict[img_path].append(img_name)

pprint(dict(my_dict))

...which yields the same result:

{'folder/1/img/file/1.mp3': ['4.jpg', '8.jpg'],
 'folder/3/img/file/3.mp3': ['1.jpg', '5.jpg'],
 'folder/6/img/file/6.mp3': ['6.jpg', '8.jpg'],
 'folder/7/img/file/7.mp3': ['9.jpg']}
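
One caveat with the plain-iteration loop: a blank line, or a line with no "/", would make the two-name unpacking fail with ValueError. Below is a sketch that skips such lines, assuming that is the behavior you want. group_images is a hypothetical helper name, and io.StringIO stands in for the opened file.

```python
import io
from collections import defaultdict


def group_images(lines):
    """Group image names by the path before the last '/' (hypothetical helper)."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        # str.rpartition always returns three parts, so unpacking cannot fail;
        # sep is empty when the line has no "/" (this includes blank lines).
        img_path, sep, img_name = line.rpartition("/")
        if not sep:
            continue
        grouped[img_path].append(img_name)
    return dict(grouped)


# Stand-in for open("my_csv.csv"); note the blank line in the middle.
sample = io.StringIO(
    "folder/1/img/file/1.mp3/4.jpg\n"
    "folder/1/img/file/1.mp3/8.jpg\n"
    "\n"
    "folder/7/img/file/7.mp3/9.jpg\n"
)
result = group_images(sample)
print(result)
```

With a real file, you would pass the open file object to group_images in place of sample.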