How to convert a CSV file to a dictionary of lists with Python?

Question

I'm trying to get this kind of result:

Output_Screenshot

Here is the CSV file:

OsmID,NewName,IdLocal

1020287758,NN1,Id0001

1021229973,NN2,Id0002

1025409497,NN3,Id0003

I'm using the code below:

import csv

input = r'C:\Users\_M92\csvFiles\csv0001.csv'

fileRead = open(input, 'r')

with open(input, 'r') as csv:
    headerLine = fileRead.readline()
    header = headerLine.split(",")  
    #print(header)
    nameIndex = header.index("OsmID")    
    output = {}
    for line in fileRead.readlines():
        values = line.split(",")
        output[values[nameIndex]] = values

print(output)

And it results in the following error:

  File "c:\Users\_M92\Scripts\CsvToDict.py", line 19, in <module>
    nameIndex = header.index("OsmID")
ValueError: 'OsmID' is not in list

Just curious: why are you using this approach? With pandas, you could do something like `df = pd.read_csv(r'C:\Users\_M92\csvFiles\csv0001.csv')` and then `output = df.to_dict(orient='list')` — Derek O, Jul 11 '22 at 16:07
Your input and output screenshots are too small to be legible on my desktop (forget about mobile screens). Please consider editing your question to add your input and output _as [formatted](/help/formatting) text_. See https://meta.stackoverflow.com/q/285551 — Pranav Hosangadi, Jul 11 '22 at 16:08
Hi @DerekO, actually, it's a python code I'm using in ArcGIS. Unfortunately, I can't use pandas. — Timeless, Jul 11 '22 at 16:09
You should make it a habit to share the _full_ traceback of your error, since it contains useful information that is not contained in the error message you have shared. In this case, it seems obvious that the error comes from the `nameIndex = header.index(...)` line. Have you tried any [debugging](//ericlippert.com/2014/03/05/how-to-debug-small-programs/) to figure out why that might be the case? What did `print(header)` show you? Perhaps there's some whitespace around `OsmID` that results in `"OsmID" in header` to be `False`? We can't tell because it's not obvious from your screenshot — Pranav Hosangadi, Jul 11 '22 at 16:11
@M92_ can you attach the text from the actual .csv file – this would allow us to run your code and help you debug (you can just copy and paste it, commas and all, as formatted text). currently if we want to debug your code and reproduce your error, we would have to attempt to recreate your file from your screenshot of it — Derek O, Jul 11 '22 at 16:12
@M92_ no, you imported the `csv` module, then overwrote it with the file handle that you opened in line 7 — Pranav Hosangadi, Jul 11 '22 at 16:22
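
The shadowing described in the last comment can be reproduced without touching the disk. This is a minimal sketch, assuming an in-memory `io.StringIO` stands in for the real file; the sample text is copied from the question.

```python
import csv
import io

sample = "OsmID,NewName,IdLocal\n1020287758,NN1,Id0001\n"

# Before the `with` block, `csv` is the standard-library module.
module_has_reader = hasattr(csv, "DictReader")

# `as csv` rebinds the name `csv` to the file object, hiding the module.
with io.StringIO(sample) as csv:
    handle_has_reader = hasattr(csv, "DictReader")

print(module_has_reader, handle_has_reader)
```

Choosing a different name for the handle, such as `fh`, keeps the module reachable.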

Pranav Hosangadi · Accepted Answer · 2022-07-11T16:53:15.623

Instead of manually splitting each line by commas, use the CSV module that you've imported. This module contains a DictReader class that will yield dictionaries for each row. Then, you just need to add this to your output dictionary.

import csv

# Create an empty dictionary
# We will add keys to this as needed
output = {}
# Keep track of number of rows, so we can add an empty column if needed
row_count = 0

# This function adds a row to the output dictionary
def add_row(row_dict):
    global row_count # Need to declare this as global because we're assigning to the variable in this function
    if not row_dict: return # If row is empty, do nothing
    for k, v in row_dict.items():
        # Loop over all key-value pairs in the row to add
        if k not in output: # If the output doesn't contain this column, create a blank column
            output[k] = [None] * row_count 
   
        output[k].append(v) # Append the value to the correct column in output

    row_count += 1 

input_file = r'C:\Users\_M92\csvFiles\csv0001.csv'
with open(input_file, 'r') as fh:
    reader = csv.DictReader(fh) # Create a DictReader
    for row in reader:
        add_row(row) # Add every row to the output

This gives the following output:

{'OsmID': ['1020287758', '1021229973', '1025409497'], 
 'NewName': ['NN1', 'NN2', 'NN3'], 
 'IdLocal': ['Id0001', 'Id0002', 'Id0003']}

Note: I removed the blank lines from the input CSV you provided, but this makes no difference to the program: DictReader skips blank lines, and add_row also returns early if it is ever given an empty dict.
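
As a quick check that the blank lines do not change the result, here is a sketch that feeds the question's data, blank lines included, to `csv.DictReader` through an in-memory `io.StringIO` (an assumption standing in for the real file). It uses `dict.setdefault` as a shorter variant of `add_row`, which is enough here because every row has the same columns.

```python
import csv
import io

# The question's data, with the blank lines between records kept.
raw = (
    "OsmID,NewName,IdLocal\n"
    "\n"
    "1020287758,NN1,Id0001\n"
    "\n"
    "1021229973,NN2,Id0002\n"
    "\n"
    "1025409497,NN3,Id0003\n"
)

columns = {}
for row in csv.DictReader(io.StringIO(raw)):
    for key, value in row.items():
        # Create the column list on first sight, then append the value.
        columns.setdefault(key, []).append(value)

print(columns)
```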

Note 2: You could discard the row_count variable if you dynamically count the number of rows like so:

def add_row(row_dict):
    row_count = 0
    for first_key, first_val in output.items():
        row_count = len(first_val)
        break # We can just break out here because all keys should have the same number of values
    
    # Create keys that do not yet exist in output but do exist in the new row
    existing_keys = set(output.keys())
    new_row_keys = set(row_dict.keys())
    keys_to_create = new_row_keys - existing_keys 
    for key in keys_to_create:
        output[key] = [None] * row_count

    # Append to each column in output
    for key in output:
        output[key].append(row_dict.get(key, None)) # If the key doesn't exist in the current row, append None
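
To see what the `None` padding in this version does, here is a small usage sketch. The two ragged rows are made up for illustration (the question's file has no missing columns), and a condensed copy of `add_row` is included so the block runs on its own.

```python
output = {}

def add_row(row_dict):
    # Current row count = length of any existing column (0 if there are none yet)
    row_count = len(next(iter(output.values()), []))

    # Back-fill columns that appear for the first time with None
    for key in row_dict.keys() - output.keys():
        output[key] = [None] * row_count

    # Append to every column; columns missing from this row get None
    for key in output:
        output[key].append(row_dict.get(key))

# Hypothetical rows: the first lacks IdLocal, the second lacks NewName.
add_row({"OsmID": "1020287758", "NewName": "NN1"})
add_row({"OsmID": "1021229973", "IdLocal": "Id0002"})
print(output)
```

Every column ends up with one entry per row, with `None` wherever a row had no value for that column.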

Thank you so much, @Pranav Hosangadi. It worked for me as well. I got (almost) the same output in Visual Studio Code. I don't know why there is some weird characters before the OsmID key : {'ï»¿OsmID': ['1020287758', '1021229973', '1025409497'], 'NewName': ['NN1', 'NN2', 'NN3'], 'IdLocal': ['Id0001', 'Id0002', 'Id0003']} — Timeless, Jul 11 '22 at 16:34
That probably has to do with the encoding of your input file. Use the `encoding='...'` argument in the `open()` function to read it using the correct encoding https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files @M92_ — Pranav Hosangadi, Jul 11 '22 at 16:37
I added the argument encoding='utf-8' to the open() function and now I'm getting '\ufeffOsmID' instead of 'ï»¿OsmID'. I'll try to understand why. Thank you @Pranav Hosangadi. — Timeless, Jul 11 '22 at 16:43
@M92_ you're very welcome! There are different encodings. Here's a link you can use to try to figure out the encoding of your file: https://stackoverflow.com/questions/4255305/how-to-determine-encoding-table-of-a-text-file — Pranav Hosangadi, Jul 11 '22 at 16:47
@M92_ this is also probably the reason you were getting the `ValueError`: If your first column is read as `'ï»¿OsmID'`, you can see why `'OsmID'` can't be found in the list, yeah? — Pranav Hosangadi, Jul 11 '22 at 16:55
Amazing! I figured out the encoding of my CSV file: it's UTF-8 with a BOM signature. So I added the argument `encoding='utf-8-sig'` to your `open()` call and the problem was solved. The encoding fix also corrected my own code, and I'm no longer getting the `ValueError`. — Timeless, Jul 11 '22 at 17:03
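
The effect of the byte order mark can be reproduced in memory. This sketch assumes the file starts with the UTF-8 BOM bytes `EF BB BF`, as the asker found; the `raw` bytes are a hypothetical stand-in for the real file, and `cp1252` is assumed to be the Windows default codec that produced the `ï»¿` prefix.

```python
import csv
import io

# Hypothetical stand-in for the file on disk: UTF-8 BOM followed by the CSV text.
raw = b"\xef\xbb\xbfOsmID,NewName,IdLocal\r\n1020287758,NN1,Id0001\r\n"

def first_header(encoding):
    # Decode the bytes with the given codec and return the first column name.
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    return csv.DictReader(text).fieldnames[0]

print(repr(first_header("cp1252")))     # BOM bytes decoded as three separate characters
print(repr(first_header("utf-8")))      # BOM kept as the single character '\ufeff'
print(repr(first_header("utf-8-sig")))  # BOM stripped, plain 'OsmID'
```

Printing with `repr()` is what makes the otherwise invisible `'\ufeff'` show up, which is also why `header.index("OsmID")` failed in the question's code.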

score -1 · Answer 2 · answered Jul 11 '22 at 16:08

You could use Pandas

import pandas as pd

f = r'C:\Users\_M92\csvFiles\csv0001.csv'
df = pd.read_csv(f).to_dict('list')

answered Jul 11 '22 at 16:08 by Qdr

@M92_ mentioned in a comment that it's Python code they're running in ArcGIS, so they can't use pandas – Derek O Jul 11 '22 at 16:10
I didn't downvote your answer, but it's debatable whether it's useful or not. Anyone programming in ArcGIS who has a similar question to @M92_ won't be able to use your answer either. However, I suppose it's possible that someone wanting to convert a CSV to a dictionary of lists will find this question and your answer, but they would be much more likely to find [this answer](https://stackoverflow.com/questions/23474507/most-pythonic-way-to-read-csv-values-into-dict-of-lists) first because of its number of views – Derek O Jul 11 '22 at 17:18

Vincent Bénet · Answer 3 · 2022-07-11T16:26:43.357

You can start from this snippet. This is the 'from scratch' method; for real use, prefer a library that parses CSV properly:

import os

input_path = r'test.csv'

header_line = 0
sep_csv_line = "\n\n"
sep_csv_column = ","

with open(os.path.join(os.path.dirname(__file__), input_path), 'r') as csv:
    content = csv.read()
    split = content.split(sep_csv_line)
    columns = split[header_line].split(sep_csv_column)
    print(f"{columns = }")
    output = {}
    for column in columns:
        output[column] = []
    
    for line in split[header_line+1:]:
        print(f"{line = }")
        elements = line.split(sep_csv_column)
        print(f"{elements = }")
        for i, column in enumerate(columns):
            element = elements[i]
            print(f"{element = }")
            output[column].append(element)

print(f"{output = }")
print(f"{output['OsmID'] = }")

Here is the output console:

columns = ['OsmID', 'NewName', 'IdLocal']
line = '1020287758,NN1,Id0001'
elements = ['1020287758', 'NN1', 'Id0001']
element = '1020287758'
element = 'NN1'
element = 'Id0001'
line = '1021229973,NN2,Id0002'
elements = ['1021229973', 'NN2', 'Id0002']
element = '1021229973'
element = 'NN2'
element = 'Id0002'
line = '1025409497,NN3,Id0003'
elements = ['1025409497', 'NN3', 'Id0003']
element = '1025409497'
element = 'NN3'
element = 'Id0003'
output = {'OsmID': ['1020287758', '1021229973', '1025409497'], 'NewName': ['NN1', 'NN2', 'NN3'], 'IdLocal': ['Id0001', 'Id0002', 'Id0003']}
output['OsmID'] = ['1020287758', '1021229973', '1025409497']

I have changed the code to match the file in your screenshot; it now works with `header_line = 0` and `sep_csv_line = "\n\n"` — Vincent Bénet, Jul 11 '22 at 16:22
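
One reason to prefer a library over manual splitting: a plain `split(",")` breaks as soon as a quoted field contains a comma. A small sketch with a made-up row (not from the question's file), compared against the standard-library `csv.reader`:

```python
import csv
import io

# Hypothetical row whose second field contains a comma inside quotes.
line = '1020287758,"Main St, North",Id0001'

# Manual splitting cuts the quoted field in two and keeps the quote characters.
naive = line.split(",")

# csv.reader understands the quoting and returns three clean fields.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)
print(parsed)
```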
