CSV - reading problem using custom python script

Question

I'm writing a custom script whose first task is to extract a CSV file's data into a Python dictionary. A variable behaves strangely, though: when I execute the script below, instead of the subsequent input prompts I get "Squeezed text (77 lines)" as output. If I inspect that, I get an empty white screen, so there seems to be nothing in it. I don't understand what's happening.

My script:

import os
import io

separator = ";"

source_data_folder = os.path.realpath( __file__ ).replace( "extraction.py", "source_data" )

for source_file in os.listdir( source_data_folder ):

    iterated_source_file = io.open( source_data_folder + "/" + source_file, encoding='windows-1252' )

    source_data = {}

    source_data_key_indexes = {}

    line_counter = 0

    for iterated_line in iterated_source_file:

        iterated_lines_data = iterated_line.split( "" + separator + "" )

        column_counter = 0

        if line_counter == 0:

            for iterated_lines_field in iterated_lines_data:

                source_data[iterated_lines_field] = []

                source_data_key_indexes[column_counter] = iterated_lines_field

                column_counter += 1

        else:

            for iterated_lines_field in iterated_lines_data:
                source_data[source_data_key_indexes[column_counter]].append( iterated_lines_field )

                column_counter += 1

        line_counter += 1

    iterated_source_file.close()

    for column_index in source_data_key_indexes:
        input( "Shall the column '" + source_data_key_indexes[column_index] + "' be exported? (y/n)" )

When I put this part:

for column_index in source_data_key_indexes:
    input( "Shall the column '" + source_data_key_indexes[column_index] + "' be exported? (y/n)" )

outside the initial for loop, without any indentation, it works; but I need to call it inside the first for loop. I could maybe do this with a callback, but why is this actually happening?

I'm using Python v. 3.7.3 and am executing the script via the Python Shell v. 3.7.3.

Content of a sample CSV file, placed in the source_data folder, which sits in the same location as the "extraction.py" file holding the code above:

first;second;third;fourth
this;is;the;1st
this;is;the;2nd

This CSV file was obtained by creating the corresponding table in a new Microsoft Office Excel sheet, with three lines and four columns, then saving it via "save as..." and selecting the UTF-8 CSV file type.

Note: I noticed that when I add the line

print( iterated_line )

below the line if line_counter == 0: of my code, I get the "Squeezed text (77 lines)" again, followed by the visible content of the first line as a simple string. This only happens for the table header line (the very first one); for the others only the line content is output. It happens for any CSV file I create in the above-mentioned way, no matter the number of rows, columns, or their content. So is this actually some formatting issue with Python + MS Excel?

Please provide a [mre]. We cannot help you if we aren't able to run your code (in this case because we do not have access to your files). Those comments on every line don't help with your code's readability. Remove those too, unless you think the line isn't self-explanatory and absolutely needs a comment. Also, FYI, [csv.DictReader](https://docs.python.org/3/library/csv.html#csv.DictReader) exists. — Pranav Hosangadi, Jun 01 '21 at 16:23
You may use whatever csv you save from a Microsoft Excel data sheet, simply via "save as..." + saving as utf-8 csv file. Result is the file like the text added at the end of my question. — DevelJoe, Jun 01 '21 at 16:35

score 1 · Answer 1 · answered Jun 01 '21 at 17:02

import os
import csv

source_data_folder = os.path.realpath( __file__ ).replace("extraction.py", "source_data")

for filename in os.listdir(source_data_folder):
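    # os.listdir() returns bare names, so join each one with the folder path
    with open(os.path.join(source_data_folder, filename), encoding='windows-1252') as fp: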
        reader = csv.DictReader(fp, delimiter=';')
        table = list(reader)
        # Convert list of dicts to dict of lists
        table = {key: [item[key] for item in table] for key in table[0]}
        print(table)

9769953

Thanks, but using another python module doesn't really answer my question.. – DevelJoe Jun 01 '21 at 17:16
`csv` is a built-in module, and should be used when reading a CSV file, instead of writing your own parser. – 9769953 Jun 01 '21 at 21:14
Your solution uses a module which has nothing to do with my code, built-in or not, and results in the exact same errors + encoding problems as elaborated in my question & answer, so it seems to do more or less the same. Which is why I don't really consider this to be a solution, but thanks for your help anyway. – DevelJoe Jun 02 '21 at 07:37

DevelJoe · Accepted Answer · 2021-06-01T17:21:29.527

I found the problem, weirdly thanks to this. The problem is that os.listdir() returned the .DS_Store file as its first element, which is where the buggy first iteration originates, so replace:

for source_file in os.listdir( source_data_folder ):

with

# remove the .DS_Store default file which is auto-added in Mac OS
myfiles = os.listdir( source_data_folder )
del myfiles[0]

# iterate through all the source files present in the source folder
for source_file in myfiles:
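
Deleting the first list element only works while .DS_Store happens to come first, and os.listdir() returns entries in arbitrary order. A minimal sketch of a filter that does not depend on the order, assuming all source files end in ".csv" (the helper name list_csv_files is hypothetical and not part of the script above):

```python
import os

def list_csv_files(folder):
    """Return the sorted paths of the regular *.csv files in folder."""
    paths = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Skip hidden entries such as .DS_Store, and anything that is not a file.
        if name.startswith(".") or not os.path.isfile(path):
            continue
        # Assumption: every source file uses the .csv extension.
        if name.lower().endswith(".csv"):
            paths.append(path)
    return paths
```

The outer loop could then read "for source_file in list_csv_files( source_data_folder ):", with the returned value used directly as the path to open.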

And the only problem now is that I have the string

\ufeff

at the very start of the very first line only. To get rid of it, according to this, use the utf-8-sig encoding instead of utf-8; that indeed worked (the encoding change tells the engine to "omit the BOM in the result").
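
The difference between the two encodings can be reproduced without any file on disk. The sketch below assumes the exported file starts with a UTF-8 byte order mark (bytes EF BB BF), which is consistent with the \ufeff seen above; the sample bytes and the helper name header_fields are made up for illustration:

```python
import io

# Assumed file content: a UTF-8 BOM followed by the sample CSV text.
raw = b"\xef\xbb\xbffirst;second;third;fourth\nthis;is;the;1st\n"

def header_fields(data, encoding):
    """Decode data with the given encoding and split its first line on ';'."""
    with io.TextIOWrapper(io.BytesIO(data), encoding=encoding) as fp:
        return fp.readline().rstrip("\n").split(";")

# With plain utf-8 the BOM survives as part of the first field.
print(header_fields(raw, "utf-8"))
# With utf-8-sig the decoder drops a leading BOM if one is present.
print(header_fields(raw, "utf-8-sig"))
```

With "utf-8" the first header key comes out as '\ufefffirst', so a later lookup by 'first' would fail; with "utf-8-sig" it is plain 'first'.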
