
I would like to create a dict containing several CSV files, each named after a year. All files are in one folder. As far as I can see, each has the same number of columns, and the separator is a comma. Unfortunately, my code is not working and I can't figure out why. I have tried adding the engine, sep, delimiter, and header arguments, but nothing fixed the problem. In the end, I would also like to make one big DataFrame from these files without losing any information. Can someone help me solve the problem? Thanks.

I have the following code:

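import os

import pandas as pd

df_dict = {}

# Walk the folder and read every file found into its own DataFrame.
for dirname, _, filenames in os.walk(' used my path the the folder'):

    for filename in filenames:
        print(os.path.join(dirname, filename))
        df_filename = pd.read_csv(os.path.join(dirname, filename), engine='python')
        # The file name minus its 4-character extension (e.g. ".csv") is the year.
        df_filename['Year'] = filename[:-4]
        df_dict[filename[:-4]] = df_filename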

The error message: "ParserError: Expected 1 fields in line 4, saw 2"
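
One way to narrow this down, assuming the folder might also contain files other than the yearly CSVs (os.walk passes every file it finds to read_csv), is to load only .csv files and record which file fails instead of stopping at the first error. This is a sketch under that assumption, not a confirmed fix. The helper name load_yearly_csvs and the temporary demo folder are hypothetical, and the demo rows are reused from the sample data posted in the comments.

```python
import os
import tempfile

import pandas as pd


def load_yearly_csvs(folder):
    """Read every .csv under `folder` into a dict keyed by file name stem.

    Returns (frames, failures), where failures maps a path to its error text.
    """
    frames, failures = {}, {}
    for dirname, _, filenames in os.walk(folder):
        for filename in sorted(filenames):
            stem, ext = os.path.splitext(filename)
            if ext.lower() != ".csv":
                continue  # skip anything that is not a CSV file
            path = os.path.join(dirname, filename)
            try:
                df = pd.read_csv(path)
            except (pd.errors.ParserError, UnicodeDecodeError) as exc:
                failures[path] = str(exc)  # remember the offending file
                continue
            df["Year"] = stem
            frames[stem] = df
    return frames, failures


# Demo with a throwaway folder (hypothetical file names and layout).
with tempfile.TemporaryDirectory() as demo_dir:
    header = "Overall rank,Country or region,Score\n"
    with open(os.path.join(demo_dir, "2018.csv"), "w") as f:
        f.write(header + "1,Finland,7.632\n2,Norway,7.594\n")
    with open(os.path.join(demo_dir, "2019.csv"), "w") as f:
        f.write(header + "3,Denmark,7.555\n")
    with open(os.path.join(demo_dir, "notes.txt"), "w") as f:
        f.write("this file is not a CSV and is skipped\n")
    frames, failures = load_yearly_csvs(demo_dir)

# Stack all yearly frames into one DataFrame; the Year column keeps the origin.
big_df = pd.concat(list(frames.values()), ignore_index=True)
print(sorted(frames), big_df.shape, failures)
```

Because every frame carries its own Year column, concatenating keeps all rows and still shows which file each row came from. Any path listed in failures is the file worth inspecting by hand.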

nchrista
  • The error is pretty clear. Line 4 of that file has 2 columns, but the header line has only one column name. – Barmar Sep 26 '22 at 19:44
  • Please post the first 5 lines of the file that's getting the error. – Barmar Sep 26 '22 at 19:44
  • Hi, all the files have the same number of columns. – nchrista Sep 26 '22 at 21:03
  • And do they match the number of columns in the header? – Barmar Sep 26 '22 at 21:05
  • I've linked to a question with 50 answers. One of them probably applies to your situation. – Barmar Sep 26 '22 at 21:07
  • Yes, they do. They are the same report, just for different years.
    Overall rank,Country or region,Score,GDP per capita,Social support,Healthy life expectancy,Freedom to make life choices,Generosity,Perceptions of corruption
    1,Finland,7.632,1.305,1.592,0.874,0.681,0.202,0.393
    2,Norway,7.594,1.456,1.582,0.861,0.686,0.286,0.340
    3,Denmark,7.555,1.351,1.590,0.868,0.683,0.284,0.408
    4,Iceland,7.495,1.343,1.644,0.914,0.677,0.353,0.138
    5,Switzerland,7.487,1.420,1.549,0.927,0.660,0.256,0.357
    6,Netherlands,7.441,1.361,1.488,0.878,0.638,0.333,0.295
    – nchrista Sep 26 '22 at 21:13
  • Something is not right. read_csv thinks the header has only 1 field, and it thinks line 4 has 2 fields. But your data shows 9 fields. – Barmar Sep 26 '22 at 21:16
  • I know. That is why I do not understand it. I checked many posts with similar issues, but couldn't solve my problem. – nchrista Sep 26 '22 at 21:21
  • Check the file with a hex dump; there may be invisible characters somewhere that are confusing it. – Barmar Sep 26 '22 at 21:22
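
The hex-dump suggestion can be sketched in Python with only the standard library: print the first bytes of a file, guess the encoding from a byte-order mark if there is one, and count the fields that the csv module sees on the first few lines. The function name inspect_csv and the demo file are hypothetical. The demo deliberately writes a UTF-16 file to show what such a case would look like; nothing in the thread confirms that this is the actual cause here.

```python
import codecs
import csv
import os
import tempfile

# Byte-order marks to look for. UTF-32 comes first because its little-endian
# BOM starts with the same two bytes as the UTF-16 little-endian BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def inspect_csv(path, n_lines=5):
    """Print a short hex dump and return (guessed encoding, field counts)."""
    with open(path, "rb") as f:
        head = f.read(32)
    print("first bytes:", " ".join(f"{b:02x}" for b in head))
    # Fall back to plain UTF-8 when no byte-order mark is present.
    encoding = next((enc for bom, enc in BOMS if head.startswith(bom)), "utf-8")
    counts = []
    with open(path, newline="", encoding=encoding) as f:
        for i, row in enumerate(csv.reader(f)):
            if i >= n_lines:
                break
            counts.append(len(row))
    return encoding, counts


# Demo: a UTF-16 file built from the header and first row posted above.
sample = (
    "Overall rank,Country or region,Score,GDP per capita,Social support,"
    "Healthy life expectancy,Freedom to make life choices,Generosity,"
    "Perceptions of corruption\n"
    "1,Finland,7.632,1.305,1.592,0.874,0.681,0.202,0.393\n"
)
with tempfile.TemporaryDirectory() as demo_dir:
    demo_path = os.path.join(demo_dir, "2018.csv")
    with open(demo_path, "w", encoding="utf-16", newline="") as f:
        f.write(sample)
    encoding, counts = inspect_csv(demo_path)

print(encoding, counts)
```

If the guessed encoding is not UTF-8, passing it to read_csv through its encoding argument is worth trying. If the field counts differ from the expected 9, the printed bytes show where the file departs from a plain comma-separated layout.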
