-1

I am trying to open a CSV file after I create it with python. My goal is to be able to read back the file without editing it and my problem has been that I cannot get the delimiter to work. My file is created with python csv writer and then I am attempting to use the reader to read the data from the file. This is where I am getting stuck. My CSV file is saved in the same location that my python program is saved in thus I know it is not an access issue. My file is created with special character delimiter I am using Semicolons; because the raw data already contains commas,, colons;, plus signs+, ampersands&, periods., and possibly underscores_ and/or dashes-. This is the code that I am using to read my CSV file:

with open('Cool.csv') as csv_file:
  csv_reader = csv.reader(csv_file, delimiter=';', dialect=csv.excel_tab)
  for row in csv_reader:
    print row[0]

csv_file.close() 

Now this is my csv file (Cool.csv):

"Sat, 20 Apr 2019 00:17:05 +0000;Need to go to store;Eggs & Milk are needed ;Store: Grocery;Full Name: Safeway;Email: safewayiscool@gmail.com;Safeway <safewayiscool@gmail.com>, ;"
"Tue, 5 Mar 2019 05:54:24 +0000;Need to buy ham;Green eggs and Ham are needed for dinner ;Username: Dr.Seuss;Full Name: Theodor Seuss Geisel;Email: greeneggs+ham@seuss.com;"

So I would expect my output to be the following when I run the code:

Sat, 20 Apr 2019 00:17:05 +0000
Tue, 5 Mar 2019 05:54:24 +0000

I either get a null error of some kind or it will print out the entire line. How can I get it to separate out data into what I want to have defined the columns delimited by the ;?

I am not sure if the issue is that I am trying to use the semicolon or if it is something else. If it is just the semicolon I could change it if necessary but many other characters are already taken in the incoming data.

Also please do not suggest I simply just read it in from the original file. It is a massive file that has a lot of other data and I want to trim it before then executing with this second program.

UPDATE: This is the code that builds the file:

with open('Cool.csv', 'w') as csvFile:
    writer = csv.writer(csvFile, delimiter=';')
    for m in file: 
        message = m['msg']
        message2 = message.replace('\r\n\r\n', ';')
        message3 = message2.replace('\r\n', ';')
        entry = m['date'] + ";" + m['subject'] + ";" + message3
        list = []
        list.append(entry)
        writer.writerow(list)
csvFile.close()
Ben
  • 251
  • 1
  • 8
  • 23
  • "I either get a null error of some kind or it will print out the entire line" — Well which is it? If you get an error, what does it say? If not, exactly what output do you get? – khelwood Apr 30 '19 at 21:39
  • The problem is with having quotes in your CSV and having `quoting` on by the parser. You should really consider geting rid of the quotes. – hingev Apr 30 '19 at 21:51
  • 1
    It looks like the primary problem may be that the csv file you are creating isn't formatted properly—what's wrong it that each line of the file shouldn't be quoted as shown in your question. I suggest you [edit] your question and add the code that's creating it. – martineau Apr 30 '19 at 22:35
  • 1
    What @martineau said. You've created the file incorrectly if it is quoting the whole line like that. The defaults for `csv.reader` and `csv.writer` will work regardless of the delimiter. You can use the default comma delimiter and the fields with commas will be quoted. – Mark Tolonen May 01 '19 at 01:07
  • @khelwood I apologize for the ambiguous error, at first I was getting a Null error and didn't realize that it was due to a spelling mistake. I fixed it and now I am only getting the full print out. – Ben May 01 '19 at 05:20
  • You’ve created a line with semicolons already added, then added that line as a single column to `list`. You need to put the columns in as separate entries in the list and let `csv.writerow` generate the line with proper quoting and delimiters. – Mark Tolonen May 01 '19 at 07:23
  • @MarkTolonen Thanks for the explanation. Wont the way in which I created the `message3` line also create the same issue? How would you suggest that I break that into the many columns that I want it broken out into? Do I need to be replacing the semicolons with parenthesis instead? Also would you make 3 lists and add them together? `entry1 = m['date'] list1.append(entry1) entry2 = m['subject'] list2.append(entry2) entry3= message3 list3.append(entry3)` and then actually add it to the CSV with the following: `writer.writerow([list1,list2,list3])` – Ben May 01 '19 at 15:57
  • @Ben You haven't shown what the original input data looks like, but `writerow` doesn't take a list of lists, it takes a list of the column data. What is `file` in `for m in file:`? `m` seems to be a `dict` of some kind. On a guess, I'd say you need something like `writer.writerow([m['date'],m['subject'],...])`. – Mark Tolonen May 01 '19 at 16:50
  • @MarkTolonen The data is as follows `print m['date']` returns `>>Sat, 20 Apr 2019 00:17:05 +0000` and the `print m['subject'] >>Need to go to store` and lastly `print message3 >>Eggs & Milk are needed ;Store: Grocery;Full Name: Safeway;Email: safewayiscool@gmail.com;Safeway , ;` I hope this is helpful also `file` is a uniquely generated file that my old validation systems spits out with lots of extra data that I then trim down with the other parts of my code. Sorry I cannot really share more on what that file type is or looks like. – Ben May 03 '19 at 03:05

2 Answers2

1

It looks like the file was created incorrectly. The sample data provided shows the whole line double-quoted, which treats it as one long single column. Here's correct code to write and read and semicolon-delimited file:

import csv

with open('Cool.csv','w',newline='',encoding='utf-8-sig') as csv_file:
    csv_writer = csv.writer(csv_file,delimiter=';')
    csv_writer.writerow(['data,data','data;data','data+-":_'])

with open('Cool.csv','r',newline='',encoding='utf-8-sig') as csv_file:
    csv_reader = csv.reader(csv_file,delimiter=';')
    for row in csv_reader:
        print(row)

Output (matches data written):

['data,data', 'data;data', 'data+-":_']

Cool.csv:

data,data;"data;data";"data+-"":_"

Notes:

  • utf-8-sig is the most compatible encoding with Excel. Any Unicode character you put in the file will work and look correct when the CSV is opened in Excel.
  • newline='' is required per the csv documentation. The csv module handles its own newlines per the dialect used (default 'excel').
  • ; delimiter is not needed. The default , will work. Note how the second entry has a semicolon, so the field was quoted. The first field with comma would have been quoted instead if the delimiter was a comma and it would still work.
  • csv_writer.writerow takes a sequence containing the column data.
  • csv_reader returns each row as a list of the column data.
  • A column in the .CSV is double-quoted if it contains the delimiter, and quotes are doubled if present in the data to escape them as well. Note the third field has a double quote.
  • csv_writer.close() and csv_reader.close() are not needed if using with.
Mark Tolonen
  • 166,664
  • 26
  • 169
  • 251
  • Well-written and comprehensive answer, IMO. However the OP is apparently using Python 2.x based on the `print` statement in their question—which, admittedly, I added when I noticed it—and at least some of what you wrote only applies to Python 3.x. – martineau May 01 '19 at 04:08
0

RTFM.

From help (csv)

    DIALECT REGISTRATION:

    Readers and writers support a dialect argument, which is a convenient
    handle on a group of settings.  When the dialect argument is a string,
    it identifies one of the dialects previously registered with the module.
    If it is a class or instance, the attributes of the argument are used as
    the settings for the reader or writer:

        class excel:
            delimiter = ','
            quotechar = '"'
            escapechar = None
            doublequote = True
            skipinitialspace = False
            lineterminator = '\r\n'
            quoting = QUOTE_MINIMAL

And you use dialect=csv.excel_tab.

You effectively overwrite your delimiter. Just don't use the dialect option.

Sidenote: with handles closing of the file handle for you. Read here

Second sidenote: The whole line of your CSV file is in double quotes. Either get rid of them, or disable the quoting. i.e.

with open('b.txt') as csv_file:
  csv_reader = csv.reader(csv_file, delimiter=';', quoting=csv.QUOTE_NONE)
  for row in csv_reader:
    print (row[0])
hingev
  • 254
  • 1
  • 2
  • 7