
I am trying to read a CSV file that contains the following data:

"27@21","","2725 abc dr"","","Mumbai","IN",""

using the code below:

import csv

# `file` is the path to the input CSV file
with open(file, "r", newline="") as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    for row in reader:
        colValues = list(row)
        print(colValues)

It gives this output:

['27@21', '', '2725 abc dr",",Mumbai"', 'IN', '']

The third element of the output is a combination of three input columns.

I want the output list to have the same columns as the input.

Note: I am creating a utility to process any CSV file that contains unexpected special characters, such as a double quote anywhere in a column value, and to write a new file with those characters removed. I need this problem solved for that purpose.
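The behavior can be reproduced without a file. This is a minimal sketch, assuming the sample line is held in an in-memory string; the names `sample` and `rows` are only for illustration and are not part of the original code.

```python
import csv
import io

# Hypothetical in-memory stand-in for the CSV file described above.
sample = '"27@21","","2725 abc dr"","","Mumbai","IN",""\n'

# Parse with the default quote character (a double quote).
rows = list(csv.reader(io.StringIO(sample), delimiter=','))

# The input line has 7 comma-separated values, but the reader returns fewer.
print(len(rows[0]))
print(rows[0])
```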

Rohit
  • Why are these supposed to be three columns? The values are not separated by your delimiter `','` – timgeb Dec 07 '18 at 12:27
  • I would suggest that `"2725 abc dr"","","Mumbai"` is inherently unparsable without some rules to determine where the two required tokens start & end, and any suggestions here can't help with that with just a single example. – Alex K. Dec 07 '18 at 12:29
  • After dr, there is a double double quote, which means it's escaped, so the following commas are still within a quoted sequence. The quoted part only ends before Mumbai. See [this question](https://stackoverflow.com/questions/17808511/properly-escape-a-double-quote-in-csv) for escaping rules. – molnarm Dec 07 '18 at 12:29
  • Basically it depends on the input. I want every column value to appear in the list as a separate string. But here the input has 7 columns and the output list has only 5 elements. – Rohit Dec 07 '18 at 12:31
  • Could you suggest a workaround for this, so that each column value in the CSV input becomes its own string in the output list? – Rohit Dec 07 '18 at 12:33
  • What is `**'2725 abc dr",",Mumbai"'**` meant to be... those `**` are bugs in the self-proclaimed CSV file. It is not a CSV file. Fix the file so that it becomes a CSV file, then the CSV parser will give you what you want. An entry like `{"key": **value""}` in a JSON file would make it an invalid JSON file, same goes for CSV. There are rules that need to be adhered to. – Daniel F Dec 07 '18 at 12:50
  • Daniel - Someone edited the post. This ** was to keep the text in bold. I am changing it. – Rohit Dec 07 '18 at 12:54
  • Oh, thanks! That will help. OMG, it was me who made that edit, I'm so sorry, my apologies! – Daniel F Dec 07 '18 at 12:55
  • Ok, the same still applies, but the bug in the CSV file is the `""` after the word `dr`. Only a single double-quote should go there. – Daniel F Dec 07 '18 at 12:56
  • @Daniel - This is the requirement. that's why I am trying to create a utility to deal with it. – Rohit Dec 07 '18 at 12:59
  • Does the file contain multiple occurrences of this bug? I think you might need to pre-process it and perform some regex-based replacements on it before feeding it to the CSV processor. – Daniel F Dec 07 '18 at 13:02
  • Yes. It occurs multiple times in the CSV file. Could you help me with regex code to find such a pattern? – Rohit Dec 07 '18 at 13:11

1 Answer


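This was finally solved by passing one more parameter, quotechar, to csv.reader. With the quote character set to a single quote, double quotes are treated as ordinary characters, so each line is split at every comma: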

reader = csv.reader(csv_file, delimiter=',', quotechar="'")

Input :  "27@21","","2725 abc dr"","","Mumbai","IN",""

Output : ['"27@21"', '""', '"2725 abc dr""', '""', '"Mumbai"', '"IN"', '""']

This is the expected output: the output list has the same number of columns as the input CSV row.
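Building on this, the utility described in the question could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `strip_double_quotes` and the in-memory `src`/`dst` streams are hypothetical, and it assumes no field is meant to contain a comma or to start with a single quote, since with quotechar="'" every comma splits a field and a leading single quote would start a quoted field.

```python
import csv
import io

def strip_double_quotes(src, dst):
    """Copy CSV rows from src to dst, removing every double quote.

    Hypothetical helper: assumes fields never contain commas and
    never start with a single quote.
    """
    # quotechar="'" makes '"' an ordinary character for the reader.
    reader = csv.reader(src, delimiter=',', quotechar="'")
    writer = csv.writer(dst, delimiter=',', lineterminator='\n')
    for row in reader:
        # Drop all double quotes from each column value.
        writer.writerow([value.replace('"', '') for value in row])

# Usage with in-memory streams standing in for the input and output files.
src = io.StringIO('"27@21","","2725 abc dr"","","Mumbai","IN",""\n')
dst = io.StringIO()
strip_double_quotes(src, dst)
cleaned = dst.getvalue()
print(cleaned)
```

For real files, the same function can be called with two file objects opened with newline="" in read and write mode.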

Rohit