
Hi, I am trying to use the csv library to convert my CSV file into a new one.

The code that I wrote is the following:

import csv
import re

file_read=r'C:\Users\Comarch\Desktop\Test.csv'
file_write=r'C:\Users\Comarch\Desktop\Test_new.csv'

def find_txt_in_parentheses(cell_txt):
    pattern = r'\(.+\)'
    return set(re.findall(pattern, cell_txt))

with open(file_write, 'w', encoding='utf-8-sig') as file_w:
    csv_writer = csv.writer(file_w, lineterminator="\n")
    with open(file_read, 'r',encoding='utf-8-sig') as file_r:
        csv_reader = csv.reader(file_r)
        for row in csv_reader:
            cell_txt = row[0]
            txt_in_parentheses = find_txt_in_parentheses(cell_txt)
            if len(txt_in_parentheses) == 1:
                txt_in_parentheses = txt_in_parentheses.pop()
                cell_txt_new = cell_txt.replace(' ' + txt_in_parentheses,'')
                cell_txt_new = txt_in_parentheses + '\n' + cell_txt_new
                row[0] = cell_txt_new
            csv_writer.writerow(row)

The only problem is that in the resulting file (Test_new.csv) I get CRLF inside the cell instead of LF. Here is a sample image showing:

  • read file on the left
  • write file on the right:

[image: line endings in the read file (left) and the write file (right)]

As a result, when I copy the CSV column into a Google Docs spreadsheet, I get a blank line after each row that ends with CRLF.

[image: the pasted column in Google Docs, with a blank line after each CRLF]

Is it possible to write my code using the csv library so that LF is kept inside a cell instead of CRLF?

John Snow
  • You *really* should not mix `CRLF` and `LF` in text files. Use one, or the other. Your input file is already practically broken. If the system you are creating this file for can not deal with `CRLF` for some reason, your best bet is probably to use `LF` all the way. – Tomalak Dec 21 '21 at 13:30
  • @Tomalak The problem is I did not mix it. This is what I got from Microsoft Excel after saving the file as CSV file. – John Snow Dec 21 '21 at 13:56
  • Interesting, I've never known! But I can reproduce it, Excel saves `LF` only for me as well. What trouble does it cause when you leave it at `CRLF` after your change? (Excel itself is not confused by it, it opens both files just fine for me.) – Tomalak Dec 21 '21 at 16:41
  • @Tomalak I have edited the question and added a screenshot of what I got in the Google Docs spreadsheet: a blank line in a cell after each line ending with `CRLF` :) – John Snow Dec 22 '21 at 08:48

2 Answers


From the documentation of csv.reader:

If csvfile is a file object, it should be opened with newline='' [1]
[...]

Footnotes

[1] If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.

This is precisely the issue you're seeing. So...

with open(file_read, 'r', encoding='utf-8-sig', newline='') as file_r, \
     open(file_write, 'w', encoding='utf-8-sig', newline='') as file_w:
     
    csv_reader = csv.reader(file_r, dialect='excel')
    csv_writer = csv.writer(file_w, dialect='excel')

    # ...
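
To check the effect yourself, here is a minimal sketch. The sample rows and the temporary `demo.csv` path are made up for illustration and are not from the question. It writes a cell containing an embedded LF with `newline=''`, then looks at the raw bytes:

```python
import csv
import os
import tempfile

# Hypothetical sample data: the first cell carries an embedded LF.
rows = [['(abc)\nsome text', 'x'], ['plain', 'y']]

# Throwaway location, assumed only for this demo.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')

# newline='' switches off the text layer's newline translation, so only
# the csv module decides which line-ending characters reach the file.
with open(demo_path, 'w', encoding='utf-8-sig', newline='') as file_w:
    csv.writer(file_w, dialect='excel').writerows(rows)

# Read the file back as bytes to see the line endings that were written.
with open(demo_path, 'rb') as raw_file:
    raw = raw_file.read()

# Reading with newline='' keeps the embedded LF inside the quoted cell.
with open(demo_path, 'r', encoding='utf-8-sig', newline='') as file_r:
    round_trip = list(csv.reader(file_r, dialect='excel'))

print(raw)
print(round_trip == rows)
```

With the `excel` dialect the rows themselves still end in `\r\n`, but the LF inside the quoted cell is written unchanged. If you want LF row endings as well, passing `lineterminator='\n'` to `csv.writer` (as in the question) should still work together with `newline=''`.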
Tomalak
  • Thanks a lot! I was on this site but I did not go to the very bottom of the page and this is why I missed this part. Learned a new lesson: Read the docs until the end :) – John Snow Dec 22 '21 at 14:11
  • @JohnSnow I've simply gotten into the habit of always opening CSV files with `newline=''`, but I've never fully realized *why*. Now I know. :) – Tomalak Dec 22 '21 at 14:41
  • So we have helped each other :) This is good :) – John Snow Dec 22 '21 at 14:45

You are on Windows, and you open the file with mode 'w', which gives you Windows-style line endings. Using mode 'wb' should give you the preferred behaviour.

Klamer Schutte
  • `wb` means that the OP has to take care of text encoding manually, so just switching to `wb` and not doing anything else is asking for trouble at another point. – Tomalak Dec 21 '21 at 13:31
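
To illustrate the point in that comment, here is a rough sketch for Python 3 that uses an in-memory `io.BytesIO` as a stand-in for a file opened with `'wb'` (the stand-in and the sample row are assumptions for the demo). Handing the binary object straight to `csv.writer` fails, and one way to handle the encoding yourself is to wrap it in `io.TextIOWrapper`:

```python
import csv
import io

# Stand-in for a file opened with mode 'wb' (assumption for this demo).
buffer = io.BytesIO()

# In Python 3 the csv writer emits str, which a binary file rejects.
try:
    csv.writer(buffer).writerow(['a\nb', 'c'])
except TypeError as exc:
    binary_error = exc
else:
    binary_error = None

# Handling the encoding manually: wrap the binary stream in a text layer
# with newline='' so that no newline translation takes place.
text_layer = io.TextIOWrapper(buffer, encoding='utf-8-sig', newline='')
csv.writer(text_layer, lineterminator='\n').writerow(['a\nb', 'c'])
text_layer.flush()

raw = buffer.getvalue()
print(type(binary_error).__name__)
print(raw)
```

In practice this ends up equivalent to opening the file in text mode with `newline=''`, which is why that is usually the simpler route.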