0

I have a CSV file containing something like this:

oneWordString,"string with spaces","string
  with
  some, lines",anotherString

This string comes from another program, so I can't change the format.

I am expecting to get something like this:

['oneWordString','string with spaces','string
  with
  some, lines','anotherString']
  • Are you using the `String split()` method to process the CSV data? It's probably easier to use the `pandas` library for this; I'm pretty sure it supports quoted string values that contain a carriage return. – BdR Sep 06 '21 at 13:39

3 Answers

0

CSV files often include both quoted and unquoted strings, and this is allowed by a common standard for the format of CSV files. The best way to deal with both standard and (within reason) non-standard CSV files is to use Python's csv module, which deals well with most aspects of the file format. Don't try to split up the strings using split or regex - the csv module does this more reliably and easily.

import csv

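# Make example file (newline="" stops Python from translating line endings)
with open("example.csv", "w", newline="") as f:
    f.write("""oneWordString,"string with spaces","string
      with
      some, lines",anotherString""")

# Read it back; the csv docs recommend newline="" so that line breaks
# embedded in quoted fields are interpreted correctly
with open("example.csv", newline="") as f:
    result = list(csv.reader(f))

print(result)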
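
If the text is already in memory rather than in a file, the same approach should work without touching the disk. The following is a minimal sketch, assuming the data arrives as a single Python string; the variable names `raw` and `rows` are only illustrative.

```python
import csv
import io

# Sample text from the question: one record whose third field spans three lines.
raw = 'oneWordString,"string with spaces","string\n  with\n  some, lines",anotherString'

# Wrap the string in a file-like object; newline="" leaves the embedded
# line breaks untouched so csv.reader can handle them itself.
rows = list(csv.reader(io.StringIO(raw, newline="")))

# csv.reader returns one list per record, so the fields are in rows[0].
print(rows[0])
```

Note that `csv.reader` yields a list of rows, so a single-record input still has to be unwrapped with `rows[0]`.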
Stuart
-1

I modified an existing regex found on Stack Overflow to solve your problem. I believe it yields your desired outcome.

import re

str_1 = """oneWordString, 'string with spaces','string
  with
  some, lines',anotherString"""

example_1 = re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''')
print(example_1.split(str_1)[1::2])

['oneWordString', " 'string with spaces'", "'string\n  with\n  some, lines'", 'anotherString']

str_2 = '''oneWordString, "string with spaces","string
  with
  some, lines",anotherString'''

example_2 = re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''')
print(example_2.split(str_2)[1::2])

['oneWordString', ' "string with spaces"', '"string\n  with\n  some, lines"', 'anotherString']

Based on: split but ignore separators
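
The fields returned by this regex still carry their surrounding quotes and any leading space. A minimal sketch of one way to clean them up follows; `clean_field` is a hypothetical helper introduced here for illustration, and it assumes the quotes never contain escaped quote characters.

```python
import re

text = '''oneWordString, "string with spaces","string
  with
  some, lines",anotherString'''

# Same pattern as in the answer: runs of non-separator characters or quoted sections.
field_pattern = re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''')
fields = field_pattern.split(text)[1::2]


def clean_field(field):
    """Trim surrounding whitespace, then remove one layer of matching quotes."""
    field = field.strip()
    if len(field) >= 2 and field[0] == field[-1] and field[0] in "\"'":
        return field[1:-1]
    return field


cleaned = [clean_field(f) for f in fields]
print(cleaned)
```

This is a naive cleanup: it would also strip quotes that are a genuine part of an unquoted value, which is one reason a dedicated CSV parser tends to be more reliable.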

FerdiS
-1

You could use Python's `str.replace()` method to remove the line breaks, e.g.:

raw_string = "I'm a string with \n new lines"

stripped_new_lines = raw_string.replace('\n', '')
Nihad