I have a folder with a bunch of CSV files, each containing a simple table with sep=';'. The problem is that one column of these tables contains text, and in it there are lines that start with \n. Of course, these rows don't have the same length as the others, because the \n breaks the line. What I'm trying to do is append each line for which startswith('\n') is true to the previous line. My code is below; it doesn't raise an error, but it doesn't solve the problem either. I'm pretty sure there are simpler ways to do this, but I'm not getting the inspiration. =)

with open(path_csvs, 'r',encoding='latin-1') as f:
    lineas = f.readlines()
with open(path_csvs, 'w',encoding='latin-1') as f:
    i = 0
    while i < len(lineas):
        if i < len(lineas)-1 and lineas[i].startswith('\n'):
            f.write(lineas[i].rstrip() + ' ' + lineas[i-1].lstrip())
            i += 2
        else:
            f.write(lineas[i])
            i += 1
Jaime R
  • Can you provide a *concrete* example of such a CSV file? What escape characters are used? Why can't you just use `pandas.read_csv`? And, unrelatedly, is the encoding *really* Latin-1? – Konrad Rudolph Mar 21 '23 at 12:42
  • Welcome to Stack Overflow. I don't understand: **why is the question tagged `pandas`?** If you are using Pandas, did you try **reading the documentation**, to see the functionality that Pandas already provides for reading CSV files? – Karl Knechtel Mar 21 '23 at 12:45
  • An observation: a proper CSV file cannot have a `\n` inside a value, unless it's within quotes. Especially if your CSV file has quotes, I would suggest to use an already existing library (e.g. https://docs.python.org/3/library/csv.html) – matteo_c Mar 21 '23 at 12:46
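
For the case the comments hint at, where the stray newlines are not inside quotes and a CSV parser therefore cannot rejoin them, one possible approach is to merge physical lines until each record has the expected number of fields. The following is only a minimal sketch under several assumptions that the question does not confirm: the first line is an intact header, no field contains the `;` separator, and the text column is not the last column. The names `merge_broken_rows` and `n_fields` are hypothetical.

```python
def merge_broken_rows(lines, sep=";", n_fields=None):
    """Join physical lines until each record has n_fields fields.

    Returns the merged records without line terminators.
    Assumes unquoted data in which no field contains `sep`.
    """
    merged = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank line, e.g. left over from a doubled newline: skip it
            continue
        if n_fields is None:
            # Assumption: the first non-blank line is an intact header
            n_fields = line.count(sep) + 1
        if merged and merged[-1].count(sep) + 1 < n_fields:
            # Previous record is still short, so this line continues it
            merged[-1] += " " + line.strip()
        else:
            merged.append(line)
    return merged


# Small in-memory demonstration with made-up data
broken = ["id;text;n\n", "1;hello\n", "world;5\n", "2;ok;7\n"]
fixed = merge_broken_rows(broken)
```

The merged records could then be written back with `"\n".join(fixed)`. If the files do quote the text column, a CSV parser is the safer choice, since it does not depend on these assumptions.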

1 Answer

You should use the standard library's csv module to read the files and, as recommended by the docs, open CSV files with newline=''. For example:

with open('file.csv', newline='') as csvfile:
    ...

From the footnote on the linked page:

If newline='' is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use \r\n line endings on write an extra \r will be added. It should always be safe to specify newline='', since the csv module does its own (universal) newline handling.
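
To make this concrete, here is a minimal sketch of reading a `;`-separated file with the `csv` module. It assumes the text column is quoted, which the question does not confirm; the helper name `read_semicolon_csv` and the sample data are made up for illustration, and the `latin-1` default is simply taken from the question's code.

```python
import csv
import io


def read_semicolon_csv(path, encoding="latin-1"):
    """Return all rows of a ';'-separated CSV file as lists of strings."""
    # newline='' lets the csv module handle newlines inside quoted fields
    with open(path, newline="", encoding=encoding) as csvfile:
        return list(csv.reader(csvfile, delimiter=";"))


# In-memory demonstration: the second record has a newline inside a quoted field
sample = 'id;text\r\n1;"first line\nsecond line"\r\n2;plain\r\n'
rows = list(csv.reader(io.StringIO(sample, newline=""), delimiter=";"))
```

Here `rows` contains three records, and the embedded newline stays inside the text field of the second one instead of splitting the row.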

Benji York
  • I don't think this is related to the question but I'm admittedly not entirely sure since the question is vague. – Konrad Rudolph Mar 21 '23 at 12:45
  • @KonradRudolph, I added a quote of the footnote from the Python docs that explains how my suggestion is related. – Benji York Mar 21 '23 at 12:46
  • I know what `newline` does, I just don't think it's relevant here. The text you've quoted does not explain the relevance. – Konrad Rudolph Mar 21 '23 at 12:47
  • Ah! I see the confusion. I was assuming this person was using the `csv` module, but they were apparently not. If they were, my recommendation would have been salient. – Benji York Mar 21 '23 at 12:48