Load csv in Python and strip newline-characters

Question

I have a CSV file (more than 1000 lines, with \t as the delimiter) that I want to load into Python as a list. Here are the first few lines of the file:

"col1"  "col2"  "col3"  "col4"  "col5"  "col6"
1   "01-01-2017 00:00:00"   "02-02-2017 00:00:00"   "str1"  "str3"  "str4 åå here comes a few newline characters







"
2   "01-01-2017 00:00:00"   "02-02-2017 00:00:00"   "str2"  "str3"  "str5 åasg here comes more newlines

"

As you can see, the strings tend to contain many newline characters. Is there a way to strip all newline characters from the strings and then build a list containing all rows?

My attempt, based on this thread:

import csv
with open('test.dat') as csvDataFile:
    csvReader = csv.reader(csvDataFile, delimiter="\t")
    for i in csvReader:
        print(list(map(str.strip,i)))

However, this doesn't strip anything.

What have you tried? Post your non-working code first; don't wait for people to do the job for you. — hzitoun, Jan 29 '18 at 13:31
Use pandas to load the csv into a dataframe df. And then use df.apply and a proper lambda function to process the strings in the cells. — Jonathan Scholbach, Jan 29 '18 at 13:35

score 0 · Answer 1 · answered Jan 29 '18 at 13:37

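Sample snippet to remove newline ("\n") elements from a list:

a = ['\n', "a", "b", "c", "\n"]
def remNL(l):
    # Return the characters of l that are not a newline
    return [i for i in l if i != "\n"]

# filter() calls remNL on each element; a bare "\n" yields an empty
# (falsy) list and is dropped. list() is needed on Python 3.
print(list(filter(remNL, a)))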

In your case

print(filter(remNL,i))
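
The snippet above only drops list elements that are exactly "\n". If the newlines sit inside a quoted field, as in the question's data, one possible variant is to clean every field of every row instead. This is a minimal sketch and not part of the original answer: the helper name `clean_field` and the in-memory sample are assumptions for illustration, and it only helps if the file really is tab-delimited with quoted fields.

```python
import csv
import io

def clean_field(field):
    # Remove embedded line breaks, then trim surrounding whitespace
    return field.replace("\r", "").replace("\n", "").strip()

# Hypothetical in-memory sample standing in for test.dat
sample = 'col1\tcol2\n1\t"str4 text\n\n\n"\n2\t"str5 more\n\n"\n'

# csv.reader keeps newlines that occur inside quoted fields,
# so each record still arrives as a single row
reader = csv.reader(io.StringIO(sample), delimiter="\t")
rows = [[clean_field(f) for f in row] for row in reader]
print(rows)
```

For a real file, the csv documentation recommends opening it with `newline=''` so that line breaks inside quoted fields are passed to the reader unchanged.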

Rakesh


`print(filter(remNL,i))` gives me the same as `print(i)`, the newline characters are not removed – N08 Jan 29 '18 at 13:43

AKnightLaw · Answer 2 · 2018-01-30T00:12:38.687

You could use a regular expression to find all of the repeated \n characters and then remove them from the input text.

import re  # The module for regular expressions

input = """ The text from the csv file """

# Find all the repeated \n chars in input and replace them with ""
# Take the first element as the function returns a tuple with the 
# new string and the number of subs made
stripedInput = re.subn(r"\n{2,}", "", input)[0]

We now have the CSV file text with every run of two or more consecutive \n characters removed. The rows can then be obtained by

rows = stripedInput.split("\n")

If you then want to split each row into columns, you can do:

rows = [row.split("\t") for row in rows]
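
To see how these steps behave together, here is a small end-to-end sketch on a made-up sample modeled on the question's data. The sample text and the variable names (`text`, `collapsed`, `table`, `cleaned`) are assumptions for illustration, and the final quote-stripping step is an addition that the steps above do not perform.

```python
import re

# Hypothetical sample: tab-delimited, quoted fields, trailing newlines in the last column
text = '"col1"\t"col2"\n1\t"str4 text\n\n\n"\n2\t"str5 more\n\n"'

# Remove every run of two or more consecutive newlines
collapsed = re.sub(r"\n{2,}", "", text)

# One row per remaining newline, one column per tab
table = [line.split("\t") for line in collapsed.split("\n")]

# The surrounding double quotes are still part of each field, so strip them
cleaned = [[field.strip('"') for field in row] for row in table]
print(cleaned)
```

Note that a field containing only a single embedded newline would not match `\n{2,}` and would still be split into two rows, so this approach depends on the embedded newlines always coming in runs.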
