I have a CSV file (more than 1000 lines, with \t as the delimiter) that I want to load into Python as a list. Here are the first few lines of the file:

"col1"  "col2"  "col3"  "col4"  "col5"  "col6"
1   "01-01-2017 00:00:00"   "02-02-2017 00:00:00"   "str1"  "str3"  "str4 åå here comes a few newline characters







"
2   "01-01-2017 00:00:00"   "02-02-2017 00:00:00"   "str2"  "str3"  "str5 åasg here comes more newlines

"

As you can see, the strings tend to contain many newline characters. Is there a way to strip all newline characters from the strings and then build a list containing all rows?


My attempt, based on this thread:

import csv
with open('test.dat') as csvDataFile:
    csvReader = csv.reader(csvDataFile, delimiter="\t")
    for i in csvReader:
        print(list(map(str.strip,i)))

However, this doesn't strip anything.

N08
  • what have you tried? Post your not-working code first, don't wait for people to do the job for you. – hzitoun Jan 29 '18 at 13:31
  • Use pandas to load the csv into a dataframe df. And then use df.apply and a proper lambda function to process the strings in the cells. – Jonathan Scholbach Jan 29 '18 at 13:35

2 Answers

Sample snippet to remove newline ("\n") items from a list:

a = ['\n', "a", "b", "c", "\n"]

def remNL(item):
    # Predicate for filter(): keep every item that is not a bare newline
    return item != "\n"

print(list(filter(remNL, a)))

In your case:

print(list(filter(remNL, i)))
Rakesh
  • `print(filter(remNL,i))` gives me the same as `print(i)`, the newline characters are not removed – N08 Jan 29 '18 at 13:43
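
As the comment above notes, filtering the row only drops items that are exactly "\n"; newlines embedded inside a field stay. A minimal sketch of one way to handle that is to let csv.reader parse the quoted multi-line fields and then remove the newline characters from each field. The SAMPLE string and the load_rows helper are hypothetical names made up for illustration, and the data only mimics the layout shown in the question.

```python
import csv
import io

# Hypothetical sample mimicking the question's layout: tab-delimited,
# quoted fields, and newlines embedded inside the last field.
SAMPLE = (
    '"col1"\t"col2"\t"col3"\n'
    '1\t"str1"\t"str4 with newlines\n\n\n"\n'
    '2\t"str2"\t"str5 more \nnewlines\n"\n'
)

def load_rows(fileobj):
    """Return all rows as lists, with newlines removed from every field."""
    reader = csv.reader(fileobj, delimiter="\t")
    return [
        # Remove carriage returns and newlines anywhere in the field,
        # then trim any leftover leading/trailing whitespace.
        [field.replace("\r", "").replace("\n", "").strip() for field in row]
        for row in reader
    ]

rows = load_rows(io.StringIO(SAMPLE))
print(rows)
```

For a real file, something like `with open('test.dat', newline='') as f: rows = load_rows(f)` should behave the same way; passing newline='' is what the csv module documentation recommends so that newlines inside quoted fields are handed to the reader unchanged.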

You could use a regular expression to find every run of repeated \n characters and remove it from the input text.

import re  # The module for regular expressions

text = """ The text from the csv file """

# Replace every run of two or more \n characters in text with "".
# re.sub returns the new string directly (re.subn would return a
# tuple with the new string and the number of substitutions made).
stripped_text = re.sub(r"\n{2,}", "", text)

We now have the CSV text without any repeated \n characters. The rows can then be obtained with

rows = stripped_text.split("\n")

If you then want to split each row into columns, you can do

for i in range(len(rows)):
    rows[i] = rows[i].split("\t")
AKnightLaw
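
To see the regular-expression approach end to end, here is a small sketch on an in-memory string. The sample text and the variable names are assumptions made up for illustration, not the asker's real data. It also assumes that every unwanted newline occurs in a run of two or more; a field containing exactly one embedded newline would still be split into two rows by this method.

```python
import re

# Hypothetical sample: each quoted last field ends with a run of newlines.
text = (
    '"col1"\t"col2"\n'
    '1\t"str4 trailing newlines\n\n\n\n"\n'
    '2\t"str5 more newlines\n\n"\n'
)

# Drop every run of two or more newlines; the single newline that
# terminates each row is left in place.
collapsed = re.sub(r"\n{2,}", "", text)

# splitlines() avoids the empty trailing element that split("\n") would
# produce; strip('"') removes the surrounding quotes from each cell.
rows = [
    [cell.strip('"') for cell in line.split("\t")]
    for line in collapsed.splitlines()
]
print(rows)
```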