3

I want to shorten and clean up a CSV file to use it in ElasticSearch. but there are line breaks in some Dataframes (cells) and it is not possible to parse the CSV to ElasticSearch. I now shortend the CSV with pandas and tried to remove the newline but it is not working.

Code is the following:

import pandas as pd

f=pd.read_csv("test.csv")

keep_col = ["Plugin ID","CVE","CVSS","Risk","Host","Protocol","Port","Name","Synopsis","Description","Solution",]

new_f = f[keep_col].replace('\\n',' ', regex=True)
new_f.to_csv("newFile.csv", index=False)

the shortage is working, but i have newlines in Description, Synopsis and Solutions. Any idea how to solve it with Python / Pandas? The CSV has about 100k entries so the linebreak removal has to be done in every entry.

Underoos
  • 4,708
  • 8
  • 42
  • 85
Marvin Kallohn
  • 943
  • 3
  • 9
  • 12

2 Answers2

5

From what I've learnt, the third parameter for the .replace() parameter takes the count of the number of times you want to replace the old substring with the new substring, so instead just remove the third parameter since you don't know the number of times the new line exists.

new_f = f[keep_col].replace('\\n',' ')

This should help

DevTotti
  • 86
  • 6
0

In case, using pandas data-frame is not compulsory , you can do it in following way using simple python:

with open('test.csv', 'r') as txtReader:
    with open('new_test.csv', 'w') as txtWriter:
        for line in txtReader.readlines():
            line = line.replace('\\n', '')
            txtWriter.write(line)
K.Afroz
  • 21
  • 4