How to remove newline in pandas dataframe columns?

Question

I want to shorten and clean up a CSV file to use it in ElasticSearch, but some DataFrame cells contain line breaks, so the CSV cannot be parsed into ElasticSearch. I have shortened the CSV with pandas and tried to remove the newlines, but it is not working.

Code is the following:

import pandas as pd

f=pd.read_csv("test.csv")

keep_col = ["Plugin ID","CVE","CVSS","Risk","Host","Protocol","Port","Name","Synopsis","Description","Solution",]

new_f = f[keep_col].replace('\\n',' ', regex=True)
new_f.to_csv("newFile.csv", index=False)

The shortening works, but I still have newlines in Description, Synopsis and Solution. Any idea how to solve this with Python / pandas? The CSV has about 100k entries, so the line breaks have to be removed in every entry.

Possible duplicate of [removing newlines from messy strings in pandas dataframe cells?](https://stackoverflow.com/questions/44227748/removing-newlines-from-messy-strings-in-pandas-dataframe-cells) — naivepredictor, Apr 04 '19 at 07:55
i tried, but what exactly is "df"? it says... variable not defined. — Marvin Kallohn, Apr 04 '19 at 07:56
Hi @MarvinKallohn, in your case the `df` would be `f`. also you can check better with pandas apply function, where you can run the function on data frame column to remove the newline. — Chitrank Dixit, Apr 04 '19 at 08:00
yes, i thought the same and tried it in my code, as you can see it above but it is still not working — Marvin Kallohn, Apr 04 '19 at 08:05

score 5 · Accepted Answer · answered Apr 04 '19 at 09:08

From what I've learnt, the third parameter of .replace() is the number of times you want to replace the old substring with the new one. Since you don't know how many times the newline occurs, just remove the third parameter:

new_f = f[keep_col].replace('\\n',' ')

This should help
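
One caveat worth checking: a count argument belongs to Python's built-in str.replace(). In DataFrame.replace(), regex=True is what enables matching inside cell text; without it, only cells whose entire value equals the pattern are replaced. Below is a minimal sketch, assuming the stray breaks may also be carriage returns (\r) or \r\n pairs. The sample frame and its values are made up for illustration and stand in for test.csv.

```python
import pandas as pd

# Hypothetical sample standing in for the real test.csv
f = pd.DataFrame({
    "Name": ["plugin-a", "plugin-b"],
    "Description": ["first line\nsecond line", "windows\r\nstyle break"],
    "Port": [80, 443],
})

keep_col = ["Name", "Description", "Port"]

# regex=True makes replace() match substrings inside each text cell;
# [\r\n]+ turns any run of CR/LF characters into a single space.
new_f = f[keep_col].replace(r"[\r\n]+", " ", regex=True)

print(new_f["Description"].tolist())
```

Numeric columns such as Port are left unchanged, since the pattern only applies to string values.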

score 0 · Answer 2 · answered Apr 04 '19 at 10:13

If using a pandas DataFrame is not compulsory, you can do it the following way in plain Python:

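import csv

# newline='' lets the csv module handle line breaks inside quoted fields
with open('test.csv', 'r', newline='') as txtReader:
    with open('new_test.csv', 'w', newline='') as txtWriter:
        writer = csv.writer(txtWriter)
        for row in csv.reader(txtReader):
            # Collapse line breaks inside each field; row boundaries stay intact
            writer.writerow([' '.join(field.splitlines()) for field in row])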
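
Whichever approach is used, a quick check that no field still contains a line break could look like the sketch below. The helper name count_multiline_fields and the sample strings are hypothetical, chosen only for illustration; in practice the text would come from the cleaned file.

```python
import csv
import io


def count_multiline_fields(csv_text):
    """Count parsed fields that still contain a CR or LF character."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    return sum(
        1
        for row in reader
        for field in row
        if "\n" in field or "\r" in field
    )


# Hypothetical sample data, not taken from the real export
dirty = 'Name,Description\r\nplugin-a,"first line\nsecond line"\r\n'
clean = 'Name,Description\r\nplugin-a,first line second line\r\n'

print(count_multiline_fields(dirty))
print(count_multiline_fields(clean))
```

A result of zero for the cleaned data suggests every record now sits on a single physical line.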
