Need To Remove First And Last \n From DataFrame Text

Question

Let's say I have the following data frame:

"\n mark has \n no name",
"\n john walks his \n dog",
"mary is fun \n",
"tim is \n old"

import pandas as pd

data = [
    "\n mark has \n no name",
    "\n john walks his \n dog",
    "mary is fun \n",
    "tim is \n old"
]

df = pd.DataFrame(data, columns=['Sentences'])

How can I write a function, ideally a lambda (I have little practice with them), that removes only the leading and trailing \n from each of the strings above, so that the output is:

"mark has \n no name",
"john walks his \n dog",
"mary is fun",
"tim is \n old"

Ideally, I would like the output to be a separate column on the data frame, as opposed to replacing what is there.

I have seen solutions that replace every \n globally, but I need something more targeted.

You can `strip()` the elements in `data` prior to adding them to the data frame. Something like this `df = pd.DataFrame([_.strip() for _ in data], columns=['Sentences'])`. — accdias, Nov 03 '22 at 18:23
Does this answer your question? [Pythonic/efficient way to strip whitespace from every Pandas Data frame cell that has a stringlike object in it](https://stackoverflow.com/questions/33788913/pythonic-efficient-way-to-strip-whitespace-from-every-pandas-data-frame-cell-tha) Specifically [this answer](https://stackoverflow.com/a/33789292/843953) — Pranav Hosangadi, Nov 05 '22 at 15:07

pwoolvett · Answer 1 · 2022-11-03T18:26:15.087

-1

df['fixed'] = df.Sentences.str.strip("\n")

or, in case you want more granular control over what to strip on the left / right:

df['fixed'] = df.Sentences.str.lstrip("\n").str.rstrip("\n")

pandas has vectorized string functions for Series, so you can write column.str and then apply common string methods such as lstrip and rstrip.
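A minimal sketch of how the argument to `str.strip` changes the result, using the question's data. The column names `newline_only` and `all_whitespace` are made up for illustration. Note that stripping only `"\n"` leaves the space that follows the leading newline, whereas calling `strip()` with no argument removes all surrounding whitespace, which appears to be what the expected output in the question shows.

```python
import pandas as pd

data = [
    "\n mark has \n no name",
    "\n john walks his \n dog",
    "mary is fun \n",
    "tim is \n old",
]
df = pd.DataFrame(data, columns=["Sentences"])

# Strip only newline characters from both ends; neighbouring spaces survive.
df["newline_only"] = df["Sentences"].str.strip("\n")

# Strip all leading/trailing whitespace (spaces, tabs, newlines).
df["all_whitespace"] = df["Sentences"].str.strip()

print(df["newline_only"].tolist())
# [' mark has \n no name', ' john walks his \n dog', 'mary is fun ', 'tim is \n old']
print(df["all_whitespace"].tolist())
# ['mark has \n no name', 'john walks his \n dog', 'mary is fun', 'tim is \n old']
```

In both cases the `\n` in the middle of each sentence is untouched, because `strip` only works on the ends of the string.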

edited Nov 03 '22 at 18:26

answered Nov 03 '22 at 18:12


Why `lstrip` and `rstrip` and not just `strip`? – Amos Baker Nov 03 '22 at 18:18
It gives you the flexibility to strip different charsets on either side – pwoolvett Nov 03 '22 at 18:23

For the question asked by the OP, they just add unnecessary complexity. – accdias Nov 03 '22 at 18:25

score -2 · Accepted Answer · answered Nov 03 '22 at 17:55

It's not a lambda, but the code below seems to fit the bill. The basic approach is to iterate through the dataframe row by row, test whether each string starts or ends with \n, and use slicing to return a string without that character.

import pandas as pd

data = ["\n mark has \n no name","\n john walks his \n dog","mary is fun \n","tim is \n old"]

df = pd.DataFrame(data, columns=['Sentences'])


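def replace_newline(dataframe_row, col_name, input_string):
    # Work on the value of the chosen column for this row
    test_string = dataframe_row[col_name]

    # Drop a leading occurrence of input_string, if present
    if test_string.startswith(input_string):
        test_string = test_string[len(input_string):]

    # Check the end independently (`if`, not `elif`) so both ends are handled
    if test_string.endswith(input_string):
        test_string = test_string[:-len(input_string)]

    # Note: spaces next to the removed newline are left in place
    return test_string

# input_string must be passed explicitly; apply forwards keyword arguments to the function
df['Sentences_edited'] = df.apply(replace_newline, col_name='Sentences', input_string='\n', axis=1)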
print(df)
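Since the question asked for a lambda, here is a minimal sketch of one, assuming every cell in `Sentences` is a plain `str` (a missing value such as NaN would raise an `AttributeError`). It relies on the built-in `str.strip()`, which removes all leading and trailing whitespace including newlines, and it writes the result to a new column rather than overwriting the original.

```python
import pandas as pd

data = [
    "\n mark has \n no name",
    "\n john walks his \n dog",
    "mary is fun \n",
    "tim is \n old",
]
df = pd.DataFrame(data, columns=["Sentences"])

# Series.apply calls the lambda once per string in the column.
df["Sentences_edited"] = df["Sentences"].apply(lambda s: s.strip())

print(df["Sentences_edited"].tolist())
# ['mark has \n no name', 'john walks his \n dog', 'mary is fun', 'tim is \n old']
```

The vectorized `df["Sentences"].str.strip()` gives the same result on this data and skips missing values instead of raising, so the lambda is mainly useful when the per-string logic gets more involved.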

Don't try to reinvent the wheel. `str.strip()` does exactly what the OP is looking for. — accdias, Nov 03 '22 at 18:19
