
Let's say I have the following data frame:

import pandas as pd

data = [
    "\n mark has \n no name",
    "\n john walks his \n dog",
    "mary is fun \n",
    "tim is \n old"
]

df = pd.DataFrame(data, columns=['Sentences'])

How can I write a function, ideally a lambda (I have not had much practice with them), that removes only the first and last \n in each of the strings above, so that the output is:

"mark has \n no name",
"john walks his \n dog",
"mary is fun",
"tim is \n old"

Ideally, I would like the output to be a separate column on the data frame, as opposed to replacing what is there.

I have seen formulas that deal with a global replacement, but I need something a bit more specific.

accdias
Mark
  • You can `strip()` the elements in `data` prior to adding them to the data frame. Something like this `df = pd.DataFrame([_.strip() for _ in data], columns=['Sentences'])`. – accdias Nov 03 '22 at 18:23
  • Does this answer your question? [Pythonic/efficient way to strip whitespace from every Pandas Data frame cell that has a stringlike object in it](https://stackoverflow.com/questions/33788913/pythonic-efficient-way-to-strip-whitespace-from-every-pandas-data-frame-cell-tha) Specifically [this answer](https://stackoverflow.com/a/33789292/843953) – Pranav Hosangadi Nov 05 '22 at 15:07

2 Answers

df['fixed'] = df.Sentences.str.strip("\n")

or, in case you want more granular control over what to strip on the left / right:

df['fixed'] = df.Sentences.str.lstrip("\n").str.rstrip("\n")

pandas provides vectorized string functions for Series, so you can access column.str and then apply common string methods such as lstrip and rstrip.
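Since the question asks for a lambda, here is a minimal sketch that puts the vectorized call and an equivalent lambda side by side, using the data from the question. The column names fixed_lambda and fixed_ws are made up for illustration. Note that strip("\n") removes only newline characters, so the spaces next to them survive; calling strip() with no argument removes all surrounding whitespace, which appears to be what the expected output in the question shows.

```python
import pandas as pd

data = [
    "\n mark has \n no name",
    "\n john walks his \n dog",
    "mary is fun \n",
    "tim is \n old",
]
df = pd.DataFrame(data, columns=["Sentences"])

# Vectorized: strip newline characters from both ends of every string.
df["fixed"] = df["Sentences"].str.strip("\n")

# Same result with a lambda applied element by element.
df["fixed_lambda"] = df["Sentences"].apply(lambda s: s.strip("\n"))

# No argument: strip all leading/trailing whitespace (spaces and newlines).
df["fixed_ws"] = df["Sentences"].str.strip()

print(df["fixed_ws"].tolist())
```

In all three cases the \n in the middle of each sentence is left alone, because strip only ever touches the ends of a string.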

pwoolvett

It's not a lambda, but the code below seems to fit the bill. The basic approach is to go through the data frame row by row, test whether each string starts or ends with \n, and use string slicing to return the string without that leading or trailing character.

import pandas as pd

data = ["\n mark has \n no name","\n john walks his \n dog","mary is fun \n","tim is \n old"]

df = pd.DataFrame(data, columns=['Sentences'])


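def replace_newline(dataframe_row, col_name, input_string):
    # Work on the string stored in the requested column of this row.
    result = dataframe_row[col_name]

    # Drop one leading occurrence of input_string, if present.
    if result.startswith(input_string):
        result = result[len(input_string):]

    # Separate `if` (not `elif`) so a string with both ends set is fully handled.
    if result.endswith(input_string):
        result = result[:-len(input_string)]

    return result

# input_string has to be passed too; apply() forwards keyword arguments to the function.
df['Sentences_edited'] = df.apply(replace_newline, col_name='Sentences', input_string='\n', axis=1)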
print(df)
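If a lambda is preferred, the same "at most one \n from each end" idea can be sketched with a regular expression. This is a swapped-in technique, not the function above: the name EDGE_NEWLINE is made up for illustration, and the pattern relies on \A and \Z, which anchor to the very start and end of the string in Python's re module.

```python
import re

import pandas as pd

data = [
    "\n mark has \n no name",
    "\n john walks his \n dog",
    "mary is fun \n",
    "tim is \n old",
]
df = pd.DataFrame(data, columns=["Sentences"])

# Matches a "\n" only at the very start (\A) or very end (\Z) of the string,
# so at most one newline is removed from each end and inner ones are kept.
EDGE_NEWLINE = re.compile(r"\A\n|\n\Z")

# New column; the original Sentences column is left unchanged.
df["Sentences_edited"] = df["Sentences"].apply(lambda s: EDGE_NEWLINE.sub("", s))

print(df)
```

Unlike strip("\n"), which removes every repeated newline at the ends, this removes a single one per side, and it leaves adjacent spaces in place.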
Otto