Pandas: How to return rows where a column has a line break/new line ( \n ) in its cell?

Question

I am trying to return rows where a column contains a line break followed by a specific word, i.e. '\nWord'.

Here is a minimal example:

import pandas as pd

testdf = pd.DataFrame([['test1', ' generates the final summary. \nRESULTS We evaluate the performance of '], ['test2', 'the cat and bat \n\n\nRESULTS\n teamed up to find some food'], ['test2', 'anthropology with RESULTS pharmacology and biology']])
testdf.columns = ['A', 'B']
testdf.head()

>   A   B
>0  test1   generates the final summary. \nRESULTS We evaluate the performance of
>1  test2   the cat and bat \n\n\nRESULTS\n teamed up to find some food
>2  test2   anthropology with RESULTS pharmacology and biology

listStrings = { '\nRESULTS\n'}
testdf.loc[testdf.B.apply(lambda x: len(listStrings.intersection(x.split())) >= 1)]

This returns nothing.
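The likely reason the attempt above returns nothing: str.split() with no separator splits on any run of whitespace, and newline characters count as whitespace, so they are discarded and no token can ever equal '\nRESULTS\n'. A minimal check using the second cell from the example:

```python
# str.split() with no separator treats '\n' as whitespace and drops it
cell = 'the cat and bat \n\n\nRESULTS\n teamed up to find some food'
tokens = cell.split()
print(tokens)
# ['the', 'cat', 'and', 'bat', 'RESULTS', 'teamed', 'up', 'to', 'find', 'some', 'food']

# The newline-wrapped token never survives the split, so the intersection is empty
print({'\nRESULTS\n'}.intersection(tokens))  # set()
```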

The result I am trying to produce is to return the first two rows, since they contain '\nRESULTS', but NOT the last row, since it doesn't contain '\nRESULTS'.

So the expected output is:

>   A   B
>0  test1   generates the final summary. \nRESULTS We evaluate the performance of
>1  test2   the cat and bat \n\n\nRESULTS\n teamed up to find some food

score 1 · Answer 1 · answered Jun 17 '19 at 02:34


Usually we use str.contains with regex=False:

testdf[testdf.B.str.contains('\n',regex=False)]
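A self-contained sketch of this approach on the question's data. Searching for '\n' alone matches any cell that has a newline; to also require the word after it, as the question asks, the literal substring '\nRESULTS' can be passed instead, still with regex=False so nothing in the pattern is treated as regex syntax.

```python
import pandas as pd

testdf = pd.DataFrame(
    [['test1', ' generates the final summary. \nRESULTS We evaluate the performance of '],
     ['test2', 'the cat and bat \n\n\nRESULTS\n teamed up to find some food'],
     ['test2', 'anthropology with RESULTS pharmacology and biology']],
    columns=['A', 'B'],
)

# Any newline at all
has_newline = testdf[testdf['B'].str.contains('\n', regex=False)]

# A newline immediately followed by RESULTS (plain substring match, no regex)
has_newline_results = testdf[testdf['B'].str.contains('\nRESULTS', regex=False)]

print(has_newline.index.tolist())          # [0, 1]
print(has_newline_results.index.tolist())  # [0, 1]
```

On this data both masks select the same rows, but only the second one would reject a cell that has a newline without RESULTS right after it.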

answered Jun 17 '19 at 02:34

BENY


score 1 · Accepted Answer · edited Jun 17 '19 at 02:37


Can you try the following:

import re
df1 = testdf[testdf['B'].str.contains('\nRESULTS', flags=re.IGNORECASE)]
df1
#output
A   B
0   test1   generates the final summary. \nRESULTS We eva...
1   test2   the cat and bat \n\n\nRESULTS\n teamed up to f...
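With the default regex=True, the Python string '\nRESULTS' already contains a real newline character, which the regex engine matches literally; flags=re.IGNORECASE additionally lets a lowercase "results" match. A small sketch of that difference, where the lowercase row is a hypothetical addition for illustration only:

```python
import re
import pandas as pd

df = pd.DataFrame({'B': [
    'summary. \nRESULTS We evaluate',          # uppercase word after a newline
    'summary. \nresults We evaluate',          # hypothetical lowercase variant
    'anthropology with RESULTS pharmacology',  # word present, but no newline before it
]})

case_sensitive = df[df['B'].str.contains('\nRESULTS')]
case_insensitive = df[df['B'].str.contains('\nRESULTS', flags=re.IGNORECASE)]

print(case_sensitive.index.tolist())    # [0]
print(case_insensitive.index.tolist())  # [0, 1]
```

If case-insensitivity is not wanted, the flags argument can simply be dropped.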

edited Jun 17 '19 at 02:37

U13-Forward


answered Jun 17 '19 at 02:35

vrana95


Luis l · Answer 3 · 2023-01-17T12:58:01.937

Sometimes, when the text is very messy and contains many \t, \n or \r sequences, a plain search cannot find them all. Here is a regular expression that covers all the cases.

Example: this code keeps all the rows where \t, \n or \r appear:

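# Matches literal escapes (a backslash followed by t, n or r) as well as real tab, newline and carriage-return characters
df_r = df_r[df_r["Name"].astype(str).str.contains(r"\\t|\\n|\\r|\t|\n|\r", regex=True)]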

This answer was inspired by: removing newlines from messy strings in pandas dataframe cells?
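The distinction the pattern above relies on: a cell may hold a real newline character (a single character) or the two-character text backslash + n, for example when data was escaped before loading. The sketch below uses a hypothetical Name column to show that each half of the pattern catches only one kind, while the combined raw-string pattern catches both:

```python
import pandas as pd

df_r = pd.DataFrame({'Name': [
    'first line\nsecond line',   # real newline character
    'first line\\nsecond line',  # literal backslash followed by "n"
    'tab\tseparated',            # real tab character
    'plain text',                # neither
]})

# In a raw string, \\n reaches the regex engine as \\n: a literal backslash, then "n"
escaped_only = df_r['Name'].str.contains(r"\\t|\\n|\\r", regex=True)
# Here \n reaches the regex engine as \n, which matches an actual newline character
real_only = df_r['Name'].str.contains(r"\t|\n|\r", regex=True)
both = df_r['Name'].astype(str).str.contains(r"\\t|\\n|\\r|\t|\n|\r", regex=True)

print(escaped_only.tolist())  # [False, True, False, False]
print(real_only.tolist())     # [True, False, True, False]
print(both.tolist())          # [True, True, True, False]
```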

score 0 · Answer 4 · answered Jun 17 '19 at 02:36

WeNYoBen's solution is better, but a version using iloc and np.where would be:

>>> import numpy as np
>>> testdf.iloc[np.where(testdf['B'].str.contains('\n', regex=False))]
       A                                                  B
0  test1   generates the final summary. \nRESULTS We eva...
1  test2  the cat and bat \n\n\nRESULTS\n teamed up to f...
>>>
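Both forms select the same rows. A sketch comparing them: np.where on a boolean mask returns a tuple of index arrays, so taking element [0] gives the positional row indices for iloc, whereas the boolean mask itself can be passed to [] directly.

```python
import numpy as np
import pandas as pd

testdf = pd.DataFrame(
    [['test1', ' generates the final summary. \nRESULTS We evaluate the performance of '],
     ['test2', 'the cat and bat \n\n\nRESULTS\n teamed up to find some food'],
     ['test2', 'anthropology with RESULTS pharmacology and biology']],
    columns=['A', 'B'],
)

mask = testdf['B'].str.contains('\n', regex=False)  # boolean Series, one value per row
positions = np.where(mask)[0]                       # integer row positions where mask is True

by_mask = testdf[mask]
by_position = testdf.iloc[positions]
print(by_mask.equals(by_position))  # True
```

The boolean mask avoids the round trip through NumPy positions, which is one reason the plain str.contains version is usually preferred.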
