Remove all line breaks from a long string of text

Question

Basically, I'm asking the user to input a string of text into the console, but the string is very long and includes many line breaks. How would I take the user's string and delete all line breaks to make it a single line of text? My method for acquiring the string is very simple:

string = raw_input("Please enter string: ")

Is there a different way I should be grabbing the string from the user? I'm running Python 2.7.4 on a Mac.

P.S. Clearly I'm a noob, so even if a solution isn't the most efficient, the one that uses the simplest syntax would be appreciated.

http://stackoverflow.com/questions/1185524/how-to-trim-whitespace-including-tabs — , May 15 '13 at 13:27
@NicYoung, that is similar but different. `strip` removes whitespace at the start and end of a string, not *inside* the string... — Daren Thomas, May 15 '13 at 13:33

score 261 · Accepted Answer · edited Mar 11 '17 at 15:07

How do you enter line breaks with raw_input? But once you have a string containing characters you want to get rid of, just replace them.

>>> mystr = raw_input('please enter string: ')
please enter string: hello world, how do i enter line breaks?
>>> # pressing enter didn't work...
...
>>> mystr
'hello world, how do i enter line breaks?'
>>> mystr.replace(' ', '')
'helloworld,howdoienterlinebreaks?'
>>>

In the example above, I replaced all spaces. The string '\n' represents a newline, and '\r' represents a carriage return (if you're on Windows, you might be getting these too, and a second replace will handle them for you!).

Basically:

# you probably want to use a space ' ' to replace `\n`
mystring = mystring.replace('\n', ' ').replace('\r', '')

Note also that it is a bad idea to name your variable `string`, as this shadows the `string` module. Another name I'd avoid, though I'd sometimes love to use it, is `file`, for the same reason (in Python 2 it shadows the built-in `file` type).
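Since raw_input only reads a single line, one way to accept pasted multi-line text is to read standard input until end-of-file (Ctrl-D in a Mac terminal) and then apply the replaces above. A minimal sketch; read_single_line is a hypothetical name, and it takes the stream as an optional parameter so it can also be fed something other than the console:

```python
import sys


def read_single_line(stream=None):
    """Read text until EOF and flatten it to a single line (hypothetical helper).

    Newlines become spaces and carriage returns are dropped, mirroring the
    replace() chain above; surrounding whitespace is then stripped.
    """
    if stream is None:
        stream = sys.stdin  # the console; finish input with Ctrl-D on a Mac
    text = stream.read()  # reads everything up to end-of-file
    return text.replace('\n', ' ').replace('\r', '').strip()
```

Usage: `mystring = read_single_line()`, paste the text, then press Ctrl-D. `sys.stdin.read()` is available in both Python 2.7 and Python 3.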

@QuestMonger's approach to replace both at once makes more sense to me. Why do two separate replaces: `mystring.replace('\n', ' ').replace('\r', '')`? Thanks — information_interchange, Jan 21 '19 at 17:27
@information_interchange This approach works on Linux files that have `\n` but not `\r\n`. — Noumenon, Mar 03 '19 at 23:26

score 66 · Answer 2 · edited Feb 13 '17 at 09:31

You can try using string replace:

string = string.replace('\r', '').replace('\n', '')
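If you'd rather delete both characters in a single pass (as one of the comments on the accepted answer wonders), str.translate with a deletion table is one alternative to chaining replace. A minimal sketch for Python 3, where str.maketrans accepts a third "characters to delete" argument (in Python 2 the rough equivalent is `s.translate(None, '\r\n')`); remove_line_breaks is a hypothetical name:

```python
# Translation table that maps '\r' and '\n' to None, i.e. deletes them.
_DELETE_LINE_BREAKS = str.maketrans('', '', '\r\n')


def remove_line_breaks(text):
    """Delete every carriage return and newline in one pass (hypothetical helper)."""
    return text.translate(_DELETE_LINE_BREAKS)
```

Like the replace chain in this answer, this glues the surrounding words together ('foo\nbar' becomes 'foobar'); replace with a space instead if you need to keep word boundaries.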

edited Feb 13 '17 at 09:31

answered May 15 '13 at 13:28

Konstantin Dinev

score 41 · Answer 3 · answered May 03 '16 at 10:24

You can split the string with no separator argument, which treats consecutive whitespace (including newlines and tabs) as a single separator. Then join using a space:

In : " ".join("\n\nsome    text \r\n with multiple whitespace".split())
Out: 'some text with multiple whitespace'

https://docs.python.org/2/library/stdtypes.html#str.split
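The "no separator argument" part matters: passing ' ' explicitly splits on every single space, so you get empty strings and the newlines survive inside the pieces. A small sketch contrasting the two (flatten and sample are illustrative names):

```python
def flatten(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    # split() with no argument splits on runs of whitespace and ignores
    # leading/trailing whitespace, so the list contains no empty strings.
    return ' '.join(text.split())


sample = "some  text\r\n with\tmultiple whitespace\n"
print(flatten(sample))    # some text with multiple whitespace
print(sample.split(' '))  # ['some', '', 'text\r\n', 'with\tmultiple', 'whitespace\n']
```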

score 38 · Answer 4 · answered Dec 18 '20 at 10:47

The canonical answer in Python would be:

s = ''.join(s.splitlines())

It splits the string into lines (letting Python handle the different line-ending conventions), then joins them back together. There are two possibilities:

replace each newline with a space (' '.join())
or join without any separator (''.join())
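To see what "letting Python handle it" covers: str.splitlines treats '\r\n' as a single boundary and also splits on a lone '\r', on '\n', and on some Unicode separators such as '\u2028', while tabs and spaces are left alone. A short demonstration, assuming Python 3 str (bytes.splitlines() only recognizes '\n', '\r' and '\r\n'):

```python
text = "first\r\nsecond\rthird\nfourth\u2028fifth\tstill fifth"

# '\r\n' counts as one boundary; '\r', '\n' and '\u2028' each split too,
# while the tab is not a line boundary and stays in place.
lines = text.splitlines()
print(lines)  # ['first', 'second', 'third', 'fourth', 'fifth\tstill fifth']

print(repr(' '.join(lines)))  # 'first second third fourth fifth\tstill fifth'
print(repr(''.join(lines)))   # 'firstsecondthirdfourthfifth\tstill fifth'
```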

answered Dec 18 '20 at 10:47

fralau

It should be the accepted answer. It is also OS independent – Emer Jun 15 '21 at 10:52
@Emer However it does remove \t at the same time which does not have to be the aim. – RunTheGauntlet Apr 04 '22 at 14:18

score 15 · Answer 5 · edited Dec 03 '14 at 13:18

Updated based on Xbello's comment:

string = my_string.rstrip('\r\n')
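Note that rstrip('\r\n') only touches the end of the string, and its argument is a set of characters rather than a fixed sequence: any trailing mix of '\r' and '\n' is removed, while line breaks in the middle of the text stay. A quick check (the strings are illustrative):

```python
text = "line one\nline two\r\n\n"

# rstrip removes any trailing '\r' or '\n' characters, in any order...
trimmed = text.rstrip('\r\n')
print(repr(trimmed))  # 'line one\nline two'

# ...but the newline inside the string is still there.
print('\n' in trimmed)  # True
```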

edited Dec 03 '14 at 13:18

answered Sep 24 '14 at 09:43

tokhi

score 9 · Answer 6 · answered May 31 '18 at 11:36

Another option is regex:

>>> import re
>>> re.sub("\n|\r", "", "Foo\n\rbar\n\rbaz\n\r")
'Foobarbaz'

answered May 31 '18 at 11:36

Neil

more info on how to match consecutive linebreaks would be nice `r'[\n\r]+'` or even `r'\s+'` to replace any whitespace with a single space. – Risadinha Feb 04 '19 at 14:45
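Following that suggestion, a character class with + treats any run of '\r'/'\n' characters as a single break, so words are neither glued together nor separated by multiple spaces. A minimal sketch (join_lines and LINE_BREAKS are hypothetical names); swapping the pattern for r'\s+' would also collapse spaces and tabs:

```python
import re

# One or more consecutive carriage returns and/or newlines.
LINE_BREAKS = re.compile(r'[\r\n]+')


def join_lines(text):
    """Replace each run of line-break characters with a single space."""
    return LINE_BREAKS.sub(' ', text)


print(join_lines("Foo\r\nbar\n\n\nbaz"))  # Foo bar baz
```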

score 7 · Answer 7 · answered Jun 15 '20 at 18:04

If anybody decides to use replace, you should try r'\n' instead of '\n':

mystring = mystring.replace(r'\n', ' ').replace(r'\r', '')
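The difference is easy to check: '\n' is a single newline character, whereas r'\n' is two characters, a backslash followed by n. So the raw-string version only helps when your text contains the literal sequence backslash-n (for example, escaped text pulled from a database or HTML), not actual line breaks. A quick illustration (the variable names are illustrative):

```python
# A real newline is one character; the raw string is backslash + 'n'.
print(len('\n'), len(r'\n'))  # 1 2

real = "first\nsecond"      # contains an actual line break
escaped = r"first\nsecond"  # contains the two characters '\' and 'n'

print(repr(real.replace(r'\n', ' ')))     # 'first\nsecond' (unchanged)
print(repr(escaped.replace(r'\n', ' ')))  # 'first second'
print(repr(real.replace('\n', ' ')))      # 'first second'
```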

answered Jun 15 '20 at 18:04

Anar Salimkhanov

Why? I vaguely remember why this is a good idea, but we need to document it. – Martin Burch Jul 03 '20 at 21:37

In my case, I needed to do this: 1. Get HTML code from DB 2. Get needed text from HTML 3. Remove all newline from text 4. Insert edited text to a spreadsheet document And it didn't work properly, unless I used `r` ( "raw string literal"). Unfortunately, I have no idea why ) – Anar Salimkhanov Jul 04 '20 at 22:32

**NOTE** that `r'\r'` will match the literal "backslash r" -- not the "Carriage Return" character. Use either according to your input data. – DerMike Sep 30 '20 at 16:27

this one worked. None of the others. Thnx! – Allohvk Jun 23 '22 at 14:15

score 3 · Answer 8 · edited Feb 03 '19 at 15:22

A method that takes into consideration:

additional whitespace characters at the beginning/end of the string
additional whitespace characters at the beginning/end of every line
various end-of-line characters

It takes a messy multi-line string such as

test_str = '\nhej ho \n aaa\r\n   a\n '

and produces a clean one-line string:

>>> ' '.join([line.strip() for line in test_str.strip().splitlines()])
'hej ho aaa a'

UPDATE: To keep consecutive newline characters from producing redundant spaces, skip the empty lines:

' '.join([line.strip() for line in test_str.strip().splitlines() if line.strip()])

This also works for the following: test_str = '\nhej ho \n aaa\r\n\n\n\n\n a\n '
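The same idea can be wrapped in a function that strips each line only once; because blank lines are filtered out anyway, the initial strip() of the whole string is no longer needed. A minimal sketch (one_line is a hypothetical name):

```python
def one_line(text):
    """Strip every line, drop the blank ones, and join the rest with spaces."""
    # Generator: each line is stripped exactly once.
    stripped = (line.strip() for line in text.splitlines())
    return ' '.join(line for line in stripped if line)


print(one_line('\nhej ho \n aaa\r\n\n\n\n\n a\n '))  # hej ho aaa a
```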

edited Feb 03 '19 at 15:22

answered Mar 01 '17 at 10:42

Kamil Neczaj

This doesn't handle the case of contiguous line feeds in the middle of the string. Two line feeds result in two contiguous blanks in the output. Try "test_str = '\nhej ho \n aaa\r\n\n a\n '" – Mike Gleen Jul 11 '17 at 12:08

score 1 · Answer 9 · edited Sep 10 '22 at 08:04

The problem with rstrip() is that it only removes characters from the end of the string, so it does not work in all cases (I have seen a few myself). Instead, you can use

text = text.replace("\n"," ")

This replaces every newline '\n' with a space.
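One caveat with replacing only "\n": text with Windows line endings ("\r\n") keeps its carriage returns. Replacing "\r\n" first, then any remaining lone "\r" or "\n", handles all three conventions. A minimal sketch (newlines_to_spaces is a hypothetical name):

```python
def newlines_to_spaces(text):
    """Turn each Windows, old-Mac or Unix line ending into a single space."""
    # Replace the two-character Windows ending first so it becomes one space.
    return text.replace('\r\n', ' ').replace('\r', ' ').replace('\n', ' ')


windows_text = "first\r\nsecond"
print(repr(windows_text.replace("\n", " ")))   # 'first\r second'
print(repr(newlines_to_spaces(windows_text)))  # 'first second'
```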

edited Sep 10 '22 at 08:04

wovano

answered May 07 '19 at 08:52

Ankit Dwivedi

score 1 · Answer 10 · answered Mar 18 '22 at 23:53

Regular expressions are the fastest way to do this:

s='''some kind   of
string with a bunch\r of

  
 extra spaces in   it'''

import re
re.sub(r'\s(?=\s)','',re.sub(r'\s',' ',s))

Result:

'some kind of string with a bunch of extra spaces in it'
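The nested expression first turns every whitespace character into a space, then deletes each space that is followed by another one, leaving a single space per run. For comparison, a single re.sub(r'\s+', ' ', s) gives the same result on this sample, and so does ' '.join(s.split()), but only because this sample has no leading or trailing whitespace (split() drops it). A sketch comparing the three (variable names are illustrative); if speed matters, timing them on your own data with the timeit module is a reasonable way to choose:

```python
import re

s = 'some kind   of\nstring with a bunch\r of\n\n  \n extra spaces in   it'

two_pass = re.sub(r'\s(?=\s)', '', re.sub(r'\s', ' ', s))  # the answer's version
one_pass = re.sub(r'\s+', ' ', s)    # collapse each whitespace run at once
split_join = ' '.join(s.split())     # also trims leading/trailing whitespace

print(two_pass)  # some kind of string with a bunch of extra spaces in it
print(two_pass == one_pass == split_join)  # True
```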

answered Mar 18 '22 at 23:53

Quin

score 0 · Answer 11 · answered Dec 08 '22 at 12:40

You don't necessarily need to remove ALL the line-break characters: LF, CR and CRLF.

# Pythonic:
r'\n', r'\r', r'\r\n'

Some texts must have breaks, but you probably need to join broken lines to keep particular sentences together.

It is natural for a line break to follow a period, semicolon or colon, but not a comma.

My code takes these conditions into account, and it works well with text copied from PDFs. Enjoy!

import re


def unbreak_pdf_text(raw_text):
    """Carefully remove unwanted line breaks from text.

    A line break is replaced with a space only when it follows a word
    character, a space or a comma; other breaks (e.g. after a period,
    semicolon or colon) are kept.

    Args:
        raw_text (str): string containing unwanted newline signs: \\n or \\r or \\r\\n
            e.g. imported from OCR or copied from a pdf document.

    Returns:
        str: the text with those line breaks replaced by spaces.
    """
    # Try '\r\n' before '\r' and '\n' so a Windows line ending is replaced as one unit.
    pat = re.compile(r"([, \w])(?:\r\n|\r|\n)")
    # Keep the preceding character (group 1) and swap the line break for a space.
    return pat.sub(r"\1 ", raw_text)
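The same rule can also be written as one substitution with a lookbehind, so the preceding character is checked but not consumed or re-inserted. A sketch of that variant, not the answerer's code; join_broken_lines and SOFT_BREAK are hypothetical names, and the character class [, \w] is copied from the answer's pattern:

```python
import re

# A line break ('\r\n' tried first, then '\r' or '\n') preceded by a word
# character, a space or a comma; the lookbehind only checks that character.
SOFT_BREAK = re.compile(r'(?<=[, \w])(?:\r\n|\r|\n)')


def join_broken_lines(raw_text):
    """Replace 'soft' line breaks with spaces; keep breaks after '.', ';', ':' etc."""
    return SOFT_BREAK.sub(' ', raw_text)


sample = "The quick brown\nfox jumps,\r\nover the dog.\nNew sentence"
print(join_broken_lines(sample))
```

Here the breaks after "brown" and "jumps," become spaces, while the one after "dog." is kept.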
