
Basically, I'm asking the user to input a string of text into the console, but the string is very long and includes many line breaks. How would I take the user's string and delete all line breaks to make it a single line of text? My method for acquiring the string is very simple:

string = raw_input("Please enter string: ")

Is there a different way I should be grabbing the string from the user? I'm running Python 2.7.4 on a Mac.

P.S. Clearly I'm a noob, so even if a solution isn't the most efficient, the one with the simplest syntax would be appreciated.

Ian Zane
  • http://stackoverflow.com/questions/1185524/how-to-trim-whitespace-including-tabs – May 15 '13 at 13:27
  • @NicYoung, that is similar but different. `strip` removes whitespace at the start and end of a string, not *inside* the string... – Daren Thomas May 15 '13 at 13:33

11 Answers


How do you enter line breaks with raw_input? But once you have a string containing characters you want to get rid of, just replace them.

>>> mystr = raw_input('please enter string: ')
please enter string: hello world, how do i enter line breaks?
>>> # pressing enter didn't work...
...
>>> mystr
'hello world, how do i enter line breaks?'
>>> mystr.replace(' ', '')
'helloworld,howdoienterlinebreaks?'
>>>

In the example above, I replaced all spaces. The string '\n' represents newlines, and '\r' represents carriage returns (if you're on Windows, you might be getting these, and a second replace will handle them for you!).

Basically:

# you probably want to use a space ' ' to replace `\n`
mystring = mystring.replace('\n', ' ').replace('\r', '')

Note also that it is a bad idea to call your variable `string`, as this shadows the `string` module. Another name I'd avoid, though I'd sometimes love to use it, is `file`, for the same reason: in Python 2 it shadows the built-in `file` type.
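raw_input() returns as soon as the user presses Enter, so it can only ever give you a single line. If the user needs to type or paste several lines, one option is to read line by line until an empty line and join the pieces with spaces. A minimal sketch for Python 3 (where input() replaced raw_input()); read_paragraph is a hypothetical helper name, and ending the input with a blank line is an assumption about how your user signals they are done:

```python
import io
import sys


def read_paragraph(stream=sys.stdin):
    """Hypothetical helper: read lines from `stream` until a blank line
    (assumed end-of-input marker) or end of input, and return them as one line."""
    lines = []
    for line in stream:
        line = line.rstrip("\r\n")   # drop this line's own line ending
        if not line.strip():         # an empty (or all-blank) line ends the input
            break
        lines.append(line)
    return " ".join(lines)           # a space goes where each line break was


if __name__ == "__main__":
    # Simulated pasted input; for real console input call read_paragraph()
    fake_input = io.StringIO("first line\nsecond line\r\nthird line\n\nignored\n")
    print(read_paragraph(fake_input))  # first line second line third line
```

If you'd rather let the user finish with Ctrl-D (end of input in a Mac or Linux terminal), sys.stdin.read() returns everything typed up to that point as one string, which you can then clean with replace() as shown earlier.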

Mr_and_Mrs_D
Daren Thomas

You can try using string replace:

string = string.replace('\r', '').replace('\n', '')
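Replacing the breaks with an empty string, as above, glues the words on either side of each break together. A small comparison using a made-up sample string with Windows-style \r\n endings:

```python
text = "first line\r\nsecond line\r\nthird"

# Remove the breaks entirely: the words on either side are joined
glued = text.replace("\r", "").replace("\n", "")

# Drop carriage returns, then turn each newline into a space
spaced = text.replace("\r", "").replace("\n", " ")

print(repr(glued))   # 'first linesecond linethird'
print(repr(spaced))  # 'first line second line third'
```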
Konstantin Dinev

You can split the string with no separator arg, which will treat consecutive whitespace as a single separator (including newlines and tabs). Then join using a space:

In : " ".join("\n\nsome    text \r\n with multiple whitespace".split())
Out: 'some text with multiple whitespace'

https://docs.python.org/2/library/stdtypes.html#str.split
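Because split() with no arguments splits on runs of any whitespace and drops empty pieces, leading and trailing whitespace disappears too, and tabs are handled along with newlines. A small sketch; one_line is just an illustrative name:

```python
def one_line(text):
    # one_line is an illustrative helper name.
    # split() with no separator splits on runs of whitespace (spaces, tabs,
    # \n, \r, ...) and drops empty strings, so leading/trailing whitespace vanishes
    return " ".join(text.split())


print(repr(one_line("\tsome\ttext\r\n\r\nmore   text \n")))  # 'some text more text'
```

The trade-off is that runs of spaces inside a line are collapsed as well, which may not be what you want if the spacing matters.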

Sean

The canonical answer in Python would be:

s = ''.join(s.splitlines())

It splits the string into lines (letting Python decide what counts as a line boundary, including \n, \r and \r\n), then joins the lines back together. There are two possibilities here:

  • replace each newline with a space (' '.join())
  • or join without a space (''.join())
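The advantage of splitlines() over split('\n') is that it also treats \r\n and a lone \r as line boundaries, and it doesn't produce an empty last item for a trailing newline. A quick comparison on a made-up string:

```python
s = "one\r\ntwo\rthree\n"

print(s.split("\n"))   # ['one\r', 'two\rthree', '']
print(s.splitlines())  # ['one', 'two', 'three']

print(" ".join(s.splitlines()))  # one two three
print("".join(s.splitlines()))   # onetwothree
```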
fralau

Updated based on Xbello's comment:

string = my_string.rstrip('\r\n')

read more here
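Note that rstrip('\r\n') only removes line-break characters at the end of the string; breaks inside it are left alone. That suits cleaning up a single line, e.g. one read from a file, but it won't turn a multi-line string into one line. A quick check with a made-up string:

```python
my_string = "line one\nline two\r\n"

trailing_only = my_string.rstrip("\r\n")
print(repr(trailing_only))   # 'line one\nline two'  (the inner \n is still there)

# The argument is a set of characters, not a suffix: any mix of \r and \n
# at the end is removed
print(repr("text\n\r\n\n".rstrip("\r\n")))  # 'text'
```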

tokhi

Another option is regex:

>>> import re
>>> re.sub("\n|\r", "", "Foo\n\rbar\n\rbaz\n\r")
'Foobarbaz'
Neil
  • More info on how to match consecutive line breaks would be nice: `r'[\n\r]+'`, or even `r'\s+'` to replace any whitespace with a single space. – Risadinha Feb 04 '19 at 14:45
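Following up on the comment above: a character class with + treats a whole run of \r and \n characters as one match, so each run becomes a single space. A sketch on a made-up string:

```python
import re

text = "Foo\r\nbar\n\n\nbaz\t qux\r\n"

# [\r\n]+ matches one or more CR/LF characters in a row, so "\r\n" and "\n\n\n"
# each become a single space; the tab and space are left alone
breaks_only = re.sub(r"[\r\n]+", " ", text)
print(repr(breaks_only))     # 'Foo bar baz\t qux '

# \s+ also folds spaces and tabs into the same single space;
# strip() drops the space left behind by the trailing line break
all_whitespace = re.sub(r"\s+", " ", text).strip()
print(repr(all_whitespace))  # 'Foo bar baz qux'
```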

If anybody decides to use replace, you should try r'\n' instead of '\n':

mystring = mystring.replace(r'\n', ' ').replace(r'\r', '')
Anar Salimkhanov
  • Why? I vaguely remember why this is a good idea, but we need to document it. – Martin Burch Jul 03 '20 at 21:37
  • In my case, I needed to: 1. get HTML code from a DB, 2. get the needed text from the HTML, 3. remove all newlines from the text, 4. insert the edited text into a spreadsheet document. It didn't work properly unless I used `r` (a "raw string literal"). Unfortunately, I have no idea why. – Anar Salimkhanov Jul 04 '20 at 22:32
  • **NOTE** that `r'\r'` will match the literal "backslash r", not the "Carriage Return" character. Use either according to your input data. – DerMike Sep 30 '20 at 16:27
  • This one worked; none of the others did. Thanks! – Allohvk Jun 23 '22 at 14:15
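What the comments are getting at: '\n' is one newline character, while r'\n' is two characters, a backslash followed by n. So replace(r'\n', ...) only helps when the text contains that literal backslash-n sequence, which happens, for example, with text that was escaped (such as JSON-encoded) and never decoded. A sketch for Python 3 with made-up strings:

```python
import json

real = "line one\nline two"        # contains an actual newline character
escaped = json.dumps(real)[1:-1]   # backslash followed by n, no newline ([1:-1] drops the quotes)

print(len("\n"), len(r"\n"))       # 1 2

# A plain '\n' matches only the real newline character
print(repr(real.replace("\n", " ")))      # 'line one line two'
print(repr(escaped.replace("\n", " ")))   # 'line one\\nline two' (unchanged)

# A raw r'\n' matches only the two-character sequence backslash + n
print(repr(escaped.replace(r"\n", " ")))  # 'line one line two'
print(repr(real.replace(r"\n", " ")))     # 'line one\nline two' (unchanged)
```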

A method that takes into consideration:

  • extra whitespace characters at the beginning/end of the string
  • extra whitespace characters at the beginning/end of every line
  • various line-ending characters

It takes a multi-line string that may be messy, e.g.:

test_str = '\nhej ho \n aaa\r\n   a\n '

and produces a nice one-line string:

>>> ' '.join([line.strip() for line in test_str.strip().splitlines()])
'hej ho aaa a'

UPDATE: To fix multiple consecutive newline characters producing redundant spaces:

' '.join([line.strip() for line in test_str.strip().splitlines() if line.strip()])

This also works for the following: test_str = '\nhej ho \n aaa\r\n\n\n\n\n a\n '

  • This doesn't handle the case of contiguous line feeds in the middle of the string. Two line feeds result in two contiguous blanks in the output. Try "test_str = '\nhej ho \n aaa\r\n\n a\n '" – Mike Gleen Jul 11 '17 at 12:08

The problem with rstrip() is that it does not work in all cases (I have seen a few myself). Instead, you can use:

text = text.replace("\n"," ")

This will replace every newline '\n' with a space.

wovano

Regular expressions can do this in one line:

import re

s='''some kind   of
string with a bunch\r of

  
 extra spaces in   it'''

re.sub(r'\s(?=\s)','',re.sub(r'\s',' ',s))

Result:

'some kind of string with a bunch of extra spaces in it'
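The one-liner above works in two passes. Here are the same two re.sub calls split out with comments, on a shorter made-up string:

```python
import re

s = "some  kind\tof\r\n\nstring"

# Pass 1: turn every whitespace character (space, tab, \r, \n, ...) into a space
spaces = re.sub(r"\s", " ", s)
print(repr(spaces))     # 'some  kind of   string'

# Pass 2: delete each whitespace character that is followed by another one.
# (?=\s) is a lookahead, so the following character is not consumed,
# which leaves exactly one space per run
collapsed = re.sub(r"\s(?=\s)", "", spaces)
print(repr(collapsed))  # 'some kind of string'
```

Unlike ' '.join(s.split()), a leading or trailing run of whitespace is reduced to a single space rather than removed, so you may want a final .strip().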
Quin

You don't necessarily need to remove ALL the line-break characters: LF, CR and CRLF.

# In Python:
'\n', '\r', '\r\n'

Some texts must keep their breaks, but you probably need to join broken lines to keep particular sentences together.

A line break naturally follows a period, semicolon or colon, but not a comma.

My code takes the above conditions into account. It works well with text copied from PDFs. Enjoy!

import re


def unbreak_pdf_text(raw_text):
    """Carefully remove unwanted newline signs.

    A break is joined (replaced by a space) when it follows a letter, digit,
    underscore, comma or space; other breaks, e.g. after '.', ';' or ':', are kept.

    Args:
        raw_text (str): string containing unwanted newline signs: \\n or \\r or \\r\\n,
            e.g. imported from OCR or copied from a pdf document.

    Returns:
        str: the processed text.
    """
    # Try \r\n before a lone \r, so a CRLF pair is consumed as one break
    # (otherwise the \n half would be left behind)
    pat = re.compile(r"([, \w])(?:\r\n|\n|\r)")
    # Keep the preceding character (group 1) and put a space in place of the break
    return pat.sub(r"\1 ", raw_text)
pythamator