-3

I have a string coming out of a function. The string contains literal escape sequences (mainly \n).

I need to format this string so that the escape sequences are interpreted before I pass the formatted string to another function. The second function needs the formatted version so it can do a text search against similarly formatted text.

Here's what I mean:

```python
text = r'Your team should include the following \nboard-certified experts:\n \nA pulmonologist is a doctor who's an \nexpert of lung diseases.\n'

formatted_text = """Your team should include the following 
board-certified experts:
 
A pulmonologist is a doctor who's an 
expert of lung diseases."""
```

So as you can see, if I do print(text), I will get the second string, as print will interpret the escape sequences. But I don't want to print. I want to format it like the second string and store it in another variable.

EDIT:

Please run it in a notebook cell and DO NOT print(formatted_text). Evaluate it as a bare variable. If that doesn't remove the escape sequences, it's not what I want.

enter image description here

Edit 2:

What I am looking for:

enter image description here

Baktaawar
  • What kind of function returns literal escape sequences? If it's returning JSON, use `json.loads()`. – Barmar Jun 20 '23 at 01:34
  • it's basically a Document variable that has the above string as part of its dictionary. – Baktaawar Jun 20 '23 at 01:39
  • I added already in question above – Baktaawar Jun 20 '23 at 01:43
  • Your `search_text` doesn't have literal `\n` in it. You need to use a raw string or write `\\n` to get that. – Barmar Jun 20 '23 at 01:45
  • All I am saying is, I have a text string which looks like search_text. I don't want to have \n mentioned there but I want a formatted string variable which has the \n implemented as new line. So anything after \n in search_text is in a new line in that formatted variable – Baktaawar Jun 20 '23 at 01:46
  • doesn't work. Pls run it in notebook and u will see – Baktaawar Jun 20 '23 at 01:47
  • Is `\n \n` intentional (with space in between), or should it be `\n\n`? – qrsngky Jun 20 '23 at 02:07
  • Maybe it's an [XY](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) problem. Why don't you just try to modify your "search function" to make it work with a normal Python string implement – Hoang Minh Quang FX15045 Jun 20 '23 at 02:08
  • The problem is that I get an output in search_text format. I am trying to compare or string-match that output with a text which is in formatted_string format. If I just use it as it is, it won't compare, because my search_text has \n and other escape sequences, while the text I am comparing it to has those as new lines and so on. I cannot modify the output of either the search_text or the string I am comparing it to unless I run some logic on one of them to make them comparable. The output I get as search_text is not coming from my function but from a package which produces that output – Baktaawar Jun 20 '23 at 02:15
  • Also, please provide the code **as text not as screenshots** – juanpa.arrivillaga Jun 20 '23 at 02:18
  • Just use .replace(r"\n","\n") like [this](https://ideone.com/TVRph3) – Hoang Minh Quang FX15045 Jun 20 '23 at 04:34

2 Answers

2

Assuming you want the escape sequences (tab, newline, etc.) to take effect as real characters instead of staying in the string as literal backslash text, use codecs.

```python
import codecs

text = r"Your team should include the following \nboard-certified experts:\n \nA pulmonologist is a doctor who's an \nexpert of lung diseases.\n"

print(text)

formatted_text = codecs.decode(text, 'unicode_escape')

print(formatted_text)
```

Returns:

```
Your team should include the following
board-certified experts:

A pulmonologist is a doctor who's an
expert of lung diseases.
```

See this post with a similar question
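One caveat I'm fairly sure about: calling `codecs.decode` directly on a `str` with `unicode_escape` garbles characters outside ASCII. If your real text might contain accented letters or symbols, a commonly used workaround is sketched below. `decode_escapes` is a hypothetical helper name and the sample string is made up for illustration.

```python
import codecs


def decode_escapes(raw: str) -> str:
    """Turn literal backslash escapes into real characters (hypothetical helper)."""
    # backslashreplace rewrites any character outside Latin-1 as a \uXXXX escape,
    # which unicode_escape then decodes back to the original character.
    return raw.encode("latin-1", "backslashreplace").decode("unicode_escape")


raw = r"café \nboard-certified"

naive = codecs.decode(raw, "unicode_escape")  # the accented letter gets garbled here
safe = decode_escapes(raw)                    # the accented letter survives
```

For plain ASCII input like the text in the question, both calls give the same result.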

EDIT:

enter image description here

EDIT 2: Note that there is a newline escape (\n) at the end of the raw string. So either remove it from the end of the raw string, or expect the decoded result to differ from your desired string by one trailing newline.

```python
import codecs

text = r"Your team should include the following \nboard-certified experts:\n \nA pulmonologist is a doctor who's an \nexpert of lung diseases.\n" # <- because of newline here in original text

desired_text = """Your team should include the following 
board-certified experts:
 
A pulmonologist is a doctor who's an 
expert of lung diseases.
"""    # <- additional newline here

formatted_text = codecs.decode(text, 'unicode_escape')

if desired_text == formatted_text:
    print("yep, the same")
else:
    print("nope")
```

I ran this with the additional newline and it works.
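For what it's worth, the `\n` that shows up when a notebook cell ends with a bare variable comes from the cell displaying the value's `repr`, not from the contents of the string. Here is a minimal sketch to check that without printing anything. The sample text is shortened and the variable names other than `formatted_text` are mine.

```python
import codecs

raw = r"first line\nsecond line"
formatted_text = codecs.decode(raw, "unicode_escape")

# The decoded string holds a real newline character and no backslash at all.
has_real_newline = "\n" in formatted_text
has_backslash = "\\" in formatted_text

# A bare variable at the end of a notebook cell is displayed via repr(),
# which spells the newline as backslash-n so the value fits on one line.
echoed = repr(formatted_text)

# A triple-quoted literal with an actual line break is the very same string.
expected = """first line
second line"""
same = formatted_text == expected
```

So a plain `==` comparison is the reliable check here: `same` comes out `True` even though the cell echo of `formatted_text` still shows `\n`.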

If you need to remove any additional newlines or other escape sequences at the end, I would add this after you get the raw text and before you decode it:

```python
# literal escape sequences you want to drop from the end of the raw text
avoided_trailing_sequences = ('\\n', '\\t')

# str.strip() treats its argument as a set of characters, so match whole suffixes instead
while text.endswith(avoided_trailing_sequences):
    text = text[:-2]
```
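Another option, sketched here under the assumption that only trailing newlines need to go, is to trim after decoding instead of before. The target string is the one from the question, written with explicit `"\n"` so the trailing spaces stay visible. `decoded` and `matches` are names I made up.

```python
import codecs

text = r"Your team should include the following \nboard-certified experts:\n \nA pulmonologist is a doctor who's an \nexpert of lung diseases.\n"

# The question's target string (note: no newline at the very end).
formatted_text = (
    "Your team should include the following \n"
    "board-certified experts:\n"
    " \n"
    "A pulmonologist is a doctor who's an \n"
    "expert of lung diseases."
)

# Decode the escapes first, then drop the real newline(s) left at the end.
decoded = codecs.decode(text, "unicode_escape").rstrip("\n")

matches = decoded == formatted_text
```

With the question's input, `matches` is `True`, which is the simple string match asked for.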

Hope this helps

futium
  • codecs doesn't work. Again pls DON'T PRINT (formatted_text) – Baktaawar Jun 20 '23 at 01:49
  • pls do not close this question if ur answer isn't correct. – Baktaawar Jun 20 '23 at 01:50
  • Why doesn't codecs work for you? – futium Jun 20 '23 at 01:50
  • pls check my screenshot in ur answer as EDIT – Baktaawar Jun 20 '23 at 01:53
  • @Baktaawar I may be daft but I'm not seeing it. Is it because of the quotes around the string when you call formatted_text? That's because the print excludes the quotes and when you call formatted_text it includes the quotes as a part of the datatype. But if that's the case, that won't have an impact on your search function. Both are strings – futium Jun 20 '23 at 01:57
  • check the edit with what I am looking for. Hope that helps. if u check formatted_text in first edit, urs has \n as part of it. I don't want \n as part of string but as implemented – Baktaawar Jun 20 '23 at 02:00
  • 1
    @Baktaawar okay, i see what you're saying. i got it to work. it was a matter of in the raw text, there was a newline character at the end, which would be different than your desired text. Do you want to ignore escape characters after the last non-escape character? check the edit – futium Jun 20 '23 at 02:06
0

You can process your text as follows:

```python
text = "Your team should include the following \\nboard-certified experts:\\n \\nA pulmonologist is a doctor who's an \\nexpert of lung diseases.\\n"


def process(text):
    return text.replace("\\n", "\n")


formatted_text = process(text)
print(formatted_text)
```
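If `\n` turns out not to be the only escape sequence in the output, the same idea can be extended with a small lookup table. This is only a sketch: the set of escapes in `ESCAPES` is my assumption about what the upstream package might emit, and `process_all` is a hypothetical name.

```python
import re

# Assumed set of escapes to handle; extend it to match the real input.
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\"}

_ESCAPE_PATTERN = re.compile(r"\\([ntr\\])")


def process_all(text: str) -> str:
    # Replace every backslash pair in one left-to-right pass, so an escaped
    # backslash followed by "n" is not mistaken for a newline escape.
    return _ESCAPE_PATTERN.sub(lambda m: ESCAPES[m.group(1)], text)


sample = "col1\\tcol2\\nrow\\\\n"
result = process_all(sample)
```

Chained `.replace()` calls would also work for a couple of escapes, but the single regex pass avoids one replacement accidentally feeding into the next.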
Ibrahim Berber
  • `\n` is not the only possible escape sequence. – Barmar Jun 20 '23 at 01:29
  • They said "mainly", not only. – Barmar Jun 20 '23 at 01:31
  • This doesn't work. I have tried it. And yes, \n might not be the only escape sequence. Don't use print, as print will automatically interpret the escape sequences, so what you see is not what the variable holds. You can check this by running it in a notebook cell and evaluating formatted_text on its own – Baktaawar Jun 20 '23 at 01:32
  • 1
    @Baktaawar Stop saying that. `print()` doesn't know anything about escape sequences. Try `print(r'\n')` and you'll see. – Barmar Jun 20 '23 at 01:42
  • @Baktaawar This does work for `\n`: https://ideone.com/FDyHBk – Barmar Jun 20 '23 at 01:47
  • sir u r printing it – Baktaawar Jun 20 '23 at 01:48
  • here's a simple request and exercise for all who keep saying it works by printing the variable. Just use ur formatted_text variable u get after running ur logic and compare it with the formatted_variable I have mentioned in my above question. If it doesn't match in simple string match, then it isn't the same. – Baktaawar Jun 20 '23 at 01:55