I need a string to display \n literally in printed output, without the leading and trailing '. I'm currently calling repr on the input (read from a .txt file), so I'm wondering what the best way is to remove the quotes so that indexing and searching work properly.

repr('s')[1:-1]
repr('s').strip("'")

Unfortunately, repr also escapes some characters that I don't want escaped, such as '.

JBallin
  • *"strip is deprecated"* citation needed. `strip` creates a copy of `s`, too. – vaultah May 19 '17 at 18:02
  • use `print(string)` or the string directly... I can't see a use for this. – TemporalWolf May 19 '17 at 18:02
  • `string.strip` is deprecated, but [`str.strip`](https://docs.python.org/2/library/stdtypes.html#str.strip) is not. – vaultah May 19 '17 at 18:04
  • There are almost no situations where it'd be a good idea to strip the leading and trailing quotes off of `repr`. Why do you want to do this? We can probably suggest a better way to go about whatever it is you're doing. – user2357112 May 19 '17 at 18:06
  • @vaultah Can you provide a link to `str.strip` in Python 2? – JBallin May 19 '17 at 18:07
  • @JBallin: [Here.](https://docs.python.org/2/library/stdtypes.html#str.strip) It's the method your code is already using, in fact. – user2357112 May 19 '17 at 18:08
  • Both `strip` and slicing will copy the data. – user2357112 May 19 '17 at 18:08
  • To be fair, str/string strip is understandably confusing. That being said, this is a red herring for the real issue: what problem does this solve for you? – TemporalWolf May 19 '17 at 18:09
  • @user2357112 Added use case (initially tried to keep simple). Removed deprecated strip. – JBallin May 19 '17 at 18:34
  • Don't call `repr` on the input. If you want to display output with `repr`-escaping, call `repr` when printing the output, but don't call it on the input. – user2357112 May 19 '17 at 18:50
  • @user2357112 Gotcha thanks! It's not as clean to add `repr` (vs repr + strip) to every `print` statement but I'm sure it's more "Pythonic"? – JBallin May 19 '17 at 19:23
  • Related question, [Escape special characters in a Python string - Stack Overflow](https://stackoverflow.com/questions/4202538/escape-special-characters-in-a-python-string) – user202729 Dec 23 '22 at 08:43

2 Answers

You don't say why you would want to do this, but I'm guessing you plan to store the string in a database or a .csv file or something, and you don't want non-printing characters like linefeeds in your data because they can make other tools like SQL interpreters misbehave.

If this is a correct guess, don't use repr() because it will escape characters that I imagine you don't want escaped, like quotes and backslashes. Instead, decide what non-printing characters you want to quote (I think the only likely ones are \n and \t) and substitute them yourself.

fixed_s = s.replace("\n", r"\n").replace("\t", r"\t")

But if you are just using the string as an ordinary Python dictionary key, don't manipulate it; use it as-is.
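
To see how the two approaches differ, here is a minimal sketch. The helper name `escape_whitespace` and the sample string are made up for illustration; only `str.replace` and `repr` come from the discussion above.

```python
def escape_whitespace(s):
    """Show newlines and tabs as literal \\n and \\t; leave every other character alone."""
    return s.replace("\n", r"\n").replace("\t", r"\t")


# Hypothetical sample: contains both kinds of quotes, a backslash and a newline.
sample = 'say "hi", it\'s C:\\temp\n'

# Only the newline is rewritten; the quotes and the backslash are untouched.
print(escape_whitespace(sample))

# repr() additionally escapes the single quote and doubles the backslash.
print(repr(sample)[1:-1])
```

One caveat with the `replace` approach: if the text can already contain a literal backslash followed by `n`, the result no longer distinguishes it from a real newline, so this is best suited to display rather than round-tripping.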

BoarGules
  • You're totally right about only wanting to escape `\n`! Using `replace` seems so obvious now... Why `r"\n"` vs. `"\\n"`? – JBallin May 20 '17 at 23:39
  • Two answers. (1) I work a lot with regular expressions, and with Windows paths. So raw strings come to me much more naturally than `\` escapes. (2) In this case the use of a raw string emphasizes the intent of the replace(), because the only difference between the search substring and the replace substring is the rawness. – BoarGules May 21 '17 at 08:30
  • I realize this post is a few years old now, but what if you need to use repr() on the entire string because you have a huge list of things that need to be escaped and you are not sure what they all are? I am parsing emails from Outlook into Python strings and then over to Excel :D I need to know how to do what the OP's title is asking, not how to use replace. Got any ideas for that? – Mike - SMT Jun 19 '20 at 20:04
  • @Mike-SMT I take issue with *need to use `repr()`*. It isn't a magic wand. Its sole purpose is to render data in a form that is an acceptable Python constant (which is quite different from a useful Excel cell value). There is no substitute for knowing your data. If you realize that you don't (and that's not a criticism, it happens often with real-world data), your only way out is trial and error. Start with fixing `\n` and `\r`, wait for that to fail, figure out why, and tweak. With constantly new data that can take months. The only defence is robust exception handling. – BoarGules Jun 19 '20 at 21:48
  • @BoarGules That's not the issue here. I am very aware of fixing basic escape issues like `\n`. We are running into many characters that cause an `IllegalCharacterError` from openpyxl when appending email data to an Excel file. The problem is that openpyxl does not specify what the characters actually are, so I am unable to build a comprehensive list of values to replace; right now only `repr()` has been able to bypass this problem. I am not saying it's a good idea, but right now it's my only option until I can find all the possible illegal characters that Excel can't take in. – Mike - SMT Jun 19 '20 at 21:53
  • @BoarGules I have been digging into the files of openpyxl to try and figure out the list of characters and ended up hitting a wall after making some progress. I made a post about it here. [what-are-all-the-illegal-characters-from-openpyxl](https://stackoverflow.com/questions/62478974/what-are-all-the-illegal-characters-from-openpyxl) – Mike - SMT Jun 19 '20 at 21:54
  • @Mike-SMT Ah, that error message was missing from your problem statement. This is a total guess, but might the problem characters be emojis? They lie outside the Unicode Basic Multilingual Plane and, even though Excel can handle them, it would not surprise me if `openpyxl` can't. I don't know it nearly well enough to answer that, sorry. But a lot of Python-related software (for example IDLE) doesn't really support UTF-16, only its predecessor UCS-2, which is essentially UTF-16 but without characters outside the BMP. – BoarGules Jun 19 '20 at 22:11
  • @BoarGules I am getting close to my answer. With some research I have figured out that the line that checks for illegal characters uses octals, so now I just need to figure out how to compare octals to strings and I should be able to compensate for the problem. – Mike - SMT Jun 19 '20 at 22:18
  • I'm not sure I know what an *octal* is in this sense. Might it not be *octet*? That is what some European languages, and international standards, call a byte. – BoarGules Jun 20 '20 at 07:28

You're using repr() prematurely. It's meant to be used on output, not input.

Add repr(string) to your print statements and remove it from the input.
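
As a rough sketch of what that looks like (the in-memory `io.StringIO` object stands in for the opened .txt file, and its contents are made up; the `!r` shorthand assumes Python 3.6+ f-strings):

```python
import io

# Stand-in for open("some_file.txt"); the contents are hypothetical.
source = io.StringIO("first line\nsecond line\n")

text = source.read()           # keep the raw string; no repr() on the input

index = text.find("second")    # searching and indexing see the real characters
shown = repr(text)             # escape only at the point of display

print(index)
print(shown)
print(f"{text!r}")             # !r applies repr() inside an f-string
```

Because `text` is never modified, offsets returned by `find` or used in slices refer to the actual file contents, and the quotes added by `repr` only ever appear in what is printed.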

JBallin