0

I have a long string that contains many \n escape sequences that are supposed to be newline characters. To write this string the way it was meant to be read, I thought it would be best to split it on \n and then write each string in the resulting list individually. However, this doesn't work: the string is not being split correctly. Below is my code. To be clear, I have tried both \n and \\n as the separator, because I am trying to split at a literal \n in the string. Thanks for any help.

shellreturn = subprocess.check_output(["C:\Python34\python",root.wgetdir + "\html2text.py", keyworddir + "\\" + item])
print(shellreturn)
shelllist = (str(shellreturn).split("\\n"))
mgilson
Kyle
  • What's `shellreturn` then? – vaultah May 08 '14 at 06:18
  • @frostnational, here is just a small piece of shellreturn: b"**PMSI Direct** \n262 Old New Brunswick Rd., Unit M \nPiscataway, NJ 08854 \n800.238.1316 \n\nName\n\nCompany\n\nPhone\n\nEmail* _(required)_\n\n![PMSI Direct, Inc. – Kyle May 08 '14 at 06:20
  • `it is just not splitting them correctly` how is it splitting it then? – Tim May 08 '14 at 06:21
  • I'm not sure, with frostnational's help it seems to have removed the \n's now in the output but it is all still one line. – Kyle May 08 '14 at 06:32

2 Answers

6

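You have bytes here, not str. Calling str() on a bytes object returns its printable representation (b'...'), in which each newline shows up as the two characters \ and n rather than as a real newline. Decode it to a string instead: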

shellreturn = shellreturn.decode()

or

shellreturn = str(shellreturn, 'utf-8')

After it's decoded you can use .split('\n') or .splitlines().
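As a rough sketch of the difference, the snippet below uses a shortened copy of the sample output quoted in the comments as stand-in data. The names `as_repr`, `decoded`, and `lines` are only illustrative, and UTF-8 is assumed as the encoding.

```python
# Shortened stand-in for the bytes returned by subprocess.check_output.
shellreturn = b"**PMSI Direct** \n262 Old New Brunswick Rd., Unit M \nPiscataway, NJ 08854 \n"

# str() without an encoding returns the repr of the bytes object:
# it keeps the b'...' wrapper and spells each newline as backslash + n.
as_repr = str(shellreturn)

# Decoding turns the bytes into real text that contains real newlines.
decoded = shellreturn.decode("utf-8")

# splitlines() splits on line boundaries and drops the line endings.
lines = decoded.splitlines()

for line in lines:
    print(line)
```

Because `splitlines()` also recognises `\r\n`, it should behave the same way if the child process emits Windows line endings.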

vaultah
  • Unfortunately, this still leaves everything as one line. – Kyle May 08 '14 at 06:17
  • @Kyle that's clear now. updated. – vaultah May 08 '14 at 06:22
  • thanks for the update. I made the changes you suggested. Oddly, it did properly remove all the \n's but did not allow them to be written on separate lines. – Kyle May 08 '14 at 06:26
  • @Kyle I'm not sure I understand you, show the output. – vaultah May 08 '14 at 06:28
  • The output is the same as the input except with the \n's removed. Maybe my printing method is bad: `for i in range(len(shelllist)): text_file.write(str(shelllist[i]))` – Kyle May 08 '14 at 06:30
  • @Kyle it is. In your case I'd use just `text_file.writelines(shelllist)` or `for i in range(len(shelllist)): text_file.write(str(shelllist[i])+'\n')` – vaultah May 08 '14 at 06:35
  • @Kyle take a look at [this](http://stackoverflow.com/a/12377541/2301450) – vaultah May 08 '14 at 06:36
  • Ok thank you, the second one with the loop seems to have done the trick. – Kyle May 08 '14 at 06:51
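A small sketch of why only the loop variant helped: `writelines()` does not add line separators, so they have to be supplied explicitly once the split has removed them. The list contents are made-up stand-in data, and `io.StringIO` stands in for the `text_file` from the comments.

```python
import io

# Stand-in for the list produced by splitting the decoded output.
shelllist = ["PMSI Direct", "262 Old New Brunswick Rd., Unit M"]

# writelines() writes the items back to back, with nothing in between.
joined = io.StringIO()
joined.writelines(shelllist)

# Appending "\n" to each item restores one line per entry.
separated = io.StringIO()
separated.writelines(line + "\n" for line in shelllist)

print(repr(joined.getvalue()))
print(repr(separated.getvalue()))
```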
0
shellreturn = subprocess.check_output(["C:\Python34\python",root.wgetdir + "\html2text.py", keyworddir + "\\" + item])
print(shellreturn)
shelllist = (str(shellreturn).split("\\n"))

The argument list passed to subprocess.check_output is asking for trouble: the backslashes in "C:\Python34\python" and "\html2text.py" are not escaped, and the paths are not built with os.path.join. That's not what the question is about, though. You did escape the \ in "\\" as well as in "\\n". Let's look at the sample data and what would happen to it:

b"PMSI Direct \n262 Old New Brunswick Rd., Unit M \nPisca..."

The b" prefix shows that this is a bytes object written in Python literal syntax. That means the \ sequences are interpreted as escape sequences, unlike in raw strings (r prefix). So the line delimiter here is "\n", not "\\n". If you split it on "\\n", no match is found, so you get the original string as the only item in a list. That is the correct result of a split when the delimiter is not found.
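To illustrate, here is a minimal sketch that splits the sample bytes from above on both delimiters. The names `not_found` and `found` are only illustrative.

```python
# The sample data from above, as a bytes literal.
data = b"PMSI Direct \n262 Old New Brunswick Rd., Unit M \nPisca..."

# b"\\n" is the two bytes backslash + n, which never occur in the data,
# so split() returns a one-element list holding the original bytes.
not_found = data.split(b"\\n")

# b"\n" is a single newline byte, which is what the data really contains.
found = data.split(b"\n")

print(len(not_found), len(found))
```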

An additional complication is that you appear to be running on Windows, where '\n' is not the OS format for a newline. Windows uses '\r\n', which Python usually handles behind the scenes when you open files in text mode, so the way you open text_file also matters.
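The translation can be sketched as follows. To make the effect visible on any platform, this example passes `newline="\r\n"` explicitly, which forces the same translation that text mode applies by default on Windows. The temporary file path and the sample lines are assumptions for illustration only.

```python
import os
import tempfile

# Made-up sample lines standing in for the split output.
lines = ["PMSI Direct", "262 Old New Brunswick Rd., Unit M"]

path = os.path.join(tempfile.mkdtemp(), "out.txt")

# In text mode, each "\n" written is translated to the newline argument.
# newline="\r\n" mimics the Windows default on every platform.
with open(path, "w", newline="\r\n") as text_file:
    for line in lines:
        text_file.write(line + "\n")

# Reading in binary mode shows the bytes that actually reached the disk.
with open(path, "rb") as raw_file:
    raw = raw_file.read()

print(raw)
```

Opening the file with `newline=""` or in binary mode would instead write the `"\n"` characters untranslated.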

Yann Vernier