1

I have a text file with one URL per line, as follows:

https://www.google.com
https://www.facebook.com
https://www.gmail.com

I use the following script:

import requests

add = open("manual_list.txt","r")

for a in add:
  response = requests.get(a, timeout=(2, 5), verify=False)
  fout = open("mylist.txt","a")
  fout.write(response.url+"\n")
  fout.close()

The problem is that when I write the resulting URL to a file, I get an additional %0A at the end of each line. Can you please explain why this is happening?

The problem can be solved by calling strip() on each input line:

response = requests.get(a.strip(), timeout=(2, 5), verify=False)

My questions:

1) Why is this needed?

2) Searching for %0A, I found that it is a line feed character. As I understand it, this is different from the newline character. How does it get added? Is it my list's fault or the library's?

I used the same list with other programs and did not see a similar problem. Why is it a problem here?

EDIT: I use Ubuntu 18.04 and Python 3.6.5

user9371654
  • The convention for the web is for a line ending to be *two* characters, `\r\n`. Many OS use the same convention, e.g. Windows - *nix is the odd one out. – Mark Ransom Mar 02 '19 at 12:47
  • @Mark Ransom Sorry, I do not get what you mean. How is this related to my question, and how does it solve the problem? – user9371654 Mar 02 '19 at 12:58
  • @Mark Ransom If you run the code on your machine, do you get this additional character at the end of each line in the output file? Why do I get it? Is the input file faulty? – user9371654 Mar 02 '19 at 13:00
  • `for a in add` is going to read the file line by line *including the end-of-line characters* and store each line in `a` as it's read. If you don't want the character, then you have to strip it off. `%0A` *is* the newline character on unix-style systems. Most unix systems use line feed, Windows systems use a combination of carriage return and line feed `%0D%0A`. Hope that helps! And no, it's not your fault. – John Szakmeister Mar 02 '19 at 13:11
  • Sorry, I had a brain fart. `%0A` is `\n` and `%0D` would be `\r`, except you didn't have one of those. – Mark Ransom Mar 02 '19 at 16:53

2 Answers

1
requests.get(add, timeout=(2, 5), verify=False)

should probably be

requests.get(a, timeout=(2, 5), verify=False)

Can you try again with that change?

EDIT:

with open("url_list.txt","r") as f:
    content = f.readlines()
print(content)

will print out

['https://www.google.com\n', 'https://www.facebook.com\n', 'https://www.gmail.com\n']

Here you can see that the lines in your file do end with a '\n'. This is normal: it just marks where a new line begins. That is why you need .strip().
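To make the stripping step concrete, here is a minimal sketch that reads a URL list and removes the trailing '\n' from each line before the URL is used. The helper name `read_urls` and the temporary file are assumptions made for illustration, not part of the original script.

```python
import os
import tempfile


def read_urls(path):
    """Return the non-empty lines of a file with surrounding whitespace removed."""
    urls = []
    with open(path, "r") as f:
        for line in f:
            url = line.strip()  # drops the trailing '\n' (and '\r' if present)
            if url:             # skip blank lines
                urls.append(url)
    return urls


# Build a throwaway input file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("https://www.google.com\nhttps://www.facebook.com\nhttps://www.gmail.com\n")
    tmp_path = tmp.name

urls = read_urls(tmp_path)
os.remove(tmp_path)
print(urls)
```

Each element of `urls` can then be passed to `requests.get(...)` without the newline ending up in the requested URL.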

Georges Lorré
  • That was a typo. Even after fixing it, the code runs but the output file has %0A at the end of each line. As I said, this is a line feed character, and the problem is resolved if I pass `a.strip()` instead of `a` to `requests.get`, but why? I do not see this character in the input file. – user9371654 Mar 02 '19 at 12:53
0

`for a in add` is going to read the file line by line *including the end-of-line characters* and store each line in `a` as it's read. If you don't want the character, then you have to strip it off.

`%0A` is the percent-encoded form of the "newline" character (`\n`) used on Unix-style systems (formally it is called the "line feed" character). Windows systems use a combination of carriage return and line feed (`\r\n`, which would show up as `%0D%0A`).
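The mapping between these characters and their percent-encoded forms can be checked with the standard library. This is only a sketch using `urllib.parse`; it illustrates the encoding itself and is not a claim about the exact code path inside requests.

```python
from urllib.parse import quote, unquote

# Percent-encode the two end-of-line conventions.
line_feed = quote("\n")   # Unix-style line ending
crlf = quote("\r\n")      # Windows-style line ending
print(line_feed, crlf)

# Decoding goes the other way: %0A is just the '\n' character.
decoded = unquote("%0A")
print(repr(decoded))

# A URL read from a file with its newline still attached.
encoded_url = quote("https://www.google.com\n", safe=":/")
print(encoded_url)
```

The last line shows why an unstripped line ends in `%0A` once it has been treated as a URL.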

Hope that helps! And no, it's not your fault.

John Szakmeister
  • Is it the for loop? So this is the case in any program, not only ones that use requests? – user9371654 Mar 02 '19 at 13:34
  • I made a small test: I just read lines from a file using a for loop and printed each line to the output. I do not see that character, but there is an extra blank line between the lines. I do not understand why this behaves differently from the requests program. – user9371654 Mar 02 '19 at 13:38
  • Use `print(repr(a))` and you'll see it. :-) – John Szakmeister Mar 02 '19 at 17:12
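A short sketch of what the `repr` suggestion reveals, and of where the extra blank line in the small test comes from: `print` appends its own newline after the one already in the line. The sample string and the `io.StringIO` buffers are assumptions used so the output can be inspected without a real file.

```python
import io

line = "https://www.google.com\n"  # a line as the for loop reads it from the file

# repr() makes the otherwise invisible newline visible as \n.
shown = repr(line)
print(shown)

# print() adds its own newline, so the text ends with two of them,
# which shows up as a blank line between entries.
buf = io.StringIO()
print(line, file=buf)
written = buf.getvalue()
print(repr(written))

# Suppressing print's newline (or stripping the line first) avoids the blank line.
buf_no_extra = io.StringIO()
print(line, end="", file=buf_no_extra)
written_no_extra = buf_no_extra.getvalue()
print(repr(written_no_extra))
```

In the requests case the newline becomes part of the URL and is percent-encoded, which is why it appears there as `%0A` rather than as a blank line.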