I would like to print only the line that contains "Server" in the output below:

Date: Sun, 16 Dec 2012 20:07:44 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Set-Cookie: PREF=ID=da8d52b67e5c7522:FF=0:TM=1355688464:LM=1355688464:S=CrK5vV-qb3UgWUM1; expires=Tue, 16-Dec-2014 20:07:44 GMT; path=/; domain=.google.com
Set-Cookie: NID=67=nICkwXDM6H7TNQfHbo06FbvZhO61bzNmtOn4HA71ukaVDSgywlBjBkAR-gXCpMNo1TlYym-eYMUlMkCHVpj7bDRwiHT6jkr7z4dMrApDuTk_HuTrZrkoctKlS7lXjz9a; expires=Mon, 17-Jun-2013 20:07:44 GMT; path=/; domain=.google.com; HttpOnly
P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
Server: gws
X-XSS-Protection: 1; mode=block
X-Frame-Options: SAMEORIGIN
Connection: close

This information comes from a list called websiteheaders. The code below is driving me crazy because it does not work properly:

for line in websiteheaders:
    if "Server" in line:
        print line

However, this code prints exactly the same block of text shown at the beginning of my post. I don't understand why it does that.

As I've said, I only want to print the line that contains "Server", preferably without a regex. If that is not possible, a regex is fine.

Please help and thanks!

EDIT: My complete code so far is pasted here: http://pastebin.com/sYuZyvX9

EDIT2: For completeness, hosts.txt currently contains one host, "google.com".

Update

My code was actually working fine, but a mistake in another part of my code meant that the data put into the list websiteheaders was one large string instead of multiple entries. The loop above will of course find "Server" and print the whole entry, which in my case was the full (large) string.

Using

websiteheaders.extend(headers.splitlines())

instead of

websiteheaders.append(headers)

did the trick for me. Thanks a lot, guys.
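The difference between the two calls can be seen in a minimal sketch. The shortened `headers` string is an assumption made for illustration (three header lines instead of the full response), and the code uses `print()` so that it runs on Python 3, whereas the snippets in this post use the Python 2 print statement.

```python
# Assumed sample data: a shortened, multi-line header string.
headers = "Date: Sun, 16 Dec 2012 20:07:44 GMT\nServer: gws\nConnection: close"

# append() stores the whole multi-line string as a single list element.
appended = []
appended.append(headers)

# extend() with splitlines() stores one list element per header line.
extended = []
extended.extend(headers.splitlines())

# The same membership test behaves very differently on the two lists.
appended_matches = [line for line in appended if "Server" in line]
extended_matches = [line for line in extended if "Server" in line]

print(len(appended), len(extended))
print(extended_matches)
```

With the appended list, the only element is the full string, so the test matches it and the whole block is printed. With the extended list, only the single matching line is selected.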

bryanvan
  • show us where you create websiteheaders... – Joran Beasley Dec 16 '12 at 20:29
  • What do you get if you `print len(websiteheaders)` right before the `for line in websiteheaders:` loop? Actually, what does `print repr(websiteheaders)` give? (You can edit the output into your question.) – DSM Dec 16 '12 at 20:35
  • @DSM It will output 1 actually. – bryanvan Dec 16 '12 at 20:37
  • @bryanvan: then that's the problem. Your `websiteheaders` is a list, but it's a list with one long string stored as its first element. So since `Server` is in that string, it passes and prints it. Instead of `websiteheaders.append(headers)`, use `websiteheaders.extend(headers.splitlines())`, so that you have a list containing each line. – DSM Dec 16 '12 at 20:38
  • @DSM Thank you this was indeed the correct answer. I always thought that append was the way to go. – bryanvan Dec 16 '12 at 20:45

2 Answers

Is websiteheaders really a list with one entry per line? If it's a string, you should use:

for line in websiteheaders.splitlines():
    if "Server" in line:
        print line

Also, a good tip: when you run into this kind of problem, add some print statements. If you had added something like:

else:
    print 'WRONG LINE:', line

You probably would have caught that the loop was iterating over every character rather than over every line.
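As a rough illustration of that point, the sketch below shows what a loop sees when it iterates over a string directly versus over `splitlines()`. The two-line `text` value is an assumed sample, and `print()` is used so the code runs on Python 3. Printing `repr()` of the data before the loop is one way to check which case you are in.

```python
# Assumed sample data: two header lines in one string.
text = "Date: Sun, 16 Dec 2012 20:07:44 GMT\nServer: gws"

# Iterating over a str yields one character at a time.
chars = list(text)

# splitlines() yields one item per line instead.
lines = text.splitlines()

# repr() makes the structure of the data visible while debugging.
print(repr(chars[:4]))
print(repr(lines))
```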

Update

I can't see what's wrong with your code, then. This is what I get:

In [3]: websiteheaders
Out[3]: 
['Date: Sun, 16 Dec 2012 20:07:44 GMT',
 'Expires: -1',
 'Cache-Control: private, max-age=0',
 'Content-Type: text/html; charset=ISO-8859-1',
 'Set-Cookie: PREF=ID=da8d52b67e5c7522:FF=0:TM=1355688464:LM=1355688464:S=CrK5vV-qb3UgWUM1; expires=Tue, 16-Dec-2014 20:07:44 GMT; path=/; domain=.google.com',
 'Set-Cookie: NID=67=nICkwXDM6H7TNQfHbo06FbvZhO61bzNmtOn4HA71ukaVDSgywlBjBkAR-gXCpMNo1TlYym-eYMUlMkCHVpj7bDRwiHT6jkr7z4dMrApDuTk_HuTrZrkoctKlS7lXjz9a; expires=Mon, 17-Jun-2013 20:07:44 GMT; path=/; domain=.google.com; HttpOnly',
 'P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."',
 'Server: gws',
 'X-XSS-Protection: 1; mode=block',
 'X-Frame-Options: SAMEORIGIN',
 'Connection: close']

In [4]: for line in websiteheaders:
   ...:     if 'Server' in line:
   ...:         print line
   ...:         
Server: gws
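One caveat with the substring test is that `'Server' in line` also matches any line that merely contains the word, for example inside a cookie value. A stricter variant, sketched below, compares only the header name before the first colon. The helper name `find_header` and the sample list (including the cookie line) are hypothetical and not part of the original code. The comparison ignores case because HTTP header names are case-insensitive.

```python
def find_header(lines, name):
    """Return the first line whose header name equals `name`, or None."""
    for line in lines:
        # Split "Name: value" at the first colon only.
        field, sep, _value = line.partition(":")
        if sep and field.strip().lower() == name.lower():
            return line
    return None


# Hypothetical sample: the cookie line contains "Server" but is not the Server header.
sample = [
    "Set-Cookie: Server=abc; path=/",
    "Server: gws",
    "Connection: close",
]

result = find_header(sample, "Server")
print(result)
```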
Niclas Nilsson
  • +1 although I would recommend splitlines unless you are sure of endline encoding ... – Joran Beasley Dec 16 '12 at 20:21
  • Yes, websiteheaders is a list. If I implement above code I get the following error: AttributeError: 'list' object has no attribute 'splitlines' – bryanvan Dec 16 '12 at 20:26
  • I can convert the list to a str for example : text = str(websiteheaders) and implement the above piece of code, but then it will just print a complete string... – bryanvan Dec 16 '12 at 20:29
  • @bryanvan: Well the right approach would be `text = '\n'.join(websiteheaders)` But don't bother. The problem is somewhere else. I've updated my answer and for me your code is working fine. Is your list looking the same as mine (see above)? – Niclas Nilsson Dec 16 '12 at 20:32
  • I've pasted my code so far into my original post. I am quite curious as to what could be wrong! Thx so far. – bryanvan Dec 16 '12 at 20:33
  • @bryanvan add a `print websiteheaders` just before the loop and update your question with your list. – Niclas Nilsson Dec 16 '12 at 20:37
  • @NiclasNilsson I think you misread my code, but had the right idea. I indeed needed to change my code so that the list gets the data as lines and not as one long string. DSM gave me the correct answer; I will update my post. Thanks a lot!! – bryanvan Dec 16 '12 at 20:45
  • Great! A lot of print statements help when debugging. – Niclas Nilsson Dec 16 '12 at 20:48
for single_line in websiteheaders.splitlines():
    if "Server" in single_line:
        print single_line
Timothy