I want to extract the substring "login attempt [b'admin'/b'admin']" from the string:

2021-05-06T00:00:15.921179Z [HoneyPotSSHTransport,1127,5.188.87.53] login attempt [b'admin'/b'admin'] succeeded.

But Python returns the whole line instead of just that substring. My code is:

import re
hand = open('cowrie.log')
outF = open("Usernames.txt", "w")
for line in hand:
    if re.findall(r'login\sattempt\s\[[a-zA-z0-9]\'[a-zA-z0-9]+\'/[a-zA-z0-9]+\'[a-zA-z0-9]+\'\]', line):
        print(line)
        outF.write(line)
        outF.write("\n")
outF.close()

Thanks in advance. This is the LINK to the data I want to extract from.

Wiktor Stribiżew

1 Answer

Your code says: if re.findall returns anything, print the whole line. Instead, you should print what re.findall returns and write that to the file. Since re.findall returns a list, write its items rather than the line itself.

Or use re.search if you expect a single match per line.

Note that [A-z] matches more than [A-Za-z]: it also matches the characters [, \, ], ^, _ and ` that sit between Z and a in ASCII.
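To see the difference between the two character classes, here is a small sketch. The variable names (between, loose, strict) are only illustrative and are not part of the original code.

```python
import re

# The six ASCII characters that lie between 'Z' and 'a'.
between = "".join(chr(c) for c in range(ord("Z") + 1, ord("a")))
print(between)

loose = re.compile(r"[A-z]")      # range runs from 'A' through 'z'
strict = re.compile(r"[A-Za-z]")  # letters only

# Every one of those characters matches the loose class ...
print([ch for ch in between if loose.fullmatch(ch)])
# ... and none of them matches the strict class.
print([ch for ch in between if strict.fullmatch(ch)])
```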

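import re

# Write only the matched substring, not the whole line.
pattern = re.compile(r"login\sattempt\s\[[a-zA-Z0-9]'[a-zA-Z0-9]+'/[a-zA-Z0-9]+'[a-zA-Z0-9]+']")

# The with block closes both files, even if an error occurs.
with open('cowrie.log') as hand, open("Usernames.txt", "w") as outF:
    for line in hand:
        res = pattern.search(line)
        if res:
            # res.group() is the matched text only
            outF.write(res.group())
            outF.write("\n")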

Usernames.txt now contains:

login attempt [b'admin'/b'admin']
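
If you only need the username and password rather than the whole "login attempt [...]" text, capture groups with re.findall are one option. This is a minimal sketch, assuming both values are alphanumeric and always carry the b prefix as in the sample line. The helper name extract_credentials is made up for illustration.

```python
import re

# Sample line copied from the question.
line = ("2021-05-06T00:00:15.921179Z [HoneyPotSSHTransport,1127,5.188.87.53] "
        "login attempt [b'admin'/b'admin'] succeeded.")

# Two capture groups: username and password (assumed alphanumeric).
pattern = re.compile(r"login\sattempt\s\[b'([a-zA-Z0-9]+)'/b'([a-zA-Z0-9]+)'\]")

def extract_credentials(text):
    """Return a list of (username, password) tuples found in text."""
    # With two groups in the pattern, findall returns a list of 2-tuples.
    return pattern.findall(text)

print(extract_credentials(line))  # [('admin', 'admin')]
```

Passwords containing other characters (punctuation, spaces) would need a wider character class than the one assumed here.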
The fourth bird