
I am looking to match a pattern such as

`(u'-<21 characters>', N),`

where the 21 characters come from 0-9, a-z, A-Z plus characters like `~!@#$%^&*()_` ... and N is a number from 1 to 99.

I want to retrieve the 21 characters and the number N with the `re.match` method so that I can use them later on, but I don't know how, and I find the documentation hard to follow. How do I do this?

Zap
  • possible duplicate https://stackoverflow.com/questions/19300020/python-match-a-string-with-regex – DrBwts May 30 '17 at 15:39
  • Your question is not really clear... What data do you have at the moment? Should the pattern you describe be the input or the output of your function? – LoicM May 30 '17 at 15:41
  • @DrBwts - I don't think that's a good dup -- This post is asking about groups, and that post doesn't involve groups in any way. – Robᵩ May 30 '17 at 15:42
  • Does your data string *only* consist of 21 characters and a single number separated by a comma? Or is that pattern embedded inside a larger string? Could you supply example input data and expected output please? – cdarke May 30 '17 at 15:42
  • @LoicM I have a .txt file filled with such lines (`(u'-<21 characters>', N),\n`). For each line I want to retrieve the 21 characters as well as the number, so that would be the output. – Zap May 30 '17 at 15:43
  • @cdarke e.g. (part of the file) `(u'--UE_y6auTgq3FXlvUMkbw', 10), (u'--XBxRlD92RaV6TyUnP8Ow', 1), (u'--sSW-WY3vyASh_eVPGUAw', 2), (u'-0GkcDiIgVm0XzDZC8RFOg', 9), (u'-0OlcD1Ngv3yHXZE6KDlnw', 1), (u'-0QBrNvhrPQCaeo7mTo0zQ', 1),` – Zap May 30 '17 at 15:44
  • So a simple split around a comma could suffice? – cdarke May 30 '17 at 15:44
  • @cdarke I have 1 comma after the string of 21 chars and 1 comma at the end of each line, before the '\n'. Plus I need to get rid of the `(u'` part as well as the closing parenthesis at the end of each line. – Zap May 30 '17 at 15:45
  • The data you just posted, is that one line or is each parentheses group on a separate line? – cdarke May 30 '17 at 15:45
  • I did say possible duplicate; to be honest, it was difficult to work out what was being asked. – DrBwts May 30 '17 at 15:51
  • You say "21 character of 0-9, a-z, A-Z ". The line `u'--UE_y6auTgq3FXlvUMkbw'`, after the `--` it is only 20 chars, and it includes an `_`, so should that not match? Also, `u'--sSW-WY3vyASh_eVPGUAw'` includes a `-` and a `_`, so should that not match either? – cdarke May 30 '17 at 16:01
  • @cdarke No, each entry is on its own line; copying and pasting must have merged them. And yes, I've edited the question to fix that: the string can contain other characters as well. – Zap May 30 '17 at 17:15
  • @DrBwts the other question, despite the title, is **not about regex at all**. I just finished closing it as a duplicate of other things. – Karl Knechtel Aug 08 '22 at 02:43
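Taking the question's specification together with the clarifications in the comments (a leading `-`, then exactly 21 characters that may include `-` and `_`, then N from 1 to 99), a stricter pattern could look like the sketch below. It assumes the 21 characters can be anything except a single quote (`[^']`); `LINE_RE` and `parse_line` are hypothetical names. Because the pattern starts with `\(`, `re.match()` works here, as the question originally asked.

```python
import re

# Assumed line format: (u'-<21 characters>', N),
# [^']{21}    -> exactly 21 characters, anything except a quote (so '-' and '_' are fine)
# [1-9][0-9]? -> N from 1 to 99 (rejects 0 and 100)
LINE_RE = re.compile(r"\(u'-([^']{21})', ([1-9][0-9]?)\),?\s*$")


def parse_line(line):
    """Return (chars, N) for a line in the assumed format, or None otherwise."""
    m = LINE_RE.match(line)  # match() is fine: the pattern begins with "("
    if m is None:
        return None
    return m.group(1), int(m.group(2))


print(parse_line("(u'--sSW-WY3vyASh_eVPGUAw', 2),"))  # → ('-sSW-WY3vyASh_eVPGUAw', 2)
print(parse_line("(u'-tooshort', 5),"))               # → None
```

Note that, unlike a pattern capturing everything between the quotes, this one leaves out the leading `-` and captures only the 21 characters after it.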


Here is one program that might do what you want.

Note the use of parentheses `()` to capture the data you are looking for, and the use of `m.group(1)` and `m.group(2)` to retrieve those captured items.
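A quick way to see what each group holds is to run the answer's regex on a single line from the sample data. This is only an illustration; note that group 1 here keeps both leading dashes, because the pattern captures everything between the quotes.

```python
import re

line = "(u'--XBxRlD92RaV6TyUnP8Ow', 1),"
m = re.search(r"'(.+)', (\d+)", line)

# Group 0 is the whole match; groups 1 and 2 are the parenthesised parts.
print(m.group(0))   # → '--XBxRlD92RaV6TyUnP8Ow', 1
print(m.group(1))   # → --XBxRlD92RaV6TyUnP8Ow
print(m.group(2))   # → 1   (a string; wrap it in int() to get a number)
print(m.groups())   # → ('--XBxRlD92RaV6TyUnP8Ow', '1')
```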

Note also the use of `re.search()` instead of `re.match()`. `re.match()` only matches at the very beginning of the string, whereas `re.search()` finds the first match wherever it occurs. Each line here starts with `(u`, so a pattern beginning with `'` would never match with `re.match()`. (Also consider `re.findall()` if a single string might contain multiple matches.)
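The difference between the three functions can be checked directly. The sketch below reuses the answer's pattern for `match()`/`search()`; for `findall()` on a string holding several entries it swaps `.+` for `[^']+` (an adjustment not in the original answer), because a greedy `.+` would otherwise run from the first quote to the last one and swallow both entries into one match.

```python
import re

pattern = r"'(.+)', (\d+)"
line = "(u'-0GkcDiIgVm0XzDZC8RFOg', 9),"

# match() is anchored at position 0, where the line has "(" rather than "'".
print(re.match(pattern, line))            # → None
# search() scans forward and finds the quoted part.
print(re.search(pattern, line).groups())  # → ('-0GkcDiIgVm0XzDZC8RFOg', '9')

# findall() returns one tuple per match when the pattern has several groups.
text = "(u'--XBxRlD92RaV6TyUnP8Ow', 1), (u'-0OlcD1Ngv3yHXZE6KDlnw', 1),"
pairs = re.findall(r"'([^']+)', (\d+)", text)
print(pairs)
# → [('--XBxRlD92RaV6TyUnP8Ow', '1'), ('-0OlcD1Ngv3yHXZE6KDlnw', '1')]
```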

Don't be confused by my use of `.splitlines()`; it is just for the sake of the sample program. You could equally well read the file directly with `data = open('foo.txt')` followed by `for line in data:`.
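Reading from the file instead of a string might look like the following sketch. To keep it runnable on its own, `io.StringIO` stands in for `open('foo.txt')`; both are file-like objects that yield one line at a time when iterated. `read_pairs` is a hypothetical helper name.

```python
import io
import re

# Stand-in for the file contents; with a real file use: with open('foo.txt') as data:
sample = "(u'--UE_y6auTgq3FXlvUMkbw', 10),\n(u'--XBxRlD92RaV6TyUnP8Ow', 1),\n"


def read_pairs(data):
    """Yield (chars, N) from each matching line of a file-like object."""
    for line in data:  # each line still ends with '\n'; the regex doesn't mind
        m = re.search(r"'(.+)', (\d+)", line)
        if m:
            yield m.group(1), int(m.group(2))


with io.StringIO(sample) as data:
    print(list(read_pairs(data)))
# → [('--UE_y6auTgq3FXlvUMkbw', 10), ('--XBxRlD92RaV6TyUnP8Ow', 1)]
```

The `with` block closes the file automatically when the loop is done, which a bare `open()` does not guarantee.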

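```python
import re

# Sample data standing in for the contents of the file.
data = '''
(u'--UE_y6auTgq3FXlvUMkbw', 10),
(u'--XBxRlD92RaV6TyUnP8Ow', 1),
(u'--sSW-WY3vyASh_eVPGUAw', 2),
(u'-0GkcDiIgVm0XzDZC8RFOg', 9),
(u'-0OlcD1Ngv3yHXZE6KDlnw', 1),
(u'-0QBrNvhrPQCaeo7mTo0zQ', 1)
'''
data = data.splitlines()

for line in data:
    # Group 1: everything between the quotes; group 2: the digits after ", ".
    m = re.search(r"'(.+)', (\d+)", line)
    if m:  # blank lines (like the first one here) don't match, so m is None
        chars = m.group(1)
        N = int(m.group(2))  # convert the captured digits to an integer
        print("I found a match!: {}, {}".format(chars, N))
```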
Robᵩ
  • This does print each line, thank you. I can use `chars` and `N` to store the pairs each time, right? – Zap May 30 '17 at 17:18
  • Yes, `chars` and `N` will hold, for each iteration of the loop, the values you want. If you want all of the different `chars, N` pairs, you might want to create a list to hold them all. – Robᵩ May 30 '17 at 17:40
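Following that suggestion, a minimal sketch of collecting every `chars, N` pair into a list, using the answer's regex on a few of the sample lines. The `dict` at the end is an extra convenience that assumes each 21-character string appears only once in the file; if a string repeats, `dict()` keeps only the last N seen for it.

```python
import re

data = """(u'--UE_y6auTgq3FXlvUMkbw', 10),
(u'--XBxRlD92RaV6TyUnP8Ow', 1),
(u'--sSW-WY3vyASh_eVPGUAw', 2),"""

pairs = []  # one (chars, N) tuple per matching line
for line in data.splitlines():
    m = re.search(r"'(.+)', (\d+)", line)
    if m:
        pairs.append((m.group(1), int(m.group(2))))

print(pairs)

# Assumption: keys are unique; later duplicates would overwrite earlier ones.
lookup = dict(pairs)
print(lookup['--sSW-WY3vyASh_eVPGUAw'])  # → 2
```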