You can use a set:
- Sets can only contain unique values
- I've used set(...) to be explicit, but set(...) can be replaced with {...}.
- This implementation builds a generator expression inside set().
- Don't use a list comprehension inside (e.g. set([...])), because the intermediate list can potentially use a lot of memory.
- word not in tickets causes NameError: name 'tickets' is not defined because, from the perspective of the comprehension, tickets does not exist yet: the name is only assigned after set(...) has finished.
- If you're not getting a NameError, it's because tickets already exists in memory, or tickets is assigned elsewhere in your code but not in this example.
- Given the example code, if you clear the environment and run the code, you'll get an error.
- .match returns something like <re.Match object; span=(0, 9), match='PRJ1-2333'> or None.
- Where match = jira_regex.match(t), if there's a match, get the value with match[0].
- word for line in f for word in line.split() if jira_regex.match(word) assumes that, if jira_regex.match(word) isn't None, the match is always equal to word. Based on the sample data, this is the case, but I don't know if that's the case with the real data.
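The NameError described above can be reproduced with a minimal sketch. The words list is made-up sample data standing in for the file contents, and error_message is a hypothetical name used only to capture the exception text.

```python
# Made-up sample data standing in for the words read from the file
words = ["PRJ1-2333", "PRJ1-2333", "MISC-5002"]
error_message = None

try:
    # 'tickets' is bound only after set(...) returns, so the filter cannot see it yet
    tickets = set(w for w in words if w not in tickets)
except NameError as exc:
    error_message = str(exc)

# The filter is unnecessary anyway: a set discards duplicates on its own
tickets = set(words)
print(error_message)
print(tickets)
```

Dropping the `not in tickets` filter is enough here, since the set already guarantees uniqueness.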
import re

jira_regex = re.compile(r"^[A-Z][A-Z0-9]+-[0-9]+")
with open('test.txt', 'r') as f:
    tickets = set(word for line in f for word in line.split() if jira_regex.match(word))
print(tickets)
{'MISC-5002', 'PRJ1-2333', 'PRJ2-2333'}
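To illustrate the caveat that the match is not necessarily equal to word, here is a small sketch. The line below is hypothetical (it is not from the sample file) and contains trailing punctuation, so keeping word and keeping match[0] give different results.

```python
import re

jira_regex = re.compile(r"^[A-Z][A-Z0-9]+-[0-9]+")

# Hypothetical line with trailing punctuation, not taken from the sample file
line = "Fixed PRJ1-2333, see MISC-5002."

# Keeps the whole word, including any characters after the matched prefix
words = set(word for word in line.split() if jira_regex.match(word))

# Keeps only the text that the regex actually matched
keys = set()
for word in line.split():
    match = jira_regex.match(word)
    if match:
        keys.add(match[0])

print(words)
print(keys)
```

If the real data can contain punctuation or suffixes like this, collecting match[0] is probably the safer choice.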
Without .split():
- It seems as if line.split() is being used to get rid of the newline, which can be accomplished with line.strip().
Option 1:
import re

jira_regex = re.compile(r"^[A-Z][A-Z0-9]+-[0-9]+")
with open('test.txt', 'r') as f:
    tickets = set(jira_regex.match(word.strip())[0] for word in f)  # assumes .match will never be None
print(tickets)
{'MISC-5002', 'PRJ1-2333', 'PRJ2-2333'}
Option 2:
import re

jira_regex = re.compile(r"^[A-Z][A-Z0-9]+-[0-9]+")
with open('test.txt', 'r') as f:
    tickets = set(word.strip() for word in f if jira_regex.match(word.strip()))
print(tickets)
{'MISC-5002', 'PRJ1-2333', 'PRJ2-2333'}
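The difference between the two options shows up once a line does not match. The sketch below uses io.StringIO as a stand-in for the file, with hypothetical contents whose second line is not a ticket; option_1, option_2 and error_message are illustrative names.

```python
import io
import re

jira_regex = re.compile(r"^[A-Z][A-Z0-9]+-[0-9]+")

# Hypothetical file contents; the second line is not a ticket
data = "PRJ1-2333\nrelease notes\nMISC-5002\n"
error_message = None

# Option 1 indexes the result of .match directly, so a None result raises TypeError
try:
    option_1 = set(jira_regex.match(line.strip())[0] for line in io.StringIO(data))
except TypeError as exc:
    option_1 = None
    error_message = str(exc)

# Option 2 filters on the match first, so non-matching lines are skipped
option_2 = set(line.strip() for line in io.StringIO(data) if jira_regex.match(line.strip()))

print(error_message)
print(option_2)
```

Option 1 is only appropriate when every line is known to be a ticket; otherwise Option 2 is the more forgiving of the two.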
For the code to be explicit:
import re

jira_regex = re.compile(r"^[A-Z][A-Z0-9]+-[0-9]+")
tickets = list()
with open('test.txt', 'r') as f:
    for t in f:
        t = t.strip()  # remove whitespace, including the newline, from both ends
        match = jira_regex.match(t)  # assign .match to a variable
        if match is not None:  # check if a match was found
            match = match[0]  # extract the match value; depending on the data, this may not be the same as 't'
            if match not in tickets:  # check if match is in tickets
                tickets.append(match)  # if match is not in tickets, add it to tickets
print(tickets)
['PRJ1-2333', 'PRJ2-2333', 'MISC-5002']
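If the first-seen order of the list version matters and the file is large, one possible variation is to keep a helper set alongside the list, so the membership check does not have to scan the list. This is only a sketch: the seen set is an addition that is not in the code above, and io.StringIO with made-up contents stands in for the file.

```python
import io
import re

jira_regex = re.compile(r"^[A-Z][A-Z0-9]+-[0-9]+")

# Made-up contents standing in for test.txt, with one duplicate
data = "PRJ1-2333\nPRJ2-2333\nPRJ1-2333\nMISC-5002\n"

tickets = []   # keeps tickets in first-seen order
seen = set()   # hypothetical helper used only for fast membership checks
for line in io.StringIO(data):
    match = jira_regex.match(line.strip())
    if match is not None and match[0] not in seen:
        seen.add(match[0])
        tickets.append(match[0])

print(tickets)
```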