
I have multiple files, each containing multiple strings like this one:

Species_name_ID:0.0000010229,

I need to find the string with a specific 'Species_name_ID', which I ask the user to provide, and do a simple replacement so that it reads:

Species_name_ID:0.0000010229 #1,

I'm stuck at the first part, trying to look for the pattern. I've tried looking only for the numeric pattern at the end with this, and it returns a list of all the instances in which the pattern appears:

my_regex = r':0\.\d{10}'
for line in input_file:
        sp = re.findall(my_regex, line)
print(sp)

However, when I try adding the rest by using the string the user provides, it doesn't work and returns an empty list.

search = input("Insert the name of the species: ")
my_regex = f"{search}:0\.\d{{10}}"
for line in input_file:
        sp = re.findall(my_regex, line)
print(sp)

I've also tried the following ways of defining the variable (all taken from the earlier question "How to use a variable inside a regular expression?"):

my_regex = f"{search}"
my_regex = f"{search}" + r':0\.\d{10}'
my_regex = search + r':0\.\d{10}'
my_regex = re.compile(re.escape(search) + r':0\.\d{10}')
my_regex = r'%s:0\.\d{10}'%search
my_regex = r"Drosophila_melanogaster_12215" + r':0\.\d{10}' 

Even when I search for the literal string on its own, it isn't found in the file, although the file contains multiple matches for it.

my_regex = Drosophila_melanogaster_12215

What am I missing?

mkrieger1
  • Do you realize you re-write `sp` upon each line with `for line in input_file: sp = re.findall(my_regex, line)`? – Wiktor Stribiżew Apr 11 '22 at 21:25
  • I am, but it should not matter. My files only have one line. When I try the code that works, I get a list of 19 numbers, which is what I have in the file. – Natalia GP Apr 11 '22 at 21:31
  • You need to use a raw f-string so that the backslashes will be preserved, for the same reason you used a raw string in the first code block. – Barmar Apr 11 '22 at 21:45
  • `my_regex = search + r':0\.\d{10}'` should work, though. – Barmar Apr 11 '22 at 21:47
  • `my_regex = re.escape(search) + r':0\.\d{10}'` cannot fail to work. – Wiktor Stribiżew Apr 11 '22 at 21:50
  • Do you mean `my_regex = f'{search}'`? It doesn't work either. – Natalia GP Apr 11 '22 at 21:50
  • No, I meant `my_regex = rf"{search}:0\.\d{{10}}"` – Barmar Apr 11 '22 at 21:50
  • The version with `re.escape()` is best and works: https://ideone.com/a0vPcJ – Barmar Apr 11 '22 at 21:51
  • Alright, what Barmar suggested worked! Thanks for helping me figure out the correct syntax. However, it still doesn't work if I use the `input_file` variable instead of `line`. I think I can work it out from there. – Natalia GP Apr 11 '22 at 22:06

1 Answer

This should work for you:

import re
search = input("Insert the name of the species: ")
my_regex = fr"{re.escape(search)}:0\.\d{{10}}"
for line in input_file:
    print( re.findall(my_regex, line) )

Escape any user-supplied text with re.escape() before placing it inside a regular expression.

Double the curly braces if you want literal curly braces inside an f-string.

Use raw string literals for your regular expressions so the backslashes are preserved.
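
The question also asks for the replacement step (appending " #1" after the matched entry), so here is a minimal sketch of how it might be done with re.sub. The function name tag_species, its tag parameter, and the sample line (including the Homo_sapiens entry) are made up for illustration and are not from the original post.

```python
import re

def tag_species(text, species, tag="#1"):
    """Append `tag` after every 'species:0.dddddddddd' entry in `text`.

    Hypothetical helper for illustration only.
    """
    # re.escape() neutralises regex metacharacters in the user's input;
    # the raw f-string keeps the backslashes, and {{10}} yields a literal {10}.
    pattern = re.compile(rf"{re.escape(species)}:0\.\d{{10}}")
    # Using a function as the replacement avoids backslash processing in `tag`.
    return pattern.sub(lambda m: f"{m.group(0)} {tag}", text)

# Made-up sample line in the same shape as the data in the question.
line = "Homo_sapiens_9606:0.0000020458,Drosophila_melanogaster_12215:0.0000010229,"
print(tag_species(line, "Drosophila_melanogaster_12215"))
# prints: Homo_sapiens_9606:0.0000020458,Drosophila_melanogaster_12215:0.0000010229 #1,
```

Note that re functions take a string, not a file object, so for a real file you would pass each line (or the result of input_file.read()) to the function rather than input_file itself.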

Ryszard Czech