I have a text file named python.txt with the following contents:

Python 2.0 was released on 16 October 2000 with many major new features, including a cycle-detecting garbage collector and support for Unicode.[43]

Python 3.0 was released on 3 December 2008. It was a major revision of the language that is not completely backward-compatible.[44] Many of its major features were backported to Python 2.6.x[45] and 2.7.x version series. Releases of Python 3 include the 2to3 utility, which automates (at least partially) the translation of Python 2 code to Python 3.[46]

I want to find all instances of Python versions that have .x in them. It should capture the following: 3.5.x, 2.6.x, 2.7.x, etc. Then I want to print the result and the length of that list. How can I do this? Should I import this text file first? Thanks in advance. My code is below:

import re
fp = open('python.txt', 'r')
s = fp.readline()
#print(s)
aList = re.findall('([-+]?\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?',s) 
#print(aList)
for ss in aList:
    #print(ss[0]+ss[2])
    aNum = float((ss[0]+ss[2]))
    #print(type(aNum))
    print(aNum)
fp.close()
ssssmaner

2 Answers

You're pretty close. Your regular expression is wrong, and you're not reading in the file properly.

See the official file reading/writing tutorial in Python's docs. In short, fp.readline() returns only the first line of the file, whereas read() returns the whole contents.

Secondly, [ ] won't match two digits in a row in a regular expression. Square brackets delimit a character class, which matches exactly one character out of the set listed between them; with no digits listed, it can never match a digit, let alone two. What you're looking for is just \d\d: \d represents "any digit," and you want two in a row. Simple, right? :)

Here's an example:

import re

with open('python.txt', 'r') as f:
  string = f.read()

for match in re.finditer(r'\d\d', string):
  print(f"Start: {match.start()}, End: {match.end()}, Group: {match.group()}")

If the format you're looking for is specifically two digits surrounded by square brackets, you'll need to use this regex instead: \[\d\d\]. You need to precede the square brackets with backslashes because [ and ] have a special meaning in regex syntax: they delimit a character class. The backslashes tell the regex engine to treat them as the two literal characters [ and ].

In that case, your new loop becomes:

for match in re.finditer(r'\[\d\d\]', string):
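
To also collect the results and print how many there are, a minimal sketch might look like the following. It assumes a short excerpt of the question's text inlined as a string (rather than read from python.txt), and it adds a capturing group so that only the digits are kept; the names text and numbers are just for illustration.

```python
import re

# Short excerpt of the question's text, inlined here instead of reading python.txt.
text = (
    "support for Unicode.[43] Many of its major features were "
    "backported to Python 2.6.x[45] and 2.7.x version series."
)

# The parentheses capture only the two digits, not the surrounding brackets.
numbers = [match.group(1) for match in re.finditer(r'\[(\d\d)\]', text)]

print(numbers)       # ['43', '45']
print(len(numbers))  # 2
```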

PS: The r before the string in the code tells Python to treat backslashes literally. By default, a backslash inside a Python string escapes whatever character comes after it. We don't want that; we want the backslash itself to end up in our expression string. So, instead of writing pattern = '\\d\\d', we can write pattern = r'\d\d'. Using r for regular expressions is very handy. These are called raw strings, if you want to look into them further.
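
As a quick check of how raw strings behave, here is a small sketch; the variable names escaped and raw and the sample sentence fragment are only illustrative.

```python
import re

escaped = '\\d\\d'  # regular string: each backslash has to be doubled
raw = r'\d\d'       # raw string: backslashes are kept exactly as typed

# Both spellings produce the same four-character pattern.
print(escaped == raw)  # True
print(len(raw))        # 4

print(re.findall(raw, "released on 16 October 2000"))  # ['16', '20', '00']
```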

matthew-e-brown
  • Maybe OP needs 2 digits that come in a format like `[12]` – kuro Apr 30 '21 at 05:35
  • @kuro Good point. I've added a paragraph at the bottom. – matthew-e-brown Apr 30 '21 at 05:37
  • for match in re.finditer(r'[d]d', string): right? – ssssmaner Apr 30 '21 at 05:39
  • @ssssmaner The backslashes are important: `[d]d` is a regex that will search for "any character that is: `d`" followed by "exactly `d`". Take a look at that character class link, and then also take a peek [here](https://www.regular-expressions.info/shorthand.html) for an explanation of `\d`. – matthew-e-brown Apr 30 '21 at 05:41

Here is some sample code to achieve this, taking python.txt as input.

import re

# Two digits wrapped in literal square brackets; the group captures only the digits.
BRACKETED_TWO_DIGITS = re.compile(r"\[(\d{2})\]")


def get_nums(text):
    # findall returns the captured digits for every match (an empty list if none).
    return BRACKETED_TWO_DIGITS.findall(text)


def read_file(file_name):
    num_list = []
    with open(file_name, 'r') as reader:
        # Scan the file line by line and collect every match.
        for line in reader:
            num_list.extend(get_nums(line))
    print(num_list)


read_file('python.txt')

Output :

['28', '29']
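
The same findall approach could be adapted to the version strings mentioned in the question (2.6.x, 2.7.x). This is only a sketch: it assumes a version looks like digits, a dot, more digits, and then a literal .x, and it uses one sentence from the question inlined as a string instead of reading python.txt.

```python
import re

text = "Many of its major features were backported to Python 2.6.x[45] and 2.7.x version series."

# Assumed pattern: one or more digits, a literal dot, more digits, then a literal ".x".
versions = re.findall(r'\d+\.\d+\.x', text)

print(versions)       # ['2.6.x', '2.7.x']
print(len(versions))  # 2
```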
vijaydeep