1

I have created this code to have a user point at a directory and for it to go through the directory looking for .xml files. Once found the program is supposed to search each file looking for strings that are 32 bits in length. This is the only requirement, the content is not important at this time just that it return 32 bit strings.

i have tried using the regex module within Python as below, when run the program iterates over the available files. returns all the file names but the String_recovery function returns only empty lists. I have confirmed that the xml contains 32 bit strings visually.

import os
import re
import tkinter as tk
from tkinter import filedialog



def string_recovery(data):
    short_string = re.compile(r"^[a-zA-Z0-9\-._]{32}$")
    strings = re.findall(short_string, data)
    print(strings)


def xml_search(directory):
    xml_files = []
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith(".xml"):
                xml_files.append(os.path.join(root, file))
    print("The following XML files have been found.")
    print(xml_files)

    for xml_file in xml_files:
        with open(xml_file, "r") as f:
            string_recovery(f.read())


def key_finder():
    directory = filedialog.askdirectory()
    xml_search(directory)


key_finder()
tripleee
  • 175,061
  • 34
  • 275
  • 318
LHenne300
  • 15
  • 5
  • 1
    What does your "32 bit string" look like? What does your XML-file look like? – MSpiller Feb 01 '23 at 14:04
  • Welcome to Stack Overflow. I cannot understand the question, because the length of a string **is not measured in** bits. Also, the function does not `return` at all (please read [What is the purpose of the return statement? How is it different from printing?](/questions/7129285/)), and the only list involved there is `xml_files`. – Karl Knechtel Feb 01 '23 at 14:09
  • Is the `m` flag the default? I don't think it is in which case `^` and `$` are the start and end of the file not a line. Maybe try adding the `m` flag to your parrern. – JonSG Feb 01 '23 at 14:19
  • 1
    I rolled back your latest edit. You have accepted an answer; ask a new question if you have a new question. Perhaps provide a link back to this question if it's useful for background. (Clicking the title will get you a URL you can share.) – tripleee Feb 02 '23 at 13:55

2 Answers2

0

Maybe you should go over each line:

    for xml_file in xml_files:
        with open(xml_file, "r") as f:
            string_recovery(f.read())

If your string_recovery works properly (try it with a line, I cannot reproduce your example but create a variable line = and put there a line which should be recoverd.

And go over each line instead of the whole file:

    for xml_file in xml_files:
        with open(xml_file, "r") as f:
            for line in f.readliens():
                string_recovery(line)
3dSpatialUser
  • 2,034
  • 1
  • 9
  • 18
0

By default, python patterns are not "multiline" thus ^ and $ match the start and end of your text block, not each line. You need to set this flag re.M aka re.MULTILINE:

compare:

import re

text = """
foo
12345678901234567890123456789011
12345678901234567890123456789011
"""
pattern = r"^[a-zA-Z0-9\-._]{32}$"
print(re.findall(pattern, text, re.M))  ## <--- flag

Giving:

[
    '12345678901234567890123456789011',
    '12345678901234567890123456789011'
]

with:

import re

text = """
foo
12345678901234567890123456789011
12345678901234567890123456789011
"""
pattern = r"^[a-zA-Z0-9\-._]{32}$"
print(re.findall(pattern, text))

Giving:

[]
JonSG
  • 10,542
  • 2
  • 25
  • 36