What I intend to do?
Search for a list of alphabetic strings across a set of files on the Windows file system (around 25K files of varying sizes and extensions, primarily flat text files, the largest being no more than a few MB).
What I did to achieve this?
import mmap
import re

# files and file_write_handle are defined earlier (not shown here)
for each_file in files:
    with open(each_file, "rb") as file_read_handle:
        # read one byte first: an empty file cannot be memory-mapped
        first_char = file_read_handle.read(1)
        if first_char:
            # length 0 maps the whole file, regardless of the current read position
            with mmap.mmap(file_read_handle.fileno(), 0, access=mmap.ACCESS_READ) as file_read_content_mappd:
                if re.search(br'(?i)T_0008X_WEB', file_read_content_mappd):
                    file_write_content = 'Text T_0008X_WEB found in {}'.format(each_file)
                    file_write_handle.write(file_write_content)
                    file_write_handle.write("\n")
file_write_handle.close()
This piece of code works fine for a hardcoded search text (see the line containing T_0008X_WEB). The files are opened in binary mode ("rb") to avoid this error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 776: character maps to <undefined>
However, when I try to search for a list of values by replacing the hardcoded value with a variable, like this:
if re.search('br\'(?i)' + regex_search_str_byte + '\'', file_read_content_mappd):
I run into the following issues:
- When I used:
re.search('br\'(?i)' + regex_search_str + '\'', file_read_content_mappd)
I got an error: the file content is bytes (binary mode) while the search pattern is a str.
- When I used:
re.search(regex_search_str_byte, file_read_content_mappd)
no match was found, because the regex characters br'(?i) were themselves treated as part of the byte-converted search text.
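To make the two symptoms concrete, here is a minimal sketch in isolation. The names `data` and `search_term` are made-up stand-ins for the mapped file content and one entry of the list. As far as I understand, `br'...'` is literal syntax in source code, so building it by concatenation only places the characters b, r and the quotes inside the pattern; encoding just the search term and passing `re.IGNORECASE` as a flag seems to be the equivalent of the hardcoded `br'(?i)...'` literal.

```python
import re

data = b"header t_0008x_web trailer"   # stand-in for the memory-mapped content
search_term = "T_0008X_WEB"             # hypothetical entry from the list

# Concatenation produces a str whose first characters are b, r and a quote.
concatenated = 'br\'(?i)' + search_term + '\''
print(concatenated)          # the prefix and quotes are now pattern text
print(type(concatenated))    # still str, so it cannot be applied to bytes

# Encoding only the search term gives a real bytes pattern.
# re.escape guards against regex metacharacters in the term;
# re.IGNORECASE replaces the inline (?i) flag.
pattern = re.compile(re.escape(search_term.encode("ascii")), re.IGNORECASE)
match = pattern.search(data)
print(match.group(0) if match else None)
```

The `"ascii"` encoding is an assumption based on the search values being plain alphabetic strings; a different encoding would be needed if the terms contain other characters.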
How can I perform a regex search for a list of values, converted to bytes, against the content of a file opened in binary mode?
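For reference, this is roughly the shape of what I am after, as a sketch under assumptions rather than a working solution from my code base. The function names `build_pattern` and `find_terms_in_file` are hypothetical, the terms are assumed to be ASCII, and all terms are combined into a single alternation so each file is scanned once.

```python
import mmap
import re


def build_pattern(search_terms):
    """Compile one case-insensitive bytes regex that matches any of the terms."""
    # Assumption: the terms are plain ASCII text, as in T_0008X_WEB.
    escaped = [re.escape(term.encode("ascii")) for term in search_terms]
    return re.compile(b"|".join(escaped), re.IGNORECASE)


def find_terms_in_file(path, pattern):
    """Return the distinct matched terms (upper-cased) found in the file."""
    with open(path, "rb") as handle:
        if not handle.read(1):  # empty files cannot be memory-mapped
            return set()
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Matches come back as bytes; normalise them for reporting.
            return {m.group(0).upper().decode("ascii")
                    for m in pattern.finditer(mapped)}
```

With something like this, the loop over `files` would call `find_terms_in_file(each_file, pattern)` once per file and write one line per returned term to the output file.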