Replace hexadecimal with decimal in multiple locations within a text document

Question

I have a rather large text document and would like to replace every hexadecimal number in it with its decimal equivalent or, if possible, with the text it encodes, surrounded by single quotes: for example 'I01A' instead of $49303141.
The hexadecimal numbers currently start with $, but I can find-and-replace that with 0x if that helps. The program needs to detect where each number ends, since some are short, like $A, while others are long, like $568B1F. How could I do this with Python, or is it not possible?

Thank you for the help so far. To clarify my request and hopefully get a complete solution: I used a version of Grismar's answer, and the output it gives me is

"if not (GetItemTypeId(GetSoldItem())==I0KB) then
set int1= 2+($3E8*3)"

However, I would like to add ' around the newly created text and convert hex strings shorter than 8 digits to decimals instead, so the output becomes

"if not (GetItemTypeId(GetSoldItem())=='I0KB') then
set int1= 2+(1000*3)"
Hoping for some more help to get the rest of the way.
def hex2dec(s):
    return int(s,16)
was my attempt to convert the shorter hexadecimals to decimal, but it clearly has not worked; it throws syntax errors instead.
Also, I will manually deal with the few $ not used to denote a hexadecimal.


import re

# just creating an example file
with open(r'D:\Deprotect\wc3\mpq editor\Work\new 4.txt', 'w') as f:
    f.write('if not (GetItemTypeId(GetSoldItem())==$49304B42) then\n')
    f.write('set int1= 2+($3E8*3)\n')


def hex_match_to_string(m):
    return ''.join([chr(int(m.group(1)[i:i+2], 16)) for i in range(0, len(m.group(1)), 2)])

def hex2dec(s):
    return int(s,16)


# open the file for reading
with open(r'D:\Deprotect\wc3\mpq editor\Work\new 4.txt', 'r') as file_in:
    # open the same file again for reading and writing
    with open(r'D:\Deprotect\wc3\mpq editor\Work\new 4.txt', 'r+') as file_out:
        # start writing at the start of the existing file, overwriting the contents
        file_out.seek(0)
        while True:
            line = file_in.readline()
            if line == '':
                # end of file
                break
            # replace the parts of the string matching the regex
            line = re.sub(r'\$((?:\w\w\w\w\w\w\w\w)+)', hex_match_to_string, line)
            #line = re.sub(r'$\w+', hex2dec,line)
            file_out.write(line)
        # the resulting file is shorter, truncate it from the current position
        file_out.truncate()

Worth noting that, apparently, in your input file anything marked with a `$` is followed by hex values between 0 and 255 (two positions)? Also, what happens if there is a `$` somewhere else in your file (for example, some piece of text where it says "I was making big $$$ answering questions on SO")? Or does that never happen for the files you are converting? — Grismar, Dec 23 '19 at 04:38
updated the request with more information and current progress from first 2 answers — afis, Dec 23 '19 at 18:54
The idea is not to update the question with the answer though... You should have just accepted the answer and asked a new question (like "How do I add quotes around the result of a function returning a string" or something like that). Because now, your question is unclear and the remaining answer trivial. — Grismar, Dec 25 '19 at 00:50
Sorry about that, next time I will just make it a new question then. btw, is there a line limit to how big a file this can work on? when I tried on main file it only did part of the work — afis, Dec 25 '19 at 15:12
File size shouldn't limit it, but if the assumptions about the format are wrong, it may fail at some specific point and you may need to write to a separate file. — Grismar, Jan 01 '20 at 02:30

FredMan · Answer 1 · 2019-12-23T14:05:42.297

See the answer https://stackoverflow.com/a/12597709/1780027 for how to use re.sub to replace specific parts of a string with the output of a function. With that, you could use the int("FFFF", 16) snippet you mention to perform the conversion you want.

For example:

   >>> import re
   >>> def replace(match):
   ...    match = match.group(1)
   ...    return str(int(match, 16))
   ...
   >>> sample = "here's a hex $49303141  and there's another $A34B and another $8FD0B"
   >>> re.sub(r'\$([a-fA-F0-9]+)', replace, sample)
   "here's a hex 1227895105  and there's another 41803 and another 589067"

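The same callback idea can be extended to the follow-up in the question. The following is only a minimal sketch, assuming that a run of exactly 8 hex digits always encodes a four-character code and that any other length is a plain number; the names HEX_TOKEN, convert_token and convert_line are hypothetical and do not appear in either answer.

```python
import re

# Hypothetical pattern: "$" followed by hex digits only
# (unlike \w, this rejects g-z and "_").
HEX_TOKEN = re.compile(r'\$([0-9A-Fa-f]+)')


def convert_token(match):
    digits = match.group(1)
    if len(digits) == 8:
        # Assumption: exactly 8 hex digits encode a four-character code.
        text = bytes.fromhex(digits).decode('latin-1')
        return f"'{text}'"
    # Any other length is treated as a plain number.
    return str(int(digits, 16))


def convert_line(line):
    return HEX_TOKEN.sub(convert_token, line)


print(convert_line('if not (GetItemTypeId(GetSoldItem())==$49304B42) then'))
print(convert_line('set int1= 2+($3E8*3)'))
```

On the two example lines from the question this prints the requested output. An 8-digit value that is really meant as a number would be turned into text under this assumption, so such cases would need to be handled separately.
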
Grismar · Answer 2 · 2019-12-23T05:31:55.010

Since you are replacing parts of the file with something that's shorter, you can write to the same file you're reading. But keep in mind that, if you were replacing those parts with something that was longer, you would need to write the result to a new file and replace the old file with the new file once you were done.
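
The "write to a new file, then replace the old one" route could look roughly like the sketch below. It matters for decimal conversion, where the output can be longer than the input: $49303141 is 9 characters, but 1227895105 is 10. The helpers rewrite_file and to_decimal are hypothetical names introduced only for this illustration.

```python
import os
import re
import tempfile


def to_decimal(line):
    # Replace every "$" + hex digits with the decimal value.
    return re.sub(r'\$([0-9A-Fa-f]+)', lambda m: str(int(m.group(1), 16)), line)


def rewrite_file(path, transform_line):
    """Write transformed lines to a temporary file, then swap it in for the original."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file is created next to the original so that
    # os.replace does not have to move data across filesystems.
    with open(path, 'r') as file_in, tempfile.NamedTemporaryFile(
            'w', dir=directory, delete=False) as file_out:
        for line in file_in:
            file_out.write(transform_line(line))
        temp_path = file_out.name
    # Both files are closed here; the original is replaced in one step.
    os.replace(temp_path, path)
```

A call such as rewrite_file('test.txt', to_decimal) would then convert the file without depending on the replacement text being shorter than the original.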

Also, from your description, it appears you are reading a text file, which makes reading the file line by line the easiest, but if your file was some sort of binary file, using re wouldn't be as convenient and you'd probably need a different solution.

Finally, your question doesn't mention whether $ might also appear elsewhere in the text file (not just in front of pairs of characters that should be read as hexadecimal numbers). This answer assumes $ only appears in front of strings of 2-character hexadecimal numbers.

Here's a solution:

import re

# just creating an example file
with open('test.txt', 'w') as f:
    f.write('example line $49303141\n')
    f.write('$49303141 example line, with more $49303141\n')
    f.write('\n')
    f.write('just some text\n')


def hex_match_to_string(m):
    return ''.join([chr(int(m.group(1)[i:i+2], 16)) for i in range(0, len(m.group(1)), 2)])


# open the file for reading
with open('test.txt', 'r') as file_in:
    # open the same file again for reading and writing
    with open('test.txt', 'r+') as file_out:
        # start writing at the start of the existing file, overwriting the contents
        file_out.seek(0)
        while True:
            line = file_in.readline()
            if line == '':
                # end of file
                break
            # replace the parts of the string matching the regex
            line = re.sub(r'\$((?:\w\w)+)', hex_match_to_string, line)
            file_out.write(line)
        # the resulting file is shorter, truncate it from the current position
        file_out.truncate()

The regex is simple: r'\$((?:\w\w)+)' matches any string starting with a literal $ (the backslash keeps it from being interpreted as the end-of-string anchor) followed by one or more (+) pairs of word characters (\w\w), i.e. letters, digits or underscores.
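
To see what the pattern does and does not match, here is a small comparison on a made-up line. The sample text and the stricter hex-only pattern are assumptions for illustration, not part of the answer above.

```python
import re

line = 'cost $3E8 and id $49304B42, total: $$$ or $zz'

# Unescaped "$" is an end-of-string anchor, so nothing can follow it.
unescaped = re.findall(r'$\w+', line)
# Escaped "\$" matches the character itself; \w\w only consumes whole pairs,
# so the odd-length $3E8 loses its last digit and $zz is accepted.
pairs = re.findall(r'\$((?:\w\w)+)', line)
# A hex-only character class accepts any length and rejects $zz.
strict = re.findall(r'\$([0-9A-Fa-f]+)', line)

print(unescaped)  # []
print(pairs)      # ['3E', '49304B42', 'zz']
print(strict)     # ['3E8', '49304B42']
```

The pair-based pattern fits input where every value has an even number of digits; for values such as $A or $3E8 the hex-only class is the safer match.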

The function hex_match_to_string(m) expects a regex match object and loops over pairs of characters in the first matched group. Each pair is turned into its decimal value by interpreting it as a hexadecimal string (int(pair, 16)) and that decimal value is then turned into a character with that ASCII value (chr(value)). All the resulting characters are joined into a single string (''.join(list)).
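
The steps described above can be traced on a single value. This sketch uses the digits from the question's example and, as an aside that is not part of the original answer, checks the result against the built-in bytes.fromhex.

```python
hex_digits = '49303141'

# Split into two-character pairs.
pairs = [hex_digits[i:i + 2] for i in range(0, len(hex_digits), 2)]
# Interpret each pair as a base-16 number.
values = [int(pair, 16) for pair in pairs]
# Turn each number into the character with that code and join them.
text = ''.join(chr(value) for value in values)

print(pairs)   # ['49', '30', '31', '41']
print(values)  # [73, 48, 49, 65]
print(text)    # I01A

# bytes.fromhex does the pairing and conversion in one call.
assert bytes.fromhex(hex_digits).decode('latin-1') == text
```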

A different way of writing hex_match_to_string(m):

def hex_match_to_string(m):
    hex_nums = iter(m.group(1))
    return ''.join([chr(int(a, 16) * 16 + int(b, 16)) for a, b in zip(hex_nums, hex_nums)])

This may perform a bit better, since it avoids slicing the string, but it does the same thing.
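
Whether the second version is actually faster is easy to measure. The sketch below renames the two variants (by_slicing and by_zip are hypothetical names), checks that they agree, and times them with timeit; no timings are claimed here, since they depend on the machine and Python version.

```python
import re
import timeit


def by_slicing(m):
    s = m.group(1)
    return ''.join(chr(int(s[i:i + 2], 16)) for i in range(0, len(s), 2))


def by_zip(m):
    # zip over the same iterator twice yields consecutive pairs of characters.
    it = iter(m.group(1))
    return ''.join(chr(int(a, 16) * 16 + int(b, 16)) for a, b in zip(it, it))


pattern = re.compile(r'\$((?:[0-9A-Fa-f]{2})+)')
line = '$49303141 example line, with more $49304B42'

# Both variants must produce the same text.
assert pattern.sub(by_slicing, line) == pattern.sub(by_zip, line)

for fn in (by_slicing, by_zip):
    seconds = timeit.timeit(lambda: pattern.sub(fn, line), number=2000)
    print(fn.__name__, seconds)
```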

How do I get the return to include ' ' eg to replace $49304B42 with 'I0KB' instead of just I0KB ? — afis, Dec 23 '19 at 23:34
Ugly is `'\''` + + `'\''`, I'd prefer f"'{}'" (i.e. double quotes on the outside). You can also give up a tiny bit of performance by first assigning the return value to a variable (i.e. `result = `) and then returning `return f'\'{result}\'', which is what I would do to keep the code readable. — Grismar, Dec 25 '19 at 00:49
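
Put together, the quoting suggested in the comment above might look like this. It is a sketch only: the name hex_match_to_quoted_string is hypothetical, and the pattern is the 8-character one from the question's own attempt.

```python
import re


def hex_match_to_quoted_string(m):
    hex_digits = m.group(1)
    result = ''.join(
        chr(int(hex_digits[i:i + 2], 16)) for i in range(0, len(hex_digits), 2)
    )
    # Double quotes on the outside avoid escaping the single quotes.
    return f"'{result}'"


line = 'if not (GetItemTypeId(GetSoldItem())==$49304B42) then'
quoted = re.sub(r'\$((?:\w\w\w\w\w\w\w\w)+)', hex_match_to_quoted_string, line)
print(quoted)  # if not (GetItemTypeId(GetSoldItem())=='I0KB') then
```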
