
In the program I am running, [ESI+0A4] typically holds a whole number between roughly 0 and 200. The current formula looks something like this:

1 - (0.0028 x [ESI+0A4]) = XXXXX

If the value at [ESI+0A4] were, say, 50, then the result would be 0.86.
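As a quick sanity check of the arithmetic, here is a minimal Python sketch of the current formula next to the squared variant asked about further down. The names `SCALE`, `current_formula` and `squared_formula` are made up for illustration and do not appear in the program. The sketch uses double precision, whereas the code itself works on 32-bit floats, so the last digits may differ slightly.

```python
# Hypothetical names for illustration only; 0.0028 is the factor from the formula.
SCALE = 0.0028


def current_formula(x):
    """1 - (0.0028 * x): what the existing code computes."""
    return 1 - SCALE * x


def squared_formula(x):
    """1 - (0.0028 * x)^2: the variant being asked for."""
    return 1 - (SCALE * x) ** 2


# Compare both formulas over the rough 0..200 input range.
for x in (0, 50, 100, 200):
    print(x, round(current_formula(x), 4), round(squared_formula(x), 4))
```

For an input of 50 the first column of results gives 0.86, matching the figure above, and the squared variant gives about 0.9804.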

How do I modify the code below so that the formula looks like this instead?

1 - ((0.0028 x [ESI+0A4])^2)

FMUL DWORD PTR DS:[ESI+5C]
FSTP DWORD PTR DS:[ESI+5C]
FSAVE DS:[0A0D0A0]
FLD DWORD PTR DS:[0A0D454]
FLD DWORD PTR DS:[ESI+0A4]
FMUL ST,ST(1)
FLD DWORD PTR DS:[0A0D450]
FSUB ST,ST(1)
FSTP DWORD PTR DS:[ESI+5C]
FRSTOR DS:[0A0D0A0]
JMP 004AB486

Thanks to Peter's advice, I will revise it so that it shows the following:

FMUL DWORD PTR DS:[ESI+5C]
FSTP DWORD PTR DS:[ESI+5C]
FLD DWORD PTR DS:[0A0D454]
FLD DWORD PTR DS:[ESI+0A4]
FMUL ST,ST(1)
FMUL ST,ST
FLD DWORD PTR DS:[0A0D450]
FSUB ST,ST(1)
FSTP DWORD PTR DS:[ESI+5C]
JMP 004AB486
  • You took out the square brackets, so you want to take the address as an integer as well? Also, it looks like you're not replacing a multiply, you're adding another operation (squaring) afterward. Changing a mul to squaring would be easy, just use the same operand twice like `fmul st0` (and fill unneeded instruction bytes with NOPs or dummy prefixes like `ds`). – Peter Cordes Jul 18 '21 at 07:59
  • Also, don't post pictures of text. Copy/paste the disassembly (or source code) into a code-formatting block. Are you modifying an existing binary (so it has to fit in the same size of machine-code bytes), or are you just showing disassembly instead of your asm source for some other reason? – Peter Cordes Jul 18 '21 at 08:02
  • It's super weird (and quite slow) to use `fsave` / `frstor` around a few operations. Are you sure you need to do that? If you balance your fld and pop operations properly, you probably don't need that. All of those instructions could use memory sources. – Peter Cordes Jul 18 '21 at 08:05
  • Hi Peter, I am working off of someone else's code. I am not sure why he used the fsave, or even 'Wait' functions. So are you saying to do something like this? `FLD DWORD PTR DS:[0A0D454] ////////// FLD DWORD PTR DS:[ESI+0A4] ////////// FMUL ST,ST(1) ////////// FMUL ST(0) ////////// FLD DWORD PTR:DS[0A0D450]` Sorry for the confusion in removing the square brackets, changing that was not the intention. – x87-Learner Jul 18 '21 at 08:31
  • Ok, well I'm guessing they probably used fsave/frstor because they didn't know what they were doing, as evidenced by the inefficient code between those two instructions which fails to take advantage of memory source operands. So anyway, you're adding a squaring operation on top of everything else, so optimize the code to save space and make room for an `fmul st, st`. That's literally all you need to add, before `fld1` / `fsubrp`. – Peter Cordes Jul 18 '21 at 08:47
  • @PeterCordes I'm trying to edit it but Ollydbg is throwing a wobbly. Here are the HEX codes, do you think you could help me edit them? Potentially with and without the save/store functions. `D84E5C` `D95E5C` `9BDD3500000000` `D90500000000` `D986A4000000` `D8C9` `D90500000000` `D8E1` `D9465C` `D8C9` `D95E5C` `DD2500000000` `E900000000` – x87-Learner Jul 18 '21 at 09:24
  • Oh, so you don't have the source for this "someone else's code" you're working from? If you do, you should edit that instead and rebuild. Otherwise yeah work out some more compact instructions like I suggested to save 2 bytes for an `fmul st, st`, and edit the binary. I have no idea what you mean by "throwing a wobbly". – Peter Cordes Jul 18 '21 at 09:34
  • No, I'm using an XML patcher to rewrite code and do address relocations. Actually that might be why Olly doesn't like what I am doing... Let me see if I can fiddle with this. – x87-Learner Jul 18 '21 at 09:43
  • Well the answer to the question you asked, how to change the *assembly* code, is to add an `fmul st,st`. If you have other requirements like patching a binary (potentially without changing the size of this block of machine code), say so in the question. And copy/paste some text already to replace that image. – Peter Cordes Jul 18 '21 at 09:46
  • There was some unused space so I was able to add the two bytes just fine. Now there are two FMULs one after the other like this just before the next `FLD`: `fmul st,st(1)`//// `fmul st, st` – x87-Learner Jul 18 '21 at 10:12
  • The code in the question now makes no sense and doesn't match the image; it looks like you replaced `fld` with `fmul` when copy/pasting from the disassembly. If you typed it by hand, use a disassembler that lets you copy/paste. – Peter Cordes Jul 18 '21 at 21:48
  • I have corrected it now, sorry. – x87-Learner Jul 19 '21 at 02:49
  • I just noticed that the instructions before fsave are loading and storing to `[ESI+5C]`, the same location this code eventually stores to. So that store (and the computation leading up to it) is dead and can be skipped. It also means there's definitely a free x87 register, and that's all this computation needs (if optimized properly to not do extra fld), so it's safe to skip that fsave/frstor unless some other code somewhere else also reads the save space. You should only need fld of 0.0028 / fmul mem / fmul st / fsubr mem (1.0) / fstp. (Or like I said, fld1 / fsubrp), and some long nops. – Peter Cordes Jul 19 '21 at 03:04
  • I'll give that a go, thanks a lot! – x87-Learner Jul 19 '21 at 10:29
  • Post your answer as an *answer*, not an edit to the question. Your current answer is broken: you removed the fsave/frstor, but didn't fix the code there to keep the x87 stack balanced (or optimize it to use memory-source fmul / fsubr). You have more fld (3) than pop (1) operations, so you leave 2 x87 stack registers in use after your fstp. After this code runs a few times, the x87 stack will be full and `fld` will produce a NaN. Single-step in a debugger and look at the x87 state before/after. – Peter Cordes Jul 19 '21 at 19:31
  • @PeterCordes Unfortunately I am below beginner level at this so until you point out these errors, I have no idea. I know the suggestion might be to read some books or take a course but there is simply no time in my schedule and this is not for my job (or any job). There is an opportunity for me to at least understand the basics if someone could hold my hand through each step of the process and show me exactly what is required and why. Practically everything else I am doing in x87 is roughly at this level of complexity so once I have a grasp of it I should be okay. – x87-Learner Jul 21 '21 at 03:59
  • Get a debugger that can show you the x87 register usage as you single step. This is an *essential* tool for learning more asm, and should be enough (along with http://www.ray.masmcode.com/tutorial/fpuchap1.htm basics of the mostly-obsolete legacy x87 FPU) to show you that this sequence of instructions leaves some extra x87 registers in use. Read the rest of the tutorial, and/or look at some x87 compiler output from any decent compiler with optimization enabled, to see how to use it while keeping the stack balanced for a function that returns a value, for example. – Peter Cordes Jul 21 '21 at 04:11
  • [How to remove "noise" from GCC/clang assembly output?](https://stackoverflow.com/q/38552116) – Peter Cordes Jul 21 '21 at 04:13
  • @PeterCordes So, if I understand correctly, that first link is saying that, every time I do an FLD, then the value I loaded goes to the top of the stack right? So, if I want to multiply two numbers together, then I would (in very simple terms) do an FLD [ValueA], followed by FLD [ValueB]. Then FMUL ST(0),ST(1)? If so, does the result of the FMUL calculation then go to the top of the stack, with the original ValueA and ValueB becoming ST(1) and ST(2)? – x87-Learner Jul 21 '21 at 12:24
  • No, `fmul st(0), st(1)` does `st0 *= st1` replacing st0, not allocating a new register, so that's pointlessly inefficient vs. `fld [valueA]` / `fmul [valueB]`. You could see that in a debugger when you single-step so you can stop guessing, and/or read the manual https://www.felixcloutier.com/x86/fmul:fmulp:fimul. If you *did* already have 2 values on the stack, `fmulp` would do `st1 *= st0` and then pop, effectively replacing the top 2 values with 1 value holding the product. – Peter Cordes Jul 21 '21 at 16:51
  • Oh. So in other words, it does the calculation but then replaces the ST0 value, rather than moving anything down the stack? Do you have any recommendation for single step debuggers I can use in Windows which are happy to deal with this kind of code? – x87-Learner Jul 21 '21 at 16:59
  • Right. I use GDB on Linux. I've heard that Ollydbg can be useful for Windows binaries without debug info, although I think it's not maintained anymore. I don't know how nice its x87 register window display looks, but I assume it has some support and will be much better than nothing. – Peter Cordes Jul 21 '21 at 17:08
  • Well, that's already incredibly helpful information, so thank you. Back to the original point about balancing the code - after the JMP, [ESI+5C] is used again for additional calculations. Does the FMUL function load [ESI+5C] into the stack? or is it assuming the value is already loaded? – x87-Learner Jul 21 '21 at 17:28
  • `fmul [mem]` does `st0 *= [mem]`, exactly like with a register source. It doesn't separately push a copy of the memory operand onto the x87 stack. Read the manual entry, https://www.felixcloutier.com/x86/fmul:fmulp:fimul, it documents exactly what the instruction does. Notice that unlike `fld`, there's no pushing, only a possible pop if you `fmulp`. If the manual isn't clear, try it yourself. I'm not interested in answering questions that a debugger could answer for you; go install one and start using it. If something is still unclear after that, then ask. – Peter Cordes Jul 21 '21 at 17:32
  • In Japanese the phrase 全然大丈夫 means "That is absolutely fine". 全然 is typically only used in combination with a negative phrase, but has been adopted as a positive in recent years. I am guessing you don't read Kanji so these are just squiggles to you. Equally I don't read x87 code very well. I will try to work with what you gave me, but it might be far easier for you to spell it out plainly. – x87-Learner Jul 21 '21 at 18:42
  • The difference is I'm not trying to learn Kanji. You for some reason do want to learn obsolete x87 instead of easier to use modern SSE/SSE2 scalar instructions, so I'm pointing you at resources for teaching yourself, and ways to build on the explanations of single instructions that I already did spell out. Once you understand how the register stack works, it pretty much works the way other asm instructions do. – Peter Cordes Jul 21 '21 at 18:46
  • Yes, I'm working on a modification for a game, the only way we can do it is via XML patches which target the assembly code directly. I guess the point is that it all looked good until you said it was unbalanced. Now I'm not sure if it is unbalanced because I added an extra line, or because I took those two lines away, and I don't know what function I need to add to balance it back out again. – x87-Learner Jul 28 '21 at 16:00
  • You can single-step it with a debugger and look at how many x87 registers are in use at the start of your block vs. at the end. That should be the same total number (probably 0). Once you understand this concept, and how fld is a push while other operations can include a pop, you can verify the balancing for yourself, and make sense of my previous comments that suggested a specific sequence of instructions that would keep the x87 stack balanced. – Peter Cordes Jul 28 '21 at 21:35
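The push/pop bookkeeping described in the comments can be sketched with a toy Python model of the x87 register stack. This is only an illustration, not a replacement for single-stepping in a debugger. It assumes that `[0A0D454]` holds 0.0028 and `[0A0D450]` holds 1.0 (inferred from the formula, not confirmed in the thread), it models only the instructions from the first `FLD` onward, and it uses double precision rather than 32-bit floats. The class `X87Stack` and both function names are hypothetical.

```python
SCALE = 0.0028  # assumed contents of [0A0D454]
ONE = 1.0       # assumed contents of [0A0D450]


class X87Stack:
    """Toy model of the x87 register stack; regs[0] plays the role of ST(0)."""

    def __init__(self):
        self.regs = []

    def fld(self, value):
        # fld pushes: every register moves down one and the value becomes ST(0).
        if len(self.regs) >= 8:
            raise OverflowError("x87 stack full (real hardware would load a NaN)")
        self.regs.insert(0, value)

    def fmul_st(self, i):
        # fmul st, st(i): st0 *= st(i), no push and no pop.
        self.regs[0] *= self.regs[i]

    def fmul_mem(self, value):
        # fmul [mem]: st0 *= [mem], no push and no pop.
        self.regs[0] *= value

    def fsub_st(self, i):
        # fsub st, st(i): st0 -= st(i).
        self.regs[0] -= self.regs[i]

    def fsubr_mem(self, value):
        # fsubr [mem]: st0 = [mem] - st0.
        self.regs[0] = value - self.regs[0]

    def fstp(self):
        # fstp [mem]: store st0 and pop it; the stored value is returned here.
        return self.regs.pop(0)


def revised_listing(fpu, x):
    """The revised code from the question: three fld, one pop."""
    fpu.fld(SCALE)
    fpu.fld(x)
    fpu.fmul_st(1)
    fpu.fmul_st(0)
    fpu.fld(ONE)
    fpu.fsub_st(1)
    return fpu.fstp()


def balanced_listing(fpu, x):
    """The sequence suggested in the comments: one fld, one pop."""
    fpu.fld(SCALE)
    fpu.fmul_mem(x)
    fpu.fmul_st(0)
    fpu.fsubr_mem(ONE)
    return fpu.fstp()


unbalanced, balanced = X87Stack(), X87Stack()
r1 = revised_listing(unbalanced, 50.0)
r2 = balanced_listing(balanced, 50.0)
print("revised: ", r1, "registers left in use:", len(unbalanced.regs))
print("balanced:", r2, "registers left in use:", len(balanced.regs))
```

In this model both sequences store the same value, but the revised listing leaves two registers occupied on every pass while the memory-operand version leaves none, which is the imbalance the comments point out.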
