
As an exercise, we have to exploit insecure code by causing a buffer overflow, i.e. by entering too many characters at the console.

The solution requires entering both assembly code and literal bytes that don't directly correspond to instructions. Is there a way to tell gcc that I want some literal value to be placed at a specific point in the program?

I'm looking for something like

movl $123, %eax
.lit 0x11 0x22 0x33

Which would result in 0x11 0x22 0x33 being assembled into the output after the movl instruction's machine-code bytes.

Peter Cordes
Nearoo
  • Does AT&T support `db`? (define byte) – 500 - Internal Server Error Nov 15 '19 at 09:41
  • @500-InternalServerError: no, the pseudoinstruction is `.byte`. @Nearoo: You could see that by compiling `char global = 123;` with `gcc -S`. – Peter Cordes Nov 15 '19 at 09:43
  • Thank you, that is exactly what I was looking for! – Nearoo Nov 15 '19 at 10:01
  • @PeterCordes Just a thought: I was unable to find an answer to my question by Googling it. In that regard, my question is actually different from the one you referenced - one asks "what does .byte do", and my question asks "what is the directive for what .byte does". Someone who had the same question as me still wouldn't find an answer now, even though the answer to both questions is technically the same. – Nearoo Nov 15 '19 at 10:02
  • @Nearoo The correct answer to both questions is: read the GNU assembler manual. Questions that can be quickly answered by reading the documentation are generally not too appreciated on this site. – fuz Nov 15 '19 at 10:12
  • Note that this is not AT&T syntax, it is GAS syntax. Even in Intel syntax mode you'd still use `.byte`. – Jester Nov 15 '19 at 11:37
  • assembly language is defined by the assembler, the tool, not by the target, there are countless x86 assemblers with different assembly languages, AT&T vs Intel are only a part of the differences in those languages, this question has nothing to do with AT&T vs intel. newer versions of gas for some targets you can use .inst and some versions for some targets require .inst vs .byte or .word (well at least if you want the disassembler to see it as not data). but .byte or .word or .hword or .dword or whatever is the quickest way. (as of this writing, gas is free to change at any time in the future) – old_timer Nov 15 '19 at 12:45
  • Honestly, I'd argue that just because something is written in the documentation doesn't mean that it isn't worthy of a question. Otherwise, 99% of SO would be "not too appreciated". I find a good question is one that cannot easily be looked up by Googling related terms. I've looked through dozens of pages and questions about x86 and AT&T, searching for terms similar to "x86 insert literal bytes"; none contained the information I was looking for directly. Neither did the question you mentioned as a duplicate, by the way: it doesn't ask what I'm looking for, hence I'm not reading the answer. – Nearoo Nov 15 '19 at 17:06
  • I honestly probably wouldn't have guessed that your question would've contained the answer to my question even if it appeared in my search results and I would've looked through it. I'm sure you can retrace that "reserve a byte in memory" isn't obviously what I'm looking for as a layman. – Nearoo Nov 15 '19 at 17:09
  • It's not a perfect duplicate, but the very title of the duplicate is the answer you were looking for, and is at least enough info to go look up `.byte` in the GAS manual. I had another look and found another question about how to emit literal data: [How to initialize variables, compile and run GNU assembly program](//stackoverflow.com/q/27098674). `.byte` works the same regardless of section. Also some others, including one where `.byte` is the answer. – Peter Cordes Nov 15 '19 at 18:09
  • @fuz I don't know if it's the case here, but a lot of novice programmers don't realize that the `.byte` directive can generate executable machine code. They assume it can only generate data and that the bytes it generates are somehow different from the bytes that assembly instructions generate. – Ross Ridge Nov 15 '19 at 18:39
  • @PeterCordes "How to initialize variables" are you serious right now? Did you just type `.byte` into the search box? Because apart from the `.byte` keyword being _present_ in the answer, _nothing_ directly relates to my question. The title is unrelated. The answer doesn't explain what `.byte` does. You even had to explain how the answer can be applied here! Just because the question you posted contains some pointers to the answer I'm looking for doesn't make my question superfluous. SO is about generating an index of useful questions and answers. It's not a list of keywords. – Nearoo Nov 15 '19 at 21:03
  • @RossRidge Not related in any way. – Nearoo Nov 15 '19 at 21:04
  • @fuz Ok so I looked into the docs of "as", which, as I've learned just now, documents the compiler built into `gcc`. This is relevant because, as I've just learned, there are multiple layers of syntax to assembly code, one is AT&T related, the other gas. I've now searched the page for "byte", "padding", "literal" and "word", _none_ yielded the information I was looking for. Could you please tell me on what page I find what I need, and what strategy I should've used to find it? – Nearoo Nov 15 '19 at 21:04
  • @Nearoo: Yes, I did put `.byte` into google and SO's search to look for duplicates. Sorry about [Defining "variables" in assembly language](//stackoverflow.com/q/30559082), I didn't read it very carefully and you're right it wasn't useful at all. I removed that from the dup list. – Peter Cordes Nov 15 '19 at 21:08
  • I still think it's a duplicate because you already had the right idea with `.lit`, and just needed the right name for the pseudo-instruction. [What is the use of .byte assembler directive in gnu assembly?](//stackoverflow.com/q/7290318) (the first duplicate I used) does explicitly show `.byte` being mixed with instructions, not in a separate section. As for the GAS manual, (https://sourceware.org/binutils/docs/as/) searching for "byte" on that top-level table of contents finds the page for [`.byte`](https://sourceware.org/binutils/docs/as/Byte.html) – Peter Cordes Nov 15 '19 at 21:12
  • Or like I suggested in my first comment, compiling `char global = 123;` gets GCC to use that in its asm output: https://godbolt.org/z/VBU9w_ I agree there's a case to be made for reopening this question and wrapping a tiny bit of explanation around those links, especially at this point where people have already written explanations in comments. But the current top duplicate explicitly mentions using `.byte` to assemble machine-code bytes. Anyway, upvoted your question as a useful signpost to the other duplicates. – Peter Cordes Nov 15 '19 at 21:19
  • @PeterCordes No hard feelings, props for apologizing. I've seen the doc page for `.byte`; it says "Each expression is assembled into the next byte". I can now see how that presents a solution for what I'm looking for, however as a beginner I find that to be a cryptic sentence, and incredibly easy to miss. I stand by my opinion that I would've benefited from a question such as mine, and that others might too. Also, again being a beginner, for me the questions marked as duplicates don't intuitively lead to an answer that contains information relevant to mine. So, thanks for upvoting. – Nearoo Nov 15 '19 at 21:33
  • @Nearoo The C compiler invokes the UNIX assembler `as` to assemble assembly files. The most common implementation of this program is GNU `as` (aka `gas`). If this was not known to you, I understand your frustration in finding the relevant documentation and I agree that the gcc team should document this better. The only hint I found is the “see also” section in the man page which references as(1), the assembler. – fuz Nov 16 '19 at 23:31
  • @fuz Thanks. Let's be real for a second, though: that name is horrific. `as` is about the worst name you can give to a tool. And the documentation, too, doesn't seem to have aged past the times when HTML was made up of `` and ``, and that's if you're lucky - some tools use the manpages as documentation! Low-level programming sure is hard, but I feel like a significant part of the reason for this is the clusterf*uck of unchangeable codebases and standards from 1980 that are outdated as hell but that you still have to deal with at every corner. /rant – Nearoo Nov 21 '19 at 14:08
  • @Nearoo `as` is how it was called since the first version of UNIX (from 1971). Stands for “assembler.” I don't see any problem with that name. – fuz Nov 21 '19 at 14:52

0 Answers