41

"The average man does not want to be free. He simply wants to be safe." - H. L. Menken

I am attempting to write very secure C. Below I list some of the techniques I use and ask are they as secure as I think they are. Please don't not hesitate to tear my code/preconceptions to shreds. Any answer that finds even the most trivial vulnerability or teaches me a new idea will be highly valued.

Reading from a stream:

According to the GNU C Programming Tutorial getline:

The getline function will automatically enlarge the block of memory as needed, via the realloc function, so there is never a shortage of space -- one reason why getline is so safe. [..] Notice that getline can safely handle your line of input, no matter how long it is.

I assume that getline should, under all inputs, prevent a buffer overflow from occurring when reading from a stream.

  • Is my assumption correct? Are there inputs and/or allocation schemes under which this could lead to an exploit? For instance what if the first character from the stream is some bizarre control character, maybe 0x08 BACKSPACE (ctl-H).
  • Has any work been done to mathematically prove getline as secure?

Malloc Returns Null on Failure:

If malloc encounters an error malloc returns a NULL pointer. This presents a security risk since one can still apply pointer arithmetic to a NULL (0x0) pointer, thus wikipedia recommends

/* Allocate space for an array with ten elements of type int. */
int *ptr = (int*)malloc(10 * sizeof (int));
if (ptr == NULL) {
    /* Memory could not be allocated, the program should handle 
       the error here as appropriate. */
} 

Secure sscanf:

When using sscanf I've gotten in the habit of allocating the size to-be-extracted strings to the size of the input string hopefully avoiding the possibility of an overrun. For example:

const char *inputStr = "a01234b4567c";
const char *formatStr = "a%[0-9]b%[0-9]c":
char *str1[strlen(inputStr)];
char *str2[strlen(inputStr)];

sscanf(inputStr, formatStr, str1, str2);

Because str1 and str2 are the size of the inputStr and no more characters than strlen(inputStr) can be read from inputStr, it seems impossible, given all possible values for the inputStr to cause a buffer overflow?

  • Am I correct? Are there strange corner cases I haven't thought of?
  • Are there better ways to write this? Libraries that have already solved it?

General Questions:

While I've posted a large number of questions I don't expect anyone to answer all of them. The questions are more of guideline to the sorts of answers I am looking for. I really want to learn the secure C mindset.

  • What other secure C idioms are out there?
  • What corner cases do I need to always check?
  • How can I write unit tests to enforce these rules?
  • How can I enforce constraints in a testability or provably correct way?
  • Any recommended static/dynamic analysis technics or tools for C?
  • What secure C practices do you follow and how do you justify them to yourself and others?

Resources:

Many of the resources were borrowed from the answers.

Ethan Heilman
  • 16,347
  • 11
  • 61
  • 88
  • 11
    Maybe this should be community wiki, given the broad scope of the question... – R. Martinho Fernandes Jan 05 '10 at 18:31
  • 2
    1 note: Instead of calling `strlen()` twice, you should store it: `size_t len = strlen(inputStr); char *str1[len]; char *str2[len];` The compiler may do this for you, but it's not that hard to do yourself, is just as (if not more) readable, and guaranteed to be what you want. – Chris Lutz Jan 05 '10 at 18:39
  • 1
    "The average man does not want to be free. He simply wants to be safe." Orly, some architectures don't have pointers etc (all in the future), but you are still free to do anything on them :) – L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳ Jan 05 '10 at 18:40
  • @Chris Lutz, that makes a lots of sense given very large strings, thanks. – Ethan Heilman Jan 05 '10 at 19:22
  • 1
    small tip, use when ever you don't want to change a string const char *, and even more important when you are not allowed to, i.e. in your example it should be read: const char *inputStr = "a01234b4567c"; – quinmars Jan 05 '10 at 22:28
  • I've actually forgotten why, but we were required to use const char * const for immutable character arrays. It prevents the pointer from being changed to point to a new location, but I don't recall what bug that was likely to prevent. – justin Jan 06 '10 at 03:25
  • Of coutse that should be `char str1[strlen(inputStr)];` (without the `*`) as it is an array of characters, not an array of pointers to characters. – Paul Ogilvie Feb 28 '18 at 09:41

7 Answers7

7

I think your sscanf example is wrong. It can still overflow when used that way.

Try this, which specifies the maximum number of bytes to read:

void main(int argc, char **argv)
{
  char buf[256];
  sscanf(argv[0], "%255s", &buf);
}

Take a look at this IBM dev article about protecting against buffer overflows.

In terms of testing, I would write a program that generates random strings of random length and feed them to your program, and make sure they are handled appropriately.

nont
  • 9,322
  • 7
  • 62
  • 82
  • 1
    Can you explain how it is possible to overflow? I avoided using a format string which specified the characters to read, that is %255s, because I am claiming my sscanf idiom is secure even if the format string incorrectly lists the characters to read. – Ethan Heilman Jan 05 '10 at 19:19
  • 1
    Even better would be `sscanf (argv[0], "%*s", sizeof buf - 1, buf);` to avoid hardcoding items depending on the size of the buffer. – wallyk Jan 05 '10 at 19:24
  • 1
    @wallyk does this work for more than one format character? For example does sscanf (argv[0], "%*[5-9]abcdefg%*[0-2]", sizeof str1 - 1, str1, sizeof str2, str2); work in a safe manner? What is the * doing? Deferencing s? – Ethan Heilman Jan 05 '10 at 19:42
  • 1
    Those work fine in questions, but not in comments. I don't even get a formating bar when writing a comment. – Ethan Heilman Jan 05 '10 at 19:53
  • 2
    Wrap your code with the grave accent character (same key as the tilde)... that will allow you to post inline code in your comment. – brianreavis Jan 05 '10 at 22:38
  • 3
    @wallyk `sscanf (argv[0], "%*s", sizeof buf - 1, buf);` does not work, in the context of a scan * is a assignment-suppressing token. See http://www.opengroup.org/onlinepubs/009695399/functions/scanf.html . Any idea how do what you intended without rewriting the string as posted here http://stackoverflow.com/questions/545018/how-to-pass-variable-length-width-specifier-in-sscanf – Ethan Heilman Jan 06 '10 at 03:54
4

A good place to start looking at this is David Wheeler's excellent secure coding site.

His free online book "Secure Programming for Linux and Unix HOWTO" is an excellent resource that is regularly updated.

You might also like to look at his excellent static analyser FlawFinder to get some further hints. But remember, no automated tool is a replacement for a good pair of experienced eyes, or as David so colourfully puts it..

Any static analysis tool, such as Flawfinder, is merely a tool. No tool can substitute for human thought! In short, "a fool with a tool is still a fool". It's a mistake to think that analysis tools (like flawfinder) are a substitute for security training and knowledge

I have personally used David's resources for several years now and find them to be excellent.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Rob Wells
  • 36,220
  • 13
  • 81
  • 146
4
  1. Reading from a stream

    The fact that getline() "will automatically enlarge the block of memory as needed" means that this could be used as a denial-of-service attack, as it would be trivial to generate an input that was so long it would exhaust the available memory for the process (or worse, the system!). Once an out-of-memory condition occurs, other vulnerabilities may also come into play. The behaviour of code in low/no memory is rarely nice, and very hard to predict. IMHO it is safer to set reasonable upper bounds on everything, especially in security-sensitive applications.

    Furthermore (as you anticipate by mentioning special characters), getline() only gives you a buffer; it does not make any guarantees about the contents of the buffer (as the safety is entirely application-dependent). So sanitising the input is still an essential part of processing and validating user data.

  2. sscanf

    I would tend to prefer to use a regular expression library, and have very narrowly defined regexps for user data, rather than use sscanf. This way you can perform a good deal of validation at the time of input.

  3. General comments

    • Fuzzing tools are available which generate random input (both valid and invalid) that can be used to test your input handling
    • Buffer management is critical: buffer overflows, underflows, out-of-memory
    • Race conditions can be exploited in otherwise secure code
    • Binary files could be manipulated to inject invalid values or oversized values into headers, so file format code must be rock-solid and not assume binary data is valid
    • Temporary files can often be a source of security issues, and must be carefully managed
    • Code injection can be used to replace system or runtime library calls with malicious versions
    • Plugins provide a huge vector for attack
    • As a general principle, I would suggest having clearly defined interfaces where user data (or any data from outside the application) is assumed invalid and hostile until it is processed, sanitised and validated, and the only way for user data to enter the application
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
gavinb
  • 19,278
  • 3
  • 45
  • 60
1

Insecure Programming by Example
blog with a few of the answers

Dustin Getz
  • 21,282
  • 15
  • 82
  • 131
  • As of 2019-12-28, both URLs are not current. The "Insecure Programming By Example" link was last tracked by The WayBack Machine in June 2013 — see http://web.archive.org/web/20130630235940/http://community.corest.com/~gera/InsecureProgramming/. The 'blog' site (still) says "the new site is http://morenops.com", but the new site is not available any more. The WayBack Machine last tracked it in June 2013 (again) — see http://web.archive.org/web/20130613023201/http://morenops.com/. – Jonathan Leffler Dec 28 '19 at 12:36
1

Yannick Moy developed a Hoare/Floyd weakest precondition system for C during his PhD and applied it to the CERT managed strings library. He found a number of bugs (see page 197 of his memoir). The good news is that the library is safer now for his work.

Pascal Cuoq
  • 79,187
  • 7
  • 161
  • 281
1

You could also look at Les Hatton's web site here and at his book Safer C which you can get from Amazon.

Jackson
  • 5,627
  • 2
  • 29
  • 48
0

Don't use gets() for input, use fgets(). To use fgets(), if your buffer is automatically allocated (i.e., "on the stack"), then use this idiom:

char buf[N];
...
if (fgets(buf, sizeof buf, fp) != NULL)

This will keep working if you decide to change the size of buf. I prefer this form to:

#define N whatever
char buf[N];
if (fgets(buf, N, fp) != NULL)

because the first form uses buf to determine the second argument, and is clearer.


Check the return value of fclose().


Alok Singhal
  • 93,253
  • 21
  • 125
  • 158