The C program is a Damerau-Levenshtein algorithm that uses a matrix to compare two strings. On the fourth line of main(), I want to malloc() the memory for the matrix (2D array). In testing, I called malloc(0) and it still runs perfectly. It seems that whatever I put in malloc(), the program still works. Why is this?

I compiled the code with the "cl" command in the Visual Studio developer command prompt, and got no errors.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>


int main(){

    char y[] = "felkjfdsalkjfdsalkjfdsa;lkj";
    char x[] = "lknewvds;lklkjgdsalk";
    int xl = strlen(x);
    int yl = strlen(y);
    int** t = malloc(0);
    int *data = t + yl + 1; //to fill the new arrays with pointers to arrays
    for(int i=0;i<yl+1;i++){
        t[i] = data + i * (xl+1); //fills array with pointer
    }
    for(int i=0;i<yl+1;i++){
        for(int j=0;j<xl+1;j++){
            t[i][j] = 0; //nulls the whole array
        }
    }

    printf("%s", "\nDistance: ");
    printf("%i", distance(y, x, t, xl, yl));
    for(int i=0; i<yl+1;i++){
        for(int j=0;j<xl+1;j++){
            if(j==0){
                printf("\n");
                printf("%s", "| ");
            }
            printf("%i", t[i][j]);
            printf("%s", " | ");
        }
    }


}
int distance(char* y, char* x, int** t, int xl, int yl){
    int isSub;
    for(int i=1; i<yl+1;i++){
        t[i][0] = i;
    }
    for(int j=1; j<xl+1;j++){
        t[0][j] = j;
    }



    for(int i=1; i<yl+1;i++){
        for(int j=1; j<xl+1;j++){
            if(*(y+(i-1)) == *(x+(j-1))){
                isSub = 0;

            }
            else{
                isSub = 1;

            }
            t[i][j] = minimum(t[i-1][j]+1, t[i][j-1]+1, t[i-1][j-1]+isSub); //looks left, above, and diagonal top-left for minimum
            if((*(y+(i-1)) == *(x+(i-2))) && (*(y+(i-2)) == *(x+(i-1)))){ //looks at neighbor characters, if equal

                t[i][j] = minimum(t[i][j], t[i-2][j-2]+1, 9999999); //since minimum needs 3 args, i include a large number
            }



        }
    }


    return t[yl][xl];
}

int minimum(int a, int b, int c){ 
    if(a < b){
        if(a < c){
            return a;
        }
        if(c < a){
            return c;
        }
        return a;
    }
    if(b < a){
        if(b < c){
            return b;
        }
        if(c < b){
            return c;
        }
        return b;
    }
    if(a==b){
        if(a < c){
            return a;
        }
        if(c < a){
            return c;
        }

    }
}
Sourav Ghosh
  • C has no bounds-checking at all. Not even for arrays (for which the compiler actually has the size). Going out of bounds leads to *undefined behavior*. – Some programmer dude Sep 16 '16 at 15:25
  • You're likely invoking undefined behavior. You can overflow a buffer provided to you in C - that doesn't mean it's a good idea to do so. It's up to you to ensure you're allocating enough memory, never writing beyond its bounds, as well as deallocating it. – Random Davis Sep 16 '16 at 15:25
  • Also note that it's up to the implementation of [`malloc`](http://en.cppreference.com/w/c/memory/malloc) whether it returns `NULL` or a valid pointer if you pass `0` as the size. – Some programmer dude Sep 16 '16 at 15:26
  • Possible duplicate of [C - malloc and arrays confusion](http://stackoverflow.com/questions/11551472/c-malloc-and-arrays-confusion) – Random Davis Sep 16 '16 at 15:27
  • "It works" - so you think. Your program is *definitely* invoking undefined behavior; anything you observe as a result, and attempt to pin any form of "sense" to, is a naive endeavor. Truth be told, it is *unfortunate* that it appears to "work" from your perspective. Had it crashed and burned, the smoking wreck would have been a tangible indicator that something is wrong; far better than lulling you into a false sense of correctness. – WhozCraig Sep 16 '16 at 15:41

4 Answers

Regarding malloc(0) part:

From the man page of malloc(),

The malloc() function allocates size bytes and returns a pointer to the allocated memory. The memory is not initialized. If size is 0, then malloc() returns either NULL, or a unique pointer value that can later be successfully passed to free().

So, the returned pointer is either NULL or a pointer that can only be passed to free(); you cannot dereference it and store something into that memory location.

In either case, you're trying to use an invalid pointer, which invokes undefined behavior (UB).

Once a program hits UB, its output cannot be justified in any way.

One of the possible outcomes of UB is "working fine" (as "wrongly" expected), too.
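
As a minimal sketch of what is safe with a zero-size request, the helper below only inspects the returned pointer and hands it back to free(); it never dereferences it. The name `malloc_zero_is_nonnull` is hypothetical and not part of the question's code.

```c
#include <stdlib.h>

/* Hypothetical helper, for illustration only.
 * Returns 1 if this implementation's malloc(0) yields a non-null pointer,
 * 0 if it yields NULL. Both results are allowed. */
int malloc_zero_is_nonnull(void)
{
    void *p = malloc(0);
    int nonnull = (p != NULL);  /* comparing the pointer is fine */
    /* Reading or writing through p here would be undefined behavior. */
    free(p);                    /* valid in both cases: free(NULL) does nothing */
    return nonnull;
}
```

Whichever value it returns, the only portable follow-up is the free() call shown.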

That said, following the analogy

"you can allocate a zero-sized allocation, you just must not dereference it"

some memory debugger applications hint that usage of malloc(0) is potentially unsafe and red-zone the statements that include a call to malloc(0).

Here's a nice reference related to the topic, if you're interested.

Regarding malloc(<any_size>) part:

In general, accessing out-of-bounds memory is UB, again. If you access memory outside the allocated region, you invoke UB, and the result cannot be predicted.

FWIW, C itself does not impose or perform any bounds checking on its own. So you're not "restricted" (read as "compiler error") from accessing out-of-bounds memory, but doing so invokes UB.
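
For the matrix in the question, one way to stay in bounds is to request room for the row pointers plus all the cells in a single malloc() call. The following is a sketch under stated assumptions, not the asker's code: `matrix_alloc` is a hypothetical name, it assumes `int` needs no stricter alignment than `int *` (true on mainstream platforms), and it omits overflow checks on the size computation.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates a rows x cols matrix of int in one block.
 * Layout: [rows row pointers][rows * cols ints]. Returns NULL on failure.
 * Assumes int is no more strictly aligned than int * (an assumption). */
int **matrix_alloc(size_t rows, size_t cols)
{
    size_t header = rows * sizeof(int *);      /* space for the row pointers */
    size_t body = rows * cols * sizeof(int);   /* space for the cells */
    int **t = malloc(header + body);
    if (t == NULL) {
        return NULL;
    }

    int *data = (int *)(t + rows);             /* cells start after the pointers */
    for (size_t i = 0; i < rows; i++) {
        t[i] = data + i * cols;                /* row i points at its own slice */
        for (size_t j = 0; j < cols; j++) {
            t[i][j] = 0;                       /* zero every cell */
        }
    }
    return t;
}
```

With such a helper, the question's setup could become `int **t = matrix_alloc(yl + 1, xl + 1);` followed by a NULL check, and a single `free(t)` would release the whole block.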

Sourav Ghosh

It seems that whatever I put in malloc(), the program still works. Why is this?

int** t = malloc(0);
int *data = t + yl + 1;

t + yl + 1 is undefined behavior (UB). The rest of the code does not matter.

If t == NULL, adding 1 to it is UB, as adding 1 to a null pointer is invalid pointer math.

If t != NULL, adding 1 to it is UB, as the result points more than one past the end of the allocated space.


With UB, the pointer math may have worked as hoped because a typical malloc() allocates in larger chunks, not necessarily the small size requested. It may crash on another platform or machine, or on another day or phase of the moon. The code is not reliable even if it works with light testing.
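
For contrast, here is a small sketch of pointer arithmetic that stays within what C permits: a pointer may be advanced up to one element past the end of an array and compared there, but not dereferenced there. The function name `sum_ints` is hypothetical, and the caller is assumed to pass a valid array of at least `n` elements.

```c
#include <stddef.h>

/* Hypothetical helper: sums n ints using only in-range pointer arithmetic.
 * Assumes a points to an array of at least n elements. */
long sum_ints(const int *a, size_t n)
{
    const int *end = a + n;   /* one past the last element: legal to form and compare */
    long total = 0;
    for (const int *p = a; p != end; ++p) {
        total += *p;          /* p is strictly inside the array here */
    }
    return total;
}
```

Forming `a + n + 1`, or reading `*end`, would step outside that permitted range in the same way the question's `t + yl + 1` does.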

chux - Reinstate Monica

You just got lucky. C does not do rigorous bounds checking because it has a performance cost. Think of a C program as a raucous party happening in a private building, where the OS police are stationed outside. If somebody throws a rock that stays inside the club (an example of an invalid write that violates the ownership convention within the process but stays within the club boundaries) the police don't see it happening and take no action. But if the rock is thrown and it goes flying dangerously out the window (an example of a violation that is noticed by the operating system) the OS police step in and shut the party down.

NovaDenizen

The C standard says:

If the size of the space requested is zero, the behavior is implementation-defined; the value returned shall be either a null pointer or a unique pointer. [7.10.3]

So we have to check what your implementation says. The question says "Visual Studio," so let's check Visual C++'s page for malloc:

If size is 0, malloc allocates a zero-length item in the heap and returns a valid pointer to that item.

So, with Visual C++, we know that you're going to get a valid pointer rather than a null pointer.

But it's just a pointer to a zero-length item, so there's not really anything safe you can do with that pointer except pass it to free. If you dereference the pointer, the code is allowed to do anything it wants. That's what's meant by "undefined behavior" in the language standards.

So why does it appear to work? Probably because malloc returned a pointer to at least a few bytes of valid memory since the easiest way for malloc to give you a valid pointer to a zero-length item is to pretend you really asked for at least one byte. And then the alignment rules would round that up to something like 8 bytes.
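
The rounding described above can be modeled with a few lines of arithmetic. This is a purely hypothetical model (`padded_request` is an invented name, and the 8-byte alignment is the figure guessed at above); real allocators, including Visual C++'s, are more involved, and nothing here reflects their actual internals.

```c
#include <stddef.h>

/* Hypothetical model of an allocator's size rounding, for illustration only.
 * Treats a zero-byte request as one byte, then rounds up to a multiple of
 * alignment. alignment must be nonzero. */
size_t padded_request(size_t requested, size_t alignment)
{
    if (requested == 0) {
        requested = 1;  /* pretend the caller asked for at least one byte */
    }
    /* Round up to the next multiple of alignment. */
    return (requested + alignment - 1) / alignment * alignment;
}
```

Under this model, a zero-byte request with 8-byte alignment would occupy 8 bytes, which is one plausible reason the first few stores seem to succeed even though the program has no right to them.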

When you dereference the beginning of that allocation, you likely have some valid memory. What you're doing is strictly illegal and non-portable but, with this implementation, likely to work. When you index farther into it, you'll likely start corrupting other data structures (or metadata) in the heap. If you index even farther into it, you're increasingly likely to crash due to hitting an unmapped page.


Why does the standard allow malloc(0) to be implementation-defined instead of just requiring it to return a null pointer?

With pointers, it's sometimes handy to have special values. The most obvious is the null pointer. The null pointer is just a reserved address that will never be used for valid memory. But what if you wanted another special pointer value that had some meaning to your program?

In the dark days before the standard, some mallocs allowed you to effectively reserve additional special pointer values by calling malloc(0). They could have used malloc(1) or any other very small size, but malloc(0) made it clear that you just wanted to reserve an address rather than actual space. So there were many programs that depended on this behavior.

Meanwhile, there were programs that expected malloc(0) to return a null pointer, since that's what their library had always done. When the standards people looked at the existing code and how it used the library, they decided they couldn't choose one method over the other without "breaking" some of the code out there. So they allowed malloc's behavior to remain "implementation-defined."

Adrian McCarthy