Is it legal to treat a pointer like an array?

Question

void uc(char* s)
{
    int i;

    for( i=0; i < strlen(s); i++ )
        if (97 <= s[i] && s[i] <= 122)
            s[i] = s[i] - 32;

    return;
}

My professor showed our class this operator.

char* s copies an array, this is ok because an array name is its first element memory address.

Now my problem is: why do we treat the pointer s as an array in the for cycle?
Pointers store addresses, but I learnt they don't have a very intuitive behaviour...

My problem is that I consider them "as an int variable", since memory address are integers in an hexadecimal format (right?), but I know it's not so simple.

Edit: thank you all for the answers, I'm loving this site and community <3 As you witnessed I'm a newbie, so thank you for the patience and the nice explanations

Hm. `char *s` is not an operator, but a definition. If you have a char array `char name[] = "Giuseppe"` and then call `uc(name)`, then the array will be represented by a pointer to its first element, but nothing is copied. If you now change `s[i]` or `*s`, you modify the original array _via_ the pointer `s`. — M Oehm, Apr 18 '20 at 10:58
@MOehm, I meant uc as an operator (function), anyway, from what you wrote you mean that *s is defined as "an array of addresses", with the same dimension of the array it represents? Doesn't `s` store just one address? How does the compiler know what `s[i]` is? — Giuseppe Reitano, Apr 18 '20 at 11:10
It looks normal with some recommendations. First - `strlen(s)` now is calculated on each iteration. You can assign it to variable before `for` loop e.g. `int len = strlen(s)` and then use `len` or reverse the loop `for ( i = strlen(s) - 1; i >= 0; i-- )`. Second - equalize the comparisons `if ( s[i] >= 97 && s[i] <= 122 )` (put variable and constant on fixed place). — i486, Apr 18 '20 at 11:14
Does this answer your question? [What's a modern term for "array/pointer equivalence"?](https://stackoverflow.com/questions/48868367/whats-a-modern-term-for-array-pointer-equivalence) — bitmask, Apr 18 '20 at 11:15
I suggest you read chapter 5 (_Pointers and Arrays_) of the book [The C Programming Language](https://www.amazon.com/Programming-Language-2nd-Brian-Kernighan/dp/0131103628) 2nd edition by Kernighan and Ritchie. — Abra, Apr 18 '20 at 11:16
Yes, `s` stores one address, but because you know that there is an array of chars with a null char at the end, you can access the array via that address. The layout in memory is so that the next character is at the following address and so on. (Of course, you could also pass the address of a single char or `NULL`, in which case your function would fail. Unfortunately, pointers in C are used for many different things.) — M Oehm, Apr 18 '20 at 11:29
I suggest reading section 6 of the [comp.lang.c faq](http://c-faq.com/). — pmg, Apr 18 '20 at 11:36

score 2 · Accepted Answer · edited Apr 19 '20 at 03:50

First things first, and being completely blunt:

Your mental model is wrong! It is imperative that you correct your misconceptions now, before you're in too deep.

char* s copies an array,

This is a misconception. s is a pointer to a char. That char could be a single object or the first element of a whole array. The exact type of the underlying object is lost when taking an address.

Nothing is copied, though! It's just a pointer to "wherever" (waves around with arms), and everyone involved (you, the compiler, other programmers) is in an unspoken and unwritten agreement to be nice and not do something stupid, like passing in a pointer that the function will later use in an invalid way.

this is ok because an array name is its first element memory address.

Arrays don't have names! Symbols do. The symbol for an array will decay to a pointer to the element type the array is made of. This decay is why you can write char somearray[123]; char *p = somearray; without taking its address.
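A minimal sketch of that decay, assuming two hypothetical helpers (set_first and decays_to_first_element are illustrative names, not part of the question's code). It shows that the decayed pointer equals the address of element 0 and that writes through it land in the original array:

```c
/* Hypothetical helper: writes through the pointer it receives.
   Nothing is copied; p refers to the caller's array. */
static void set_first(char *p, char c)
{
    p[0] = c;
}

/* Returns 1 if the decayed array expression equals the address of
   element 0 and the write made through the pointer is visible. */
static int decays_to_first_element(void)
{
    char somearray[123] = "abc";
    char *p = somearray;          /* no & needed: the array expression decays */

    set_first(somearray, 'X');    /* decays again when passed to a function */

    return p == &somearray[0] && somearray[0] == 'X';
}
```

Calling decays_to_first_element() returns 1.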

why do we treat the pointer s as an array in the for cycle?

Because we can. More specifically, because of this thing called "pointer arithmetic". The expression s + 1 results in a pointer that points one element past the element s is pointing to. It works for any number (within the value range of ptrdiff_t).

When you write a_pointer[i] in C, it literally translates (that's not hyperbole, the C standard requires the compiler to treat it like that!) into *(a_pointer + i). So by writing a_pointer[i] you're telling the compiler: "assume that a_pointer points into an array object and that a_pointer + i is still inside the bounds of that array object. With that assumption, dereference that location and produce the value there."

However, the results of pointer arithmetic are defined only if the resulting pointer stays within the bounds of an object (a pointer one past the last element may be formed and compared, but not dereferenced).

Do pointer arithmetic on a pointer that's not taken from an array? Undefined!

Generate a pointer that's outside the bounds of an array? Undefined!
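As a rough illustration of the subscript translation, the sketch below walks the same characters twice, once with a_pointer[i] and once with explicit pointer arithmetic. The function name sums_agree is made up for this example, and it assumes the caller passes a pointer to at least n valid chars:

```c
#include <stddef.h>

/* Sums n chars twice: once with subscripts, once with explicit
   pointer arithmetic. Both loops visit the same objects. */
static int sums_agree(const char *a_pointer, size_t n)
{
    int by_index = 0, by_arith = 0;

    for (size_t i = 0; i < n; i++)
        by_index += a_pointer[i];          /* subscript form */

    /* end points one past the last element: it may be formed and
       compared against, but it must not be dereferenced. */
    const char *end = a_pointer + n;
    for (const char *p = a_pointer; p != end; ++p)
        by_arith += *p;                    /* same as *(a_pointer + i) */

    return by_index == by_arith;
}
```

Because a[i] is defined as *(a + i) and addition commutes, the odd-looking i[a] is accepted by the compiler as well, although nobody should write it in real code.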

My problem is that I consider them "as an int variable",

They're not! Technically, pointers may be implemented by unicorn dust and magic. There are a few very specific rules when it comes to intermingling them with numbers. In the C programming language these rules are (simplified):

Pointers can be converted to integers of type uintptr_t and back.
The numeric value 0 translates to the null pointer, and null pointers translate to the numeric value 0.
Null pointers are invalid and hence must not be dereferenced.
Pointers can be subtracted from each other, resulting in an integer of type ptrdiff_t whose value is the distance in elements between the two pointers, assuming that both pointers refer to the same object. Written in "types": ⟪ptrdiff_t⟫ = ⟪pointer A⟫ - ⟪pointer B⟫; only arithmetically valid rearrangements of this are valid.
You can't add pointers.
You can't multiply pointers.
There is no mandate that number representations of pointers can be used for pointer arithmetic, i.e. you must not assume that (pointer_A - pointer_B) == k*((uintptr_t)pointer_A - (uintptr_t)pointer_B) for any value of k.
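Two of these rules can be sketched in code. The function names element_distance and round_trips are hypothetical, and the sketch assumes an implementation that provides uintptr_t (the type is optional in the standard). The conversion goes through void * because that is the round trip the standard describes:

```c
#include <stddef.h>
#include <stdint.h>

/* Distance in elements (not bytes) between two pointers into the same array. */
static ptrdiff_t element_distance(const int *a, const int *b)
{
    return a - b;
}

/* Converts a pointer to an integer and back; the result compares equal
   to the original pointer. Nothing is assumed about the integer's value. */
static int round_trips(int *p)
{
    uintptr_t as_number = (uintptr_t)(void *)p;
    int *back = (int *)(void *)as_number;
    return back == p;
}
```

For an array int arr[10], element_distance(&arr[7], &arr[2]) is 5 regardless of sizeof(int), and swapping the arguments gives -5.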

since memory address are integers in an hexadecimal format (right?),

Huh?!? This is not how things work.

Yes, you can use integers to address memory locations. No, you don't have to write them as hexadecimals. Hexadecimal is just a different number base, and 0xF == 15 == 0o17 == 0b1111. These days we usually write addresses in hexadecimal because it aligns nicely with our current computer architectures' word sizes being powers of 2: one hexadecimal digit equals 4 bits. But there are other architectures that use different word sizes, and on those other number bases are better suited.
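To make the "just a different number base" point concrete, here is a small sketch that formats one value three ways. show_bases is a made-up helper. Note that C itself spells octal with a leading 0 (017), so the 0o17 above is notation rather than C syntax:

```c
#include <stdio.h>

/* Formats the same value in decimal, hexadecimal and octal.
   Returns the number of characters written (as snprintf does). */
static int show_bases(char *out, size_t size, unsigned value)
{
    return snprintf(out, size, "%u 0x%X 0%o", value, value, value);
}
```

With a 32-byte buffer and the value 15, the buffer ends up holding the text 15 0xF 017: three spellings of the same number.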

And that still assumes linear address spaces. There are, however, also computer architectures that support segmented address spaces. As a matter of fact, it is very likely that the machine you're reading this on is such a computer. If it's using a CPU made by Intel or AMD, this thing actually understands segmented addresses: https://en.wikipedia.org/wiki/X86_memory_segmentation

In the x86 segmented address space an address actually consists of two numbers, i.e. it forms a vector. This means that if you're compiling a C program to run in a segmented address space environment, pointer types will no longer be simple single-valued numbers. C still requires them to be translatable to uintptr_t though, ponder on that!
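As an illustration only, here is how a two-number address maps to a linear one in 16-bit real mode, where the linear address is segment * 16 + offset. The answer does not say which x86 mode it has in mind, so real mode is an assumption here, and struct seg_addr and to_linear are hypothetical names:

```c
#include <stdint.h>

/* A segmented address as a pair of 16-bit numbers (illustrative type). */
struct seg_addr {
    uint16_t segment;
    uint16_t offset;
};

/* Real-mode translation: linear = segment * 16 + offset. */
static uint32_t to_linear(struct seg_addr a)
{
    return ((uint32_t)a.segment << 4) + a.offset;
}
```

The pairs 0x1234:0x0010 and 0x1235:0x0000 both translate to the linear address 0x12350. Two different representations can name the same byte, which is one reason arithmetic on the numeric form of a pointer cannot be trusted.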

Thank you so much, now I got a "better" (since what I thought wasn't even close to what things really are) picture of what all of this means. Nice and clear explanation — Giuseppe Reitano, Apr 18 '20 at 14:49

Hitokiri · Answer 2 · 2020-04-18T11:07:46.513

score 1

s is a pointer, so we can use it like an array as long as it points to allocated memory.

The two statements below are equivalent:

s[i] = s[i] - 32;

and

*(s+i) = *(s+i) - 32;

since memory address are integers in an hexadecimal format (right?)

No, the hexadecimal format is just what people use to display memory addresses. If you wrote an address as a binary number, it would be very long.

edited Apr 18 '20 at 11:07

answered Apr 18 '20 at 10:59


So what the computer does behind the scenes is increase the value of the address stored in the pointer variable `s`? – Giuseppe Reitano Apr 18 '20 at 11:14
`*(s+i)` is the value stored at the address `s+i`. So `*(s+i) -32` is a subtraction between two values – Hitokiri Apr 18 '20 at 11:18

score 1 · Answer 3 · answered Apr 18 '20 at 11:34

char* s copies an array - no, it doesn't.

The argument to this function is a pointer-to-char. That's it. The dereference syntax for a pointer can take two forms: *(p + n) and p[n]. The two forms are equivalent. In both cases, the address in p is taken by value, adjusted using the stride of the element type, and the resulting address is then dereferenced for either reading or storage, depending on the context of usage.

Your function can be written in a much more pointer-obvious way and, as a bonus, avoid the strlen call with each iteration (which can be expensive):
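The "stride" can be made visible with a type larger than char. In this sketch (byte_step_int is a hypothetical helper), the byte distance between p and p + n is measured by viewing both addresses as char pointers. It assumes p + n stays within, or one past the end of, the array p points into:

```c
#include <stddef.h>

/* Number of bytes between p and p + n for an int pointer. */
static size_t byte_step_int(const int *p, size_t n)
{
    return (size_t)((const char *)(p + n) - (const char *)p);
}
```

For int a[4], byte_step_int(a, 1) equals sizeof(int), not 1: adding n to a pointer moves it by n elements, whatever their size.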

void uc(char* s)
{
    for (; *s; ++s)
    {
        if (97 <= *s && *s <= 122)
            *s -= 32;
    }
}

This walks the char sequence originating at the input address held by s until such time as *s (which is advanced with each iteration in the loop using ++s) equates to the terminating nullchar (zero-octet). Because we're advancing s with each iteration, it always sits on the character being processed for that iteration.

Like everything else in C, function arguments are passed by value. It just happens that the "value" of an array identifier, when used in an expression context (almost everywhere), is the address of its first element. That is what makes it possible to modify the data at that address from inside the function.

Therefore:

#include <stdio.h> // for puts

void uc(char* s)
{
    for (; *s; ++s)
    {
        if (97 <= *s && *s <= 122)
            *s -= 32;
    }
}

int main()
{
    char s[] = "lower";
    uc(s);
    puts(s);
    return 0;
}

will print LOWER on an ASCII-compatible platform. I implore you to run the above code in a debugger, taking note of the following:

The base address of s[] in main()
The value of s in the argument list for uc when you initially step into it.
What happens to s in uc as the loop iterates
The value of *s when used in the various contexts it appears in uc
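For readers without a debugger at hand, the same observations can be recorded in code. This is only a sketch: uc_trace, visited and max are hypothetical names, and the function is a tracing variant of uc rather than the original:

```c
#include <stddef.h>

/* Hypothetical tracing variant of uc: stores where s points on each
   iteration (up to max entries). Returns the number of characters visited. */
static size_t uc_trace(char *s, char *visited[], size_t max)
{
    size_t n = 0;

    for (; *s; ++s)
    {
        if (n < max)
            visited[n] = s;        /* address of the current character */
        n++;
        if (97 <= *s && *s <= 122)
            *s -= 32;
    }
    return n;
}
```

Called with char s[] = "lower"; and room for 8 entries, it returns 5, visited[0] equals s (the base address in the caller) and visited[4] equals s + 4. The array's address in the caller never changes. Only the copy of the pointer inside the function advances.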

That's honestly about the best I can do in explaining it. Best of luck.

Thank you for the answer, so the null char `\0` in this case stops the loop right? So `\0` is useful also for other functions besides the ones in `string.h`. Sorry for the dumb questions but I'm new to coding. When you say take note of the base address of `s[]`, you mean: print the address and see how it changes at the end of the cycle? I feel dumb rn but I'm new to this and I'm trying to learn :p — Giuseppe Reitano, Apr 18 '20 at 14:46
```#include <stdio.h> // for puts void uc(char* s) { for (; *s; ++s) { printf("\nWhat's inside s[i] (before): %c", *s); if (97 <= *s && *s <= 122) *s -= 32; printf("\nWhat's inside s[i] (after): %c", *s); printf("\nAddress of s[i]: %p\n", s); } } int main() { char s[] = "lower"; printf("Address of s in main : %p\n", s); uc(s); puts(s); return 0; }``` ok cool, so every time the cycle runs, the address of `s` increases by 1 (which in hexadecimal is a->b->c->... assuming the last digit is a) — Giuseppe Reitano, Apr 18 '20 at 15:18
it's a mess, i tried formmatting it better but i guess in the comment section is a bit complicated... — Giuseppe Reitano, Apr 18 '20 at 15:20

John Bode · Answer 4 · 2020-04-18T18:10:26.543

Except when it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" (T [N]) is converted ("decays") to an expression of type "pointer to T" (T *) and the value of the expression is the address of the first element of the array.
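The sizeof exception is easy to observe. In the sketch below (both function names are hypothetical), the same array is measured once directly and once after it has decayed on its way into a function:

```c
#include <stddef.h>

/* Inside a function the parameter is a pointer, so sizeof reports
   the size of a pointer, not of the caller's array. */
static size_t size_seen_by_callee(char *s)
{
    return sizeof s;
}

/* sizeof applied to the array itself: no decay, the whole array is measured. */
static size_t size_of_hello(void)
{
    char greeting[] = "hello";     /* 5 letters + terminating 0 */
    return sizeof greeting;
}
```

size_of_hello() returns 6, while size_seen_by_callee always returns sizeof(char *), however long the caller's array is. Similarly, applying & to an array gives a pointer to the whole array (type char (*)[6] here), which holds the same address as a pointer to its first element but has a different type.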

Array objects are not pointers. If you declare an array like

char foo[] = "hello";

it will look like this in memory (addresses are for illustration only):

        +–––+
0x1000: |'h'|
        +–––+
0x1001: |'e'|
        +–––+
0x1002: |'l'|
        +–––+
0x1003: |'l'|
        +–––+
0x1004: |'o'|
        +–––+
0x1005: | 0 |          
        +–––+

The object foo is not a pointer; it doesn’t set aside any space for a pointer. The expression foo is converted to a pointer under most circumstances, including when passed as a function argument:

uc( foo );

What uc receives is the address of the first element, hence the declaration

void uc( char *s ) { ... }

As for the subscript [] operator, it’s the same thing - the array expression is converted to a pointer to the first element, and the subscript operation is applied to that pointer. The subscript operation is defined as

a[i] == *(a + i)

Given a starting address a, compute the address of the i'th object of the pointed-to type (not the i'th byte) following that address and dereference the result.

So the upshot of that is yes, you can use the [] subscript operator on a pointer expression as well as an array expression.

Pointers don’t have to be represented as integers - on some older segmented architectures, they were represented as a pair of values (page number and offset). Also, pointers to different types may have different representations - e.g., a char * may not look like an int *, which may not look like a double *, etc. On desktop systems like x86 they all look the same, but that’s not guaranteed.

Edit

From a comment:

when initializing an int vector like this: for( int i=0; i < size; ++i) scanf("%d", &vector[i]); does the computer use this pointer "mechanism" to cycle through?

Yes, exactly. scanf expects the argument corresponding to the %d conversion specifier to be the address of an int object, meaning an expression of type int *. The unary & operator returns the address of an object, so assuming vector has been declared

int vector[N]; // for some value of N

then the expression &vector[i] evaluates to the address of the i’th element of the array, and the type of the expression is int *.
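A runnable sketch of that loop follows. To avoid needing keyboard input, sscanf on a fixed string stands in for scanf (the %n conversion reports how many characters were consumed so the next read can continue from there). fill_vector and its parameters are hypothetical names:

```c
#include <stdio.h>

/* Reads up to n whitespace-separated integers from text into vector.
   Returns how many elements were filled. */
static int fill_vector(const char *text, int vector[], int n)
{
    int i, used = 0, step = 0;

    for (i = 0; i < n; ++i)
    {
        /* &vector[i] is an int *, the same address as vector + i */
        if (sscanf(text + used, "%d%n", &vector[i], &step) != 1)
            break;                 /* no more numbers in the text */
        used += step;
    }
    return i;
}
```

With the text "10 20 30" and room for 4 elements, it fills three elements and returns 3. Note that the loop body follows the for header directly. A semicolon straight after the closing parenthesis would make the loop body empty.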

Remember that C passes all function arguments by value - the formal parameter in the function definition is a different object in memory than the actual parameter in the function call. For example, given

void foo( T x ) // for any type T
{ 
  x = new_value;
}

void bar( void )
{
  T var;
  foo( var );
}

the formal parameter x in foo is a different object in memory than var, so the change to x doesn't affect var. If we want foo to be able to write to var, then we must pass a pointer to it:

void foo( T *ptr )
{
  *ptr = new_value; // write a new value to the thing ptr *points to*
}

void bar( void )
{
  T var;
  foo( &var ); // writes a new value to var
}

The unary * operator in *ptr = new_value dereferences ptr, so the expression *ptr in foo is equivalent to var:

*ptr ==  var  // T   == T
 ptr == &var  // T * == T *
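Here is the same pair of functions made concrete, assuming T is int and new_value is 42 (both choices, and the names set_by_value and set_by_pointer, are made up for the example):

```c
/* Receives a copy of the caller's int; the assignment changes only the copy. */
static void set_by_value(int x)
{
    x = 42;
    (void)x;               /* silence "set but not used" warnings */
}

/* Receives the address of the caller's int and writes through it. */
static void set_by_pointer(int *ptr)
{
    *ptr = 42;             /* *ptr designates the caller's variable */
}
```

Starting from int var = 7;, the call set_by_value(var) leaves var at 7, while set_by_pointer(&var) changes it to 42.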

In a declaration, the * simply means that the object ptr has pointer type - it doesn’t dereference, so you can write something like

int x;
int *ptr = &x; // ptr is *not* being dereferenced
int y = 5;
*ptr = y;      // ptr *is* being dereferenced

Thank you for the answer, I probably shouldn't ask "subquestions" in the comments, but when initializing an int vector like this: `for( int i=0; i < size; ++i) scanf("%d", &vector[i]);` does the computer use this pointer "mechanism" to cycle through? — Giuseppe Reitano, Apr 18 '20 at 16:30
