-1

New here, please let me know if I'm not asking my question appropriately.

I'm trying to create a sequence of bytes dynamically from file data. Here's the bit of code I'm having problems with (I realize prints are a lousy way to debug, but the problem seems trivial and I'm not seeing much else with gdb). Edited to include an example that compiles. Actual code is much longer, didn't want to burden anyone with it.

#include <stdio.h>
#include <stdlib.h>

int main() {
  FILE *fpt;
  unsigned char *file;
  int file_size, i;

  // File ops
  fpt = fopen("test.txt", "rb");
  fseek(fpt, 0L, SEEK_END);
  file_size = ftell(fpt);   
  fseek(fpt, 0L, SEEK_SET);
  file = (unsigned char *) calloc(sizeof(unsigned char), file_size);
  fread(file, file_size, sizeof(unsigned char), fpt);   
  fclose(fpt);

  int str_len = 1;
  unsigned char *string = (unsigned char *)calloc(sizeof(unsigned char), str_len);
  unsigned char *next_byte = (unsigned char *)calloc(sizeof(unsigned char),1);
  i = 1;
  *string = *file;  

  // Build byte sequence
  while (i < file_size) {
    *next_byte = *(file + i);
    str_len++;
    string = realloc(string, str_len);
    if (i==1)printf("string+char = %04X\n",  (*string << 8) | *next_byte);
    *string = (*string << 8) | *next_byte;
    if (i==1)printf("string+char = %04X\n", *string);
    i++;
  }
  return 0;
}
// I realize I need to free memory

Say my test.txt simply contains

ABC

Looking with a hex editor I would see

41 42 43 0A

The output for the two prints is:

string+char = 4142 string+char = 0042

My question is why does the first print give the result I want while the second doesn't? I realize an unsigned char is a single byte so shifting to the left 8 bits should give me zero's. I don't understand why printing the bit operations directly gives me the result I would like.

Is there a better way to concatenate a single byte to a sequence of bytes?

Bill Lynch
  • 80,138
  • 16
  • 128
  • 173
  • Please post a [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve). – R Sahu Feb 12 '16 at 04:21
  • With `unsigned char *string = calloc(sizeof(unsigned char),str_len); ... *string = *file;` What was the reason for the first `calloc()`. `string` value is overwritten so soon? – chux - Reinstate Monica Feb 12 '16 at 04:37
  • "shifting an unsigned char left by 8 bits would give me zero's" ---> no. before the shift, the usual promotions occur (e.g. to `int`), `(*string << 8) | *next_byte` could reasonably result in a print of 4 non-zero hex digits. – chux - Reinstate Monica Feb 12 '16 at 04:42
  • Aside: consider a simpler style. Rather than `file = (unsigned char *) calloc(sizeof(unsigned char), file_size);` use the easier to code, read and maintain `file = calloc(file_size, sizeof *file);`. No cast, no types, just vairables. – chux - Reinstate Monica Feb 12 '16 at 04:46
  • Thanks for the reply, good to know why the first print works. I'm trying to only allocate memory to `string` as I need it. Thanks for the tip on sizeof *file, didn't know I could get type sizes from my variables. Any suggestions on how to properly store my concatenated bytes in my `string` variable? – stratocaster_master Feb 12 '16 at 04:51
  • [don't cast the result of `malloc` in C](http://stackoverflow.com/q/605845/995714) – phuclv Feb 12 '16 at 06:28

1 Answers1

0

This was a bit of a puzzle for me.

I'm having trouble seeing what the code example is doing differently than you expected.
Let me illustrate...

I added some prints to the sample code, see concat.c below for the full program but let's focus on the while-loop's shifting. fyi, everything up to that point (loading the 4 bytes from "test.txt" etc.) seems to be working as expected.

Here is the specific code fragment:

shift fragment, original

if (i==1)printf("string+char = %04X\n",  (*string << 8) | *next_byte);
*string = (*string << 8) | *next_byte;
if (i==1)printf("string+char = %04X\n", *string);

I think your printf() with the %04x is suggesting there is more in *string than is actually there.

shift fragment, revised

I revised the shift expression to break down what I think it is doing using a series of separate variables so I could look at the values of each one.

       /* original:  *string = (*string << 8) | *next_byte; */
       printf("   explicit shifting...\n");
       unsigned char a = *string;
       printf("      Lets print 'a' as %%x and %%04x just to verify zero padding doesn't change anything.\n");
       printf("      a=%x  a=%04x  sizeof(a)=%ld\n", a, a, sizeof(a) );
       unsigned char b = a << 8;
       printf("      b=%x       : b = a << 8 yielded all zero bits.\n", b);
       unsigned char c = *next_byte;
       printf("      c=%x       : c = *next_byte as expected.\n", c);
       unsigned char d = b | c;
       printf("      d=%x       : d bitwise-or of 0 w/c yields, well, c.\n", d);
       // pick up with original...
       // note that we're always overwriting string[0]. 
       // the following is the same as saying:
       //     string[0] = d; // assuming 'd' is assigned as above.
       *string = (*string << 8) | *next_byte;
       // If you want to use shift and accumulate 4 bytes
       // you probably want *string to be a long.
       printf("   after shift: string[0]=%x, next_byte[0]=%x\n", string[0], next_byte[0] );
       if (i==1) printf("string+char = %04X\n", *string);

output from shift fragment, revised

The following is the first pass through the while loop.
i = 1, and next_byte = 0x42.
(see below for full output)

   explicit shifting...
      Lets print 'a' as %x and %04x just to verify zero padding doesn't change anything.
      a=41  a=0041  sizeof(a)=1
      b=0       : b = a << 8 yielded all zero bits.
      c=42       : c = *next_byte as expected.
      d=42       : d bitwise-or of 0 & c yields, well, c.
   after shift: string[0]=42, next_byte[0]=42
string+char = 0042

That last line from your original program is displaying string+char = 0042 but isn't that because the printf format is saying printf("string+char = %04X\n", *string);? I mean... the underlying value for *string is just 0x42, which is what it is supposed to be doing, yes?

Now having said that, I don't think this is about "shifting" so much as it is continually overwiting the first byte of string, e.g. string[0].

maybe this is the actual problem?

Here's another output fragment, this is from some prints I added after the while loop (again, see concat.c) below for the the full code.

After while loop.  just for fun, let's print out string...
   string[0]=0a  (string+0)=0x114d010
   string[1]=00  (string+1)=0x114d011
   string[2]=00  (string+2)=0x114d012
   string[3]=00  (string+3)=0x114d013
done.
$ 

Here you can see the value of string is pretty empty.
string did actually hold onto the last character, the 0x0a.
And for reasons that are unclear to me, string is initialized with the 1st character, 'A', before the while loop starts. At any rate...
During the while loop, string has been reallocated several times. (e.g. string = realloc(string, str_len); - why do this, btw?)

So... did you want the shift & bitwise-or to cascade characters "ABC\n" across string?

If you had something like long my4bytes; instead of string I could see shifting & bitwise-or being "helpful" in gathering up the individual bytes.
But... the string is essentially an array of bytes (ok, array of unsigned char).

Assuming you want to keep string as an array of bytes, I think you'd be better off just doing something like: string[i] = *next_byte;


So. There are lots of ways you could write that; the while loop is kind of cluttered so I will leave that to you. (which is understandable... I can imagine the clutter is because of the original example you cut down to something small enough to post in the original question).


At any rate, does this help point you in a useful direction?
Or... did I miss the intent of the original question?

sample output from concat.c

$ cat test.txt 
ABC
$ hexdump -C test.txt
00000000  41 42 43 0a                                       |ABC.|
00000004
$ gcc concat.c 
$ ./a.out
file_size=4
   file[0]=41  (file+0)=0x114d240
   file[1]=42  (file+1)=0x114d241
   file[2]=43  (file+2)=0x114d242
   file[3]=0a  (file+3)=0x114d243

--- Build byte sequence
while: i=1
   file[i]=42  file=0x114d240
   next_byte[0]=42  next_byte=0x114d030
string+char = 4142
   before shift: string[0]=0041, next_byte[0]=0042
   explicit shifting...
      Lets print 'a' as %x and %04x just to verify zero padding doesn't change anything.
      a=41  a=0041  sizeof(a)=1
      b=0       : b = a << 8 yielded all zero bits.
      c=42       : c = *next_byte as expected.
      d=42       : d bitwise-or of 0 w/c yields, well, c.
   after shift: string[0]=42, next_byte[0]=42
string+char = 0042
while: i=2
   file[i]=43  file=0x114d240
   next_byte[0]=43  next_byte=0x114d030
   before shift: string[0]=0042, next_byte[0]=0043
   explicit shifting...
      Lets print 'a' as %x and %04x just to verify zero padding doesn't change anything.
      a=42  a=0042  sizeof(a)=1
      b=0       : b = a << 8 yielded all zero bits.
      c=43       : c = *next_byte as expected.
      d=43       : d bitwise-or of 0 w/c yields, well, c.
   after shift: string[0]=43, next_byte[0]=43
while: i=3
   file[i]=0a  file=0x114d240
   next_byte[0]=0a  next_byte=0x114d030
   before shift: string[0]=0043, next_byte[0]=000a
   explicit shifting...
      Lets print 'a' as %x and %04x just to verify zero padding doesn't change anything.
      a=43  a=0043  sizeof(a)=1
      b=0       : b = a << 8 yielded all zero bits.
      c=a       : c = *next_byte as expected.
      d=a       : d bitwise-or of 0 w/c yields, well, c.
   after shift: string[0]=a, next_byte[0]=a

After while loop.  just for fun, let's print out string...
   string[0]=0a  (string+0)=0x114d010
   string[1]=00  (string+1)=0x114d011
   string[2]=00  (string+2)=0x114d012
   string[3]=00  (string+3)=0x114d013
done.
$ 

concat.c

#include <stdio.h>
#include <stdlib.h>

int main() {
   FILE *fpt;
   unsigned char *file;
   int file_size, i;

   // File ops
   fpt = fopen("test.txt", "rb");
   fseek(fpt, 0L, SEEK_END);
   file_size = ftell(fpt); 
   printf("file_size=%d\n", file_size);
   fseek(fpt, 0L, SEEK_SET);
   //file = (unsigned char *) calloc(sizeof(unsigned char), file_size);
   file = calloc( file_size, file_size);
   fread(file, file_size, sizeof(unsigned char), fpt); 
   fclose(fpt);

   int str_len = 1;
   unsigned char *string = 
      (unsigned char *)calloc(sizeof(unsigned char), str_len);
   unsigned char *next_byte = 
      (unsigned char *)calloc(sizeof(unsigned char),1);
   i = 1;
   //printf("          file[0]=%02x  file=%p\n", file[0], file );
   //printf("before: string[0]=%02x  string=%p\n", string[0], string );
   *string = *file;  
   //printf("after:  string[0]=%02x  string=%p\n", string[0], string );

   for( int idx = 0; idx < file_size; ++idx ) {
      printf("   file[%d]=%02x  (file+%d)=%p\n", idx, file[idx], idx, (file+idx) );
   }
   printf("\n--- Build byte sequence\n");
   while (i < file_size) {
       printf("while: i=%d\n", i );
       printf("   file[i]=%02x  file=%p\n", file[i], file );
       *next_byte = *(file + i);
       printf("   next_byte[0]=%02x  next_byte=%p\n", next_byte[0], next_byte );
       str_len++;
       string = realloc(string, str_len);
       if (i==1) printf("string+char = %04X\n",  (*string << 8) | *next_byte);
       printf("   before shift: string[0]=%04x, next_byte[0]=%04x\n", string[0], next_byte[0] );
       /* original:  *string = (*string << 8) | *next_byte; */
       printf("   explicit shifting...\n");
       unsigned char a = *string;
       printf("      Lets print 'a' as %%x and %%04x just to verify zero padding doesn't change anything.\n");
       printf("      a=%x  a=%04x  sizeof(a)=%ld\n", a, a, sizeof(a) );
       unsigned char b = a << 8;
       printf("      b=%x       : b = a << 8 yielded all zero bits.\n", b);
       unsigned char c = *next_byte;
       printf("      c=%x       : c = *next_byte as expected.\n", c);
       unsigned char d = b | c;
       printf("      d=%x       : d bitwise-or of 0 w/c yields, well, c.\n", d);
       // pick up with original...
       // note that we're always overwiting string[0]. 
       // the following is the same as saying:
       //     string[0] = d; // ssuming 'd' is assigned as above.
       *string = (*string << 8) | *next_byte;
       // If you want to use shift and accumulate 4 bytes
       // you probably want *string to be a long.
       printf("   after shift: string[0]=%x, next_byte[0]=%x\n", string[0], next_byte[0] );
       if (i==1) printf("string+char = %04X\n", *string);
       i++;
   }
   printf("\nAfter while loop.  just for fun, let's print out string...\n");
   for(int idx = 0; idx < str_len; ++idx ) {
      printf("   string[%d]=%02x  (string+%d)=%p\n", idx, string[idx], idx, (string+idx) );
   }
   printf("done.\n");
   return 0;
}
// I realize I need to free memory

p.s. prints can actually be a great way to troubleshoot; don't hesitate to keep adding them until what your code is doing becomes transparently obvious.

jgreve
  • 1,225
  • 12
  • 17
  • Thanks for the in depth reply, certainly helped point me in the right direction. Maybe it will make sense if I share that the goal for this code was to create sequences for a dictionary I was using in a codec I am writing (sequences are variable length). I took your advice and instead am using `string[i] = *next_byte`. Am getting the results I would like now. Thanks again! – stratocaster_master Feb 12 '16 at 12:59
  • To clarify, I am now taking the following approach `string[str_len] = next_byte`, string is now just an array defined to some `MAX_LEN` and next_byte is simply an unsigned char. My index str_len can grow to size `MAX_LEN` and I just begin building a new sequence if I hit this limit. Thanks again, in retrospect I'm wondering why I thought my initial approach was a good idea (a bit sleep deprived).... – stratocaster_master Feb 12 '16 at 13:55
  • It sounds like a solid approach. r.e. "wondering why I thought my initial approach was a good idea" - You have brightened my morning :-) I've had plenty of designs that started out like that. No doubt I'll have many more. Happy coding. – jgreve Feb 12 '16 at 15:29