0

Our programming assignment asked us to break a text file into a set of smaller files named (filename)partx.txt. For example, if the argument passed to the program is a text file named stack.txt, the output should be stackpart1.txt, stackpart2.txt, etc., where each file is at most 250 bytes in size.

What is the best way to generate the partx portion of the names?

I learned about using a macro with ## to achieve that. What are the drawbacks of this method, and is there a better way? Is it good practice to generate variable names this way?

Crazy-Fool
  • I have the feeling that you want to name files, not variables. The file name would typically be stored in a variable with an arbitrary name. If the number of resulting text files is not fixed you'll probably use an array which is big enough to hold the largest number of file names which can reasonably be expected (and deal with the case that it is exceeded). With an array you wouldn't have variable names at all any longer. – Peter - Reinstate Monica Apr 16 '15 at 10:14
  • Yeah, actually I have to create files having names filepart1.txt, filepart2.txt and so on. Not the name of the variable. Which can be done using snprintf() and then using the string thus created in fopen. If I am not wrong. – Crazy-Fool Apr 16 '15 at 10:15
  • This is what I said (or try to say) which is why I do not understand why your sentence starts with "No" ;-) – Peter - Reinstate Monica Apr 16 '15 at 10:16
  • Oops sorry. I just edited it. – Crazy-Fool Apr 16 '15 at 10:16
  • What you say is correct (snprintf is the way to go). – Peter - Reinstate Monica Apr 16 '15 at 10:18
  • 1
    Also remember that preprocessor directives are just conveniences. Everything which can be done by the preprocessor could as well be simply written down; it would just be less maintainable. So if you had all information at compile time you could just name your variables and be done with it, no preprocessor needed. I take it that is not the case ;-). **It is impossible to use run time text (user input, file contents) in a preprocessor command.** – Peter - Reinstate Monica Apr 16 '15 at 10:22
  • Now there is a downvote. Waiting to be blocked from asking questions. :( – Crazy-Fool Apr 16 '15 at 10:22

3 Answers

3

Don't confuse variable names with their content; macros and variable names have nothing to do with your assignment. ## pastes tokens together at compile time, during preprocessing (a typical use is to build identifiers, or code in general, parametrically inside macros), which is a relatively rare and very specialized task.
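To make the compile-time nature of ## concrete, here is a minimal sketch. The macro PART_NAME and the function sum_of_parts are hypothetical names chosen for illustration; they are not part of the assignment.

```c
/* PART_NAME is a hypothetical macro: ## pastes two tokens into a single
   identifier while the preprocessor runs, long before any user input
   or file contents exist. */
#define PART_NAME(n) part_##n

/* Returns the sum of two variables whose names were built with ##. */
int sum_of_parts(void)
{
    int PART_NAME(1) = 10;   /* expands to: int part_1 = 10; */
    int PART_NAME(2) = 20;   /* expands to: int part_2 = 20; */

    /* PART_NAME(i) with a runtime int i would expand to part_i, an
       identifier literally spelled "part_i", never part_1 or part_2. */
    return part_1 + part_2;
}
```

The pasted names exist only in the source code. Nothing here produces a file name, which is why this mechanism does not help with the assignment.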

What you want to do, instead, is to generate strings at runtime based on a pattern (that is, you'll have a single string variable that you fill with different content on each iteration); the right function for this is snprintf.
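A minimal sketch of that idea, assuming the stackpart1.txt naming scheme from the question. The helper name make_part_name and its parameters are made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper: writes "<stem>part<index>.txt" into out.
   Returns 0 on success, -1 if the name did not fit in out_size bytes
   (or if snprintf reported an error). */
int make_part_name(char *out, size_t out_size, const char *stem, int index)
{
    /* snprintf never writes more than out_size bytes and returns the
       length the complete string would have needed. */
    int needed = snprintf(out, out_size, "%spart%d.txt", stem, index);

    return (needed >= 0 && (size_t)needed < out_size) ? 0 : -1;
}
```

In a loop you would reuse one buffer, for example char name[64], call make_part_name(name, sizeof name, "stack", i) with i counting up from 1, and pass name to fopen on each iteration.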

Matteo Italia
  • 2
    I agree mostly except that the concatenation operator is not strictly restricted to generating identifier names (btw, isn't that a pleonasm?). It just concatenates arbitrary text and I bet you there is a use case like stringifying or using operators as arguments which results in something different. – Peter - Reinstate Monica Apr 16 '15 at 10:26
  • the usage of `##` isn't restricted to concatenating identifiers (exclusively), I sometimes use it in a unified macro that in turn expands to any number of similar macros (i.e. `#define DECLARE(name, value, type) DECLARE_ ##type(#name, #value)`, expanding to any of the `DECLARE_X` macros) – Elias Van Ootegem Apr 16 '15 at 10:39
  • Of course you are both right, I tried to clarify a bit. – Matteo Italia Apr 16 '15 at 11:17
2

It's perfectly simple, I'd say: You open a file (fopen returns a FILE *) which you can then read in a loop, using fread to specify the max amount of bytes to read on each iteration. Given the fact you're using a loop anyway, you can increment a simple int to keep track of the chunk-file names, using snprintf to create the name, write the characters read by fread to each file, and continue until you're done.

Some details on fread that might be useful to you

A basic example (needs some work, still):

#include <stdio.h>

int main(int argc, char **argv)
{
   int chunk_count = 0;
   size_t bytes_read;
   char buffer[256];
   FILE *src_fp,
        *target_fp;
   char chunk_name[50];

   //open the source file (binary mode, so the chunk sizes are exact byte counts)
   if (argc < 2 || (src_fp = fopen(argv[1], "rb")) == NULL)
   {
       fprintf(stderr, "Could not open the source file\n");
       return 1;
   }
   //with an item size of 1, fread returns the number of bytes it read
   while ((bytes_read = fread(buffer, 1, sizeof buffer, src_fp)) > 0)
   {//read chunk
       ++chunk_count;//increase chunk count
       snprintf(chunk_name, sizeof chunk_name, "chunk_part%d.txt", chunk_count);
       target_fp = fopen(chunk_name, "wb");//can return NULL, not checked here
       //write only the bytes actually read: the last chunk is usually shorter
       fwrite(buffer, 1, bytes_read, target_fp);
       fclose(target_fp);//close chunk file
   }
   fclose(src_fp);
   printf("Written %d files, each of max 256 bytes\n", chunk_count);
   return 0;
}

Note that this code is not exactly safe to use as it stands. You'll need to check the return values of fopen (it can, and at some point will, return NULL). The fread-based loop also simply assumes that a short read means we've reached the end of the source file, which isn't always the case: fread comes up short on a read error too, so you'll still have to handle NULL pointers and the ferror checks yourself. Either way, the functions to look into are:

  • fread
  • fopen
  • fwrite
  • fclose
  • ferror
  • snprintf

That should do it.
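As a rough sketch of the error handling mentioned above, the following helper copies one chunk and uses ferror to tell a read error apart from a clean end of file. The name copy_chunk and its return convention are assumptions made for this example, not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: copies at most max_bytes (capped at the buffer
   size) from src to dst. Returns 1 if bytes were copied, 0 on a clean
   end of file, -1 on a read or write error. */
int copy_chunk(FILE *src, FILE *dst, size_t max_bytes)
{
    char buffer[256];
    size_t n_read;

    if (max_bytes > sizeof buffer)
        max_bytes = sizeof buffer;

    n_read = fread(buffer, 1, max_bytes, src);
    if (n_read < max_bytes && ferror(src))
        return -1;                  /* short read caused by an error */
    if (n_read == 0)
        return 0;                   /* nothing left: end of file */
    if (fwrite(buffer, 1, n_read, dst) != n_read)
        return -1;                  /* write failed, e.g. disk full */
    return 1;
}
```

A caller would open a new target file per chunk, check that fopen did not return NULL, call copy_chunk once, and stop as soon as the result is not 1.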


Update, just for the fun of it.

You might want to zero-pad the numbers in your chunk file names (chunk_part0001.txt). To do this, you can work out how big the source file is, divide that by 256 to get the number of chunks you're going to end up with, and pad to that many digits. How to get the file size is explained here, but here's some code I wrote some time ago:

long file_size = 0,
     factor = 10;
int padding_cnt = 1;//at least 1: every number has at least one digit
fseek(src_fp, 0, SEEK_END);//go to end of file
file_size = ftell(src_fp);//size in bytes (-1 on error, not checked here)
file_size = (file_size + 255) / 256;//number of chunks, rounded up to count the remainder
rewind(src_fp);//return to beginning of file
while (factor <= file_size)
{//one more digit each time the chunk count reaches the next power of 10
    factor *= 10;
    ++padding_cnt;
}
//padded chunk file names:
snprintf(chunk_name, sizeof chunk_name, "chunk_part%0*d.txt", padding_cnt, chunk_count);

If you want, I could explain every single statement, but the gist of it is this:

  • fseek + ftell gets the size of the file (in bytes); dividing that by the chunk size (256) and rounding up gets you the total number of chunks you'll create (the rounding accounts for the remainder)
  • The while loop compares the total count with 10^n: each time the count reaches the next power of 10, the factor is multiplied by 10 and the padding count increases (padding_cnt starts at 1, because every number has at least one digit)
  • the format passed to snprintf changed to %0*d, which means: "print an int, padded with zeroes to a width of n" (i.e. to a fixed width, where n is the extra argument padding_cnt). If you end up with 123 chunks, the first chunk file will be called chunk_part001.txt, the tenth file will be chunk_part010.txt, all the way up to chunk_part123.txt.
  • refer to the linked question: the accepted answer uses sys/stat.h to get the file size, which is more reliable (though it can pose some minor portability issues). Check the stat wiki for alternatives.
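For reference, a stat-based size lookup could look roughly like the sketch below. It assumes a POSIX system (sys/stat.h is not part of standard C), and the helper names file_size_bytes and chunk_total are invented for this example.

```c
#include <sys/stat.h>   /* POSIX, not part of standard C */

/* Hypothetical helper: returns the size in bytes of the file at path,
   or -1 if stat() fails (for example because the file does not exist). */
long long file_size_bytes(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long long)st.st_size;
}

/* Hypothetical helper: number of chunks needed for size bytes, rounded
   up so that a trailing partial chunk is counted too. */
long long chunk_total(long long size, long long chunk_size)
{
    return (size + chunk_size - 1) / chunk_size;
}
```

With these, chunk_total(file_size_bytes(path), 256) would give the count from which to derive the padding width, without having to seek in the open stream. The -1 failure value has to be checked before that call.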

Why? Because it's fun, and it makes the output files easier to sort by name. It also enables you to predict how big the char array that holds the target file name should be, so if you have to allocate that memory using malloc, you know exactly how much memory you'll need, and don't have to allocate 100 chars (which should be enough either way), and hope that you don't run out of space.
Lastly: the more you know, the better IMO, so I thought I'd give you some links and refs you might want to check.
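If the name buffer really is allocated with malloc, one way to size it exactly is to let snprintf compute the length first. This is only a sketch: alloc_part_name and its parameters are hypothetical, and it relies on the C99 behaviour of snprintf when given a NULL buffer and a size of 0.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc'ed "<stem>part<padded index>.txt"
   of exactly the required size. The caller frees it. Returns NULL on
   failure. */
char *alloc_part_name(const char *stem, int width, int index)
{
    /* With a NULL buffer and size 0, snprintf writes nothing and
       returns the length the formatted string would have. */
    int len = snprintf(NULL, 0, "%spart%0*d.txt", stem, width, index);
    char *name;

    if (len < 0)
        return NULL;
    name = malloc((size_t)len + 1);     /* +1 for the terminating '\0' */
    if (name != NULL)
        snprintf(name, (size_t)len + 1, "%spart%0*d.txt", stem, width, index);
    return name;
}
```

For instance, alloc_part_name("chunk_", 3, 7) would produce "chunk_part007.txt" in a buffer of 18 bytes.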

Elias Van Ootegem
  • fread() was new to me. This can help me to write a better solution. – Crazy-Fool Apr 16 '15 at 10:26
  • 1
    @Crazy-Fool: Added a _basic_ example of how you could go about chunking a file. It's far from perfect (doesn't check for errors, like `fopen` returning `NULL` and all that, nor does it check `ferror`), but this should be enough to get you started – Elias Van Ootegem Apr 16 '15 at 10:32
1

You can either:

  • Use a macro as suggested (compile time). This requires the file size (and the number of sub-files) to be known while you are writing the code.
  • Use snprintf() in a loop to generate the file names (runtime). This works dynamically, for example based on a file size that is measured while the program runs.

That said, the best way: use snprintf().
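Applied to the naming scheme from the question (stack.txt becomes stackpart1.txt, stackpart2.txt, ...), the runtime approach might be sketched as follows. The helper part_name_from_input is hypothetical, and it assumes a plain file name without directory components, since a dot in a directory name would confuse the extension search.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds "<input without extension>part<index>.txt"
   from an input file name such as "stack.txt".
   Returns 0 on success, -1 if the result does not fit in out. */
int part_name_from_input(char *out, size_t out_size, const char *input, int index)
{
    const char *dot = strrchr(input, '.');
    /* Length of the name up to the last '.', or the whole name if
       there is no '.' at all. */
    size_t stem_len = dot ? (size_t)(dot - input) : strlen(input);
    int needed;

    if (stem_len > INT_MAX)
        return -1;                  /* precision argument must be an int */
    /* "%.*s" prints at most stem_len characters of input. */
    needed = snprintf(out, out_size, "%.*spart%d.txt", (int)stem_len, input, index);
    return (needed >= 0 && (size_t)needed < out_size) ? 0 : -1;
}
```

Everything here happens while the program runs, so it works for whatever file name is passed on the command line.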

Sourav Ghosh