3

First off, I am creating a program that reads lines of characters, finds words (they don't have to have meaning, i.e. 'ab' could be a word), and stores them in an appropriate data structure. I used a trie to store the words. I am given a mapping file as a command-line argument, and inside the mapping file are two data files I need to get information from. The usage is as follows: first(program name) <mappingfile>.

Inside the mapping file, there are two data files: <dictFile> and <dataFile>. I'm not sure how to read and store the information in the two data files. So far I have the following:

#include <stdio.h>
#include<stdlib.h>

void readDict(FILE *dict_file){

}

int main(int argc, char *argv[]){
  FILE* file;


  if(argc != 2){ //error in inputing, not 2 files
    printf("error\n");
    return 0;
  }

  file = fopen(argv[1],"r" ); //reading the mapping file

  input;
  if(file == NULL){ //nothing inside file
    printf("file does not exist\n");
    return 0;
  }
}

My goal is to have pointers to the respective data files in the mapping file, which I can use to read their contents. I will be given the following input on the command line: first(program name) <mappingfile>.

The mapping file contains lines that each name two plain .txt files, in the form <dictFile> <dataFile>.

I wish to access the contents of both <dictFile> and <dataFile> through pointers to the respective files.

Dhollasc
  • Does the mapping file point to two other files or are the files actually embedded into a single file? – Linus Oct 20 '15 at 16:35
  • They are embedded in the single mapping file. On the command line you are given the mapping file; inside the mapping file you have <dictFile> <dataFile>. Both files inside the mapping file are plain text files. – Dhollasc Oct 20 '15 at 16:39
  • As example: you can having a as the following, respectiviely: boo22$Book5555bOoKiNg#bOo#TeX123tEXT(JOHN) John1TEXAN4isa1BOoRiSH%whohasa2bo3KING BOOKING bOoKIngs$12for a TEX-Text(BOOKS(textBOOKS) – Dhollasc Oct 20 '15 at 16:41
  • What operating system are you using? In windows [file mapping](https://msdn.microsoft.com/en-us/library/windows/desktop/aa366556%28v=vs.85%29.aspx) would be a good way to start. And if you're doing this on linux read about [mmap(2)](http://man7.org/linux/man-pages/man2/mmap.2.html), on linux I could show you an example. – Linus Oct 20 '15 at 16:49
  • I'm using Windows but doing all the coding through Emacs (required to do so). All I'm given is the input file (the mapping file). In the mapping file are lines of the form <dictFile> <dataFile>. I need to access both of the files stored in the input file. You can assume the mapping file will be structured the same every time, i.e. it will always contain <dictFile> <dataFile>, in that order – Dhollasc Oct 20 '15 at 16:57
  • So you mean on every line the dictionary entry and data entry are put side by side? It doesn't actually contain virtual addresses for two separate files? What I mean is that you have the <dictFile> and <dataFile> in two columns. – Linus Oct 20 '15 at 17:01
  • They are indeed files. Each dictFile and dataFile inside the mapping file is a .txt file; you will have 2 columns and x rows, with each row following the format: <dictFile> <dataFile>. An example is: dict_1 data_1. Here dict_1 is a .txt file and data_1 is a text file. Both the dictFile and dataFile are found in the mapping file – Dhollasc Oct 20 '15 at 17:08
  • Oh, so what you mean is that they're actually filenames? – Linus Oct 20 '15 at 17:09
  • Correct, and I need to access both of those files, which are inside the mapping file (given on the command line). – Dhollasc Oct 20 '15 at 17:11
  • Right, I'll give an answer later when I'm available if no one else already gave you a good answer. – Linus Oct 20 '15 at 17:14
  • Strongly suggest, when compiling, always enable all the warnings, then fix those warnings. amongst other problems with the code, the following (easily fixed) warnings are raised: 1) `input` not declared, 2) first parameter to main() should be 'int', not 'char' 3) unused parameter: `dict_file` – user3629249 Oct 20 '15 at 17:24
  • when the correct number of command line parameters has not been entered, the program outputs 'error'. This tells the user next to nothing. What is needed is a usage statement, similar to: `printf( "USAGE: %s, mapFileName\n", argv[0] );` – user3629249 Oct 20 '15 at 17:28
  • There are error checks for not having 2 arguments (if argc != 2) and for the file being null (file == NULL); – Dhollasc Oct 20 '15 at 17:35
  • regarding this line: ` if(file == NULL){ //nothing inside file` the comment is incorrect. what is actually being determined is if the call to fopen() was successful (or not) – user3629249 Oct 20 '15 at 17:38
  • Are you expecting us to supply the rest of the needed code for you? If so, I charge $150/hr. with a non-refundable deposit of $1500. If you post your efforts to accomplish the assignment and the expected and actual output, then we can help you (for free). – user3629249 Oct 20 '15 at 17:47
  • the error checks for number of parameters and if fopen() was successful are fine. The handling of the errors is what I'm referring to as having problems. BTW: a return of 0 from main() is an indication of success. A more useful return value would be `EXIT_FAILURE` as defined in stdlib.h – user3629249 Oct 20 '15 at 17:53
  • I know how to access data in a single file given on the command line, but I'm not sure how to access the two .txt files given the mapping file (on the command line) – Dhollasc Oct 20 '15 at 17:58
  • please edit your question to show an example of a line in the mapping file. There seems to be some confusion. does the mapping file contain pairs of file names or the actual contents of the dictionary and data files? – user3629249 Oct 20 '15 at 18:02
  • Edited and the mapping file contains two .txt files – Dhollasc Oct 20 '15 at 18:18

2 Answers

0

If I understand you correctly, this should do it. Note that it assumes your filenames don't contain any spaces. If you want to use the "non-secure" APIs, you need to add _CRT_SECURE_NO_WARNINGS to the project properties under Configuration Properties -> C/C++ -> Preprocessor -> Preprocessor Definitions.

#include <stdio.h>
#include<stdlib.h>

void readDict(FILE *dict_file){

}

int main(int argc, char *argv[]){
  FILE* file;


  if(argc != 2){ //wrong number of command line arguments
    printf("error\n");
    return 1;
  }

  file = fopen(argv[1],"r" ); //reading the mapping file

  if(file == NULL){ //fopen failed, e.g. the file does not exist
    printf("file does not exist\n");
    return 1;
  }

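  char dictFileString[256], dataFileString[256];
  //  fscanf returns the number of fields it converted; anything but 2 is an error
  if (fscanf( file, "%255s %255s", dictFileString, dataFileString ) != 2) {
      printf( "mapping file must contain two file names\n" );
      fclose(file);
      return 1;
  }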

  FILE *dictFile, *dataFile;
  dictFile = fopen( dictFileString, "r" );
  if (dictFile == NULL) {
      printf( "%s does not exist\n", dictFileString );
      fclose(file);
      return 1;
  }
  dataFile = fopen( dataFileString, "r" );
  if (dataFile == NULL) {
      printf( "%s does not exist\n", dataFileString );
      fclose(file);
      fclose(dictFile);
      return 1;
  }

  readDict(dictFile);

  //  The additional logic would be placed here.

  fclose( dictFile );
  fclose( dataFile );

  //  If you need to read additional file names then loop
  //  back up to read the next line of 'file'

  fclose( file );
  return 0;
}
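The "loop back up" comment at the end of the code above can be sketched roughly as follows. This is only a minimal sketch: `count_pairs` is a hypothetical helper that is not part of the answer, and it assumes file names contain no whitespace and are shorter than 256 characters.

```c
#include <stdio.h>

/* Hypothetical helper: reads "<dictFile> <dataFile>" pairs from an already
   opened mapping file and returns how many complete pairs were found.
   Assumes file names contain no whitespace and are shorter than 256 chars. */
int count_pairs(FILE *mapping)
{
    char dictName[256], dataName[256];
    int pairs = 0;

    /* fscanf returns the number of fields converted; 2 means a full pair. */
    while (fscanf(mapping, "%255s %255s", dictName, dataName) == 2) {
        /* A real program would fopen() both names here, use the files,
           and fclose() them before reading the next pair. */
        pairs++;
    }
    return pairs;
}
```

Note that `%s` skips any whitespace, including newlines, so this loop pairs up tokens rather than lines. If the mapping file must be validated line by line, reading with `fgets` is probably the better fit.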
eddyq
  • This will only read the first line and you never close your files! Please don't use `fscanf`. Reading lines is done using the `fgets` function. – Linus Oct 20 '15 at 19:47
  • My answer was intended to show the basic procedure. It is clear it will only read the first line. It is up to the programmer, who knows what he needs, to finish the code. – eddyq Oct 20 '15 at 19:51
  • Linus, this type of criticism does not help anyone. It discourages help from others. This is why stackoverflow.com is getting so many bad remarks on the internet. Note that I answered the question as asked and I think it will really help the Daniel move to the next level. – eddyq Oct 20 '15 at 19:55
  • OP clearly says in the comments "you will have 2 columns and x amount of rows with each row following the format: <dictFile> <dataFile>". Besides, your code is not safe: it suffers from overflow bugs with `fscanf` and leaks resources. I appreciate that you contribute to SO, however I think this kind of criticism will improve the quality of your answers. – Linus Oct 20 '15 at 19:59
  • Seems to make sense. The only thing I have trouble understanding is char dictFileString[265], dataFileString[256]; When fscanf is called, is it saving the text of the files into both arrays? – Dhollasc Oct 20 '15 at 20:03
  • The 265 and 256 were supposed to be 256 and 256. The point was to illustrate how to read the dictFile and dataFile and the reason for the large arrays was to relieve the worry about an assumption of a short file name (similar to your MAX_LENGTH). – eddyq Oct 20 '15 at 20:09
  • Linus, where are the overflow bugs you are speaking of? BTW, before I posted I refreshed my screen to be sure you had not yet posted. I didn't see anything so I posted. You posted very close to the same time and it appears as though my post was to contradict yours ... but it was a race condition. I would appreciate it if you remove your down vote. – eddyq Oct 20 '15 at 20:10
  • Linus, where is this statement "you will have 2 columns and x amount of rows with each row following the format: <dictFile> <dataFile>"? – eddyq Oct 20 '15 at 20:18
  • @user3389362 I didn't mean anything bad with my criticism, no need to get upset about it. The main reason I downvoted was that your code is unsafe for two reasons. What happens if, in the mapping file, the length of either of the two filenames exceeds 256 characters? It will overflow and possibly crash your application. Also you never use `fclose`, which is strongly discouraged. If you fix these two issues I'll gladly remove my downvote. – Linus Oct 20 '15 at 20:34
  • Linus is correct in saying there will be x rows of dictfile datafile; the dict file is a .txt file that holds multiple lines of text, as does the data file; inside the mapping file, there will be two columns (<dictFile> <dataFile>) and x rows – Dhollasc Oct 20 '15 at 20:35
  • Dhollasc, I searched for the words "two columns" and I just don't see it. Can you please be more specific? – eddyq Oct 20 '15 at 20:46
  • @user3389362 Great, just FYI a simple fix (a lazy one) for the overflow problem would be to change `fscanf( file, "%s %s", dictFileString, dataFileString );` to `fscanf( file, "%255s %255s", dictFileString, dataFileString );` (one less than the buffer size, to leave room for the terminating null). Please see [this](http://stackoverflow.com/questions/1621394/how-to-prevent-scanf-causing-a-buffer-overflow-in-c/1621973#1621973) for a more proper workaround. – Linus Oct 20 '15 at 20:56
  • @user3389362 it is stated under the questions comments at the top – Dhollasc Oct 20 '15 at 21:02
  • Dhollasc, maybe something isn't displaying correctly in my browser. Are you taking about where you said this? As example: you can having a as the following, respectiviely: boo22$Book5555bOoKiNg#bOo#TeX123tEXT(JOHN) John1TEXAN4isa1BOoRiSH%whohasa2bo3KING BOOKING bOoKIngs$12for a TEX-Text(BOOKS(textBOOKS) – eddyq Oct 20 '15 at 21:09
0

If I understand your question correctly, you want to parse a file where each line contains the filenames of two other files and then read from those. You can use fgets to read your mapping file line by line, then use strtok to split each line on whitespace. I'll break it down step by step.

Firstly we want to open the mapping file for reading

if((file = fopen(argv[1],"r")) == NULL) {
  perror("error opening file");
  return 1;
}

This will try to open the mapping file specified by the command line arguments of your program and if it fails it will print a corresponding error message.

while(fgets(buf, sizeof(buf), file) != NULL) {

After we've opened the file, we want to iterate through all the lines until we reach the end of the file, at which point fgets returns NULL. On each iteration fgets puts the current line, including its trailing newline, into buf.
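As a rough illustration of this line-reading step, here is a small sketch. `read_line` is a hypothetical helper, not something from the answer, and it uses `strcspn` to drop the line ending instead of the `strtok` trick shown next.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf and strips the trailing
   newline (and a carriage return, if the file has Windows line endings).
   Returns 1 if a line was read, 0 at end of file or on a read error. */
int read_line(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) == NULL)
        return 0;
    buf[strcspn(buf, "\r\n")] = '\0';  /* cut at the first CR or LF, if any */
    return 1;
}
```

A caller would typically write `while (read_line(file, buf, sizeof buf)) { ... }`. A line longer than the buffer is returned in several pieces, so the buffer needs to be at least as large as the longest expected line plus two.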

dictfilename = strtok(buf, " ");
datafilename = strtok(NULL, " ");
strtok(dictfilename, "\n"); /* Remove any trailing newlines */
strtok(datafilename, "\n");

We need to split the line read by fgets on a delimiter (a space) so we know which part is the dictfile name and which is the datafile name. This is done with the strtok function: the first call returns a pointer to the token before the space, and calling it again with NULL returns a pointer to the next token, after the space. It returns NULL when there are no more tokens. A slightly weird way of removing any trailing newline is to call strtok again with the newline as the delimiter.
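The splitting step can also be written with explicit checks for a missing second name. The sketch below is one possible variation, not the answer's exact approach: `split_pair` is a hypothetical helper, and it treats the newline as just another delimiter so no separate stripping call is needed.

```c
#include <string.h>

/* Hypothetical helper: splits "dict data\n" in place into two names.
   Returns 1 on success, 0 if the line does not hold two tokens.
   Note that strtok modifies line and keeps hidden state between calls. */
int split_pair(char *line, char **dictName, char **dataName)
{
    const char *delims = " \t\r\n";   /* spaces, tabs, and line endings */

    *dictName = strtok(line, delims);
    if (*dictName == NULL)
        return 0;                     /* blank line */
    *dataName = strtok(NULL, delims);
    return *dataName != NULL;         /* 0 if only one name was present */
}
```

Checking the result before calling fopen avoids passing a NULL pointer as a file name when a line has only one entry.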

if((dictfile = fopen(dictfilename,"r")) == NULL) {
  fprintf(stderr, "error opening file %s: %s\n", dictfilename, strerror(errno));
  return 1;
}

if((datafile = fopen(datafilename,"r")) == NULL) {
  fprintf(stderr, "error opening file %s: %s\n", datafilename, strerror(errno));
  return 1;
}

Very similarly to how we opened the mapping file, we now open the two files named on the current line in "r" mode, which opens them for reading. If a file does not exist or cannot be opened, the fopen call fails and returns NULL.
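Since the same open-and-report pattern appears twice, it could be factored out. This is just a sketch; `open_for_reading` is a hypothetical name introduced here for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: opens path for reading. On failure it prints the
   reason to stderr and returns NULL, so the caller only has to test the
   returned pointer. */
FILE *open_for_reading(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        fprintf(stderr, "error opening file %s: %s\n", path, strerror(errno));
    return fp;
}
```

The caller still decides what to do on failure, for example closing any files that are already open before returning.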

printf("Content of %s:\n", dictfilename);
while ((c = getc(dictfile)) != EOF)
  putchar(c);

printf("\nContent of %s:\n", datafilename);
while ((c = getc(datafile)) != EOF)
  putchar(c);

This is a very simple method of "dumping" the content of the files. It uses getc to read the next character from the file and prints it, until getc returns EOF. This is where you would call your own function instead.
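As one idea of what "your own function" might look like, the sketch below counts words with the same getc loop. It assumes, based on the question, that a word is any unbroken run of letters; `count_words` is a hypothetical name, and a real version would insert each word into the trie rather than just counting.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: counts maximal runs of letters in a stream.
   Assumption: a "word" is any unbroken run of alphabetic characters. */
long count_words(FILE *fp)
{
    int c;              /* int, not char: getc returns EOF as a distinct int value */
    int in_word = 0;
    long words = 0;

    while ((c = getc(fp)) != EOF) {
        if (isalpha(c)) {
            if (!in_word)
                words++;    /* first letter of a new word */
            in_word = 1;
        } else {
            in_word = 0;    /* digits, punctuation, and whitespace end a word */
        }
    }
    return words;
}
```

Declaring `c` as `int` matters here: with a plain `char`, the comparison against EOF can either never succeed or succeed too early, depending on whether `char` is signed on the platform.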

fclose(dictfile);
fclose(datafile);

And don't forget to close the files afterwards or you will leak resources.

Finally, here is the complete code for what I just described:

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>

#define MAX_LENGTH 100 // change this to the actual maximum length of your lines.

int main(int argc, char **argv){
  FILE* file, *dictfile, *datafile;
  int c; /* int rather than char, so EOF can be told apart from real characters */
  char buf[MAX_LENGTH];
  char *dictfilename, *datafilename;

  if(argc != 2) {
    fprintf(stderr, "Usage: %s <mapping file>\n", argv[0]);
    return 1; /* wrong usage is a failure, so return non-zero */
  }

  if((file = fopen(argv[1],"r")) == NULL) {
    perror("error opening file");
    return 1;
  }

  while(fgets(buf, sizeof(buf), file) != NULL) {
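    dictfilename = strtok(buf, " ");
    datafilename = strtok(NULL, " ");
    if(dictfilename == NULL || datafilename == NULL) {
      fprintf(stderr, "malformed line in mapping file\n");
      continue; /* skip lines that do not contain two names */
    }
    strtok(dictfilename, "\n"); /* Remove any trailing newlines */
    strtok(datafilename, "\n");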

    if((dictfile = fopen(dictfilename,"r")) == NULL) {
      fprintf(stderr, "error opening file %s: %s\n", dictfilename, strerror(errno));
      fclose(file); /* close what is already open before bailing out */
      return 1;
    }

    if((datafile = fopen(datafilename,"r")) == NULL) {
      fprintf(stderr, "error opening file %s: %s\n", datafilename, strerror(errno));
      fclose(dictfile);
      fclose(file);
      return 1;
    }

    // do something with the files (e.g read all the content)
    printf("Content of %s:\n", dictfilename);
    while ((c = getc(dictfile)) != EOF)
      putchar(c);

    printf("\nContent of %s:\n", datafilename);
    while ((c = getc(datafile)) != EOF)
      putchar(c);
    printf("\n");

    // don't forget to close the files when you're done with them.
    fclose(dictfile);
    fclose(datafile);
  }
  fclose(file);
  return 0;
}
Linus
  • Having a problem with the strtok for the second part datafilename; i am getting null pointer with my *datafile – Dhollasc Oct 20 '15 at 21:32
  • @Dhollasc Did you use `strtok(datafilename, "\n");` to remove any newlines after the filename? It should say `error opening file` and then some useful information about what failed. What was the error message? – Linus Oct 20 '15 at 21:35