
The task is to read in a .txt file named by a command line argument. The file contains unstructured information listing every airport in the state of Florida (what follows is only a snippet of the full file). Some of the data must be ignored, such as ASO ORL PR A 0 18400: anything that does not pertain to the fields of the airPdata structure.

The assignment is asking for the site number, locID, fieldname, city, state, latitude, longitude, and if there is a control tower or not.

INPUT

03406.20*H 2FD7 AIR ORLANDO ORLANDO FL ASO ORL PR 28-26-08.0210N 081-28-23.2590W PR NON-NPIAS N A 0 18400

03406.18*H 32FL MEYER- INC ORLANDO FL ASO ORL PR 28-30-05.0120N 081-22-06.2490W PR NON-NPAS N 0 0

OUTPUT

Site#      LocID Airport Name City    ST Latitude       Longitude       Control Tower
-------------------------------------------------------------------------------------
03406.20*H 2FD7  AIR ORLANDO  ORLANDO FL 28-26-08.0210N 081-28-23.2590W N
03406.18*H 32FL  MEYER        ORLANDO FL 28-30-05.0120N 081-26-39.2560W N
etc..      etc.  etc..        etc..   .. etc..          etc..           ..
etc..      etc.  etc..        etc..   .. etc..          etc..           ..

My code so far looks like this:

#include <stdio.h>
#include <stdlib.h>
#include <strings.h>

typedef struct airPdata{
char *siteNumber;
char *locID;
char *fieldName;
char *city;
char *state;
char *latitude;
char *longitude;
char controlTower;
} airPdata;

int main (int argc, char* argv[])
{

char text[1000];
FILE *fp;
char firstwords[200];


if (strcmp(argv[1], "orlando5.txt") == 0)
{

    fp = fopen(argv[1], "r");
    if (fp == NULL) 
    {
        perror("Error opening the file");
        return(-1);
    }

    while (fgets(text, sizeof(text), fp) != NULL) 
    {
        printf("%s", text);
    }
}
else
    printf("File name is incorrect");


fflush(stdout);
fclose(fp);


}

So far I'm able to read the whole file and print the unstructured input to the terminal.

The next thing I tried to figure out is how to extract the strings piece by piece and store them in the members of the structure. This is where I'm currently stuck. I've looked up strcpy and other string library functions, data extraction methods, and ETL, but I'm not sure which function to use in my code or how to use it properly.

I've done something very similar in Java using substrings. If there is a way to take substrings of the large string of text, with rules for which substring goes into which variable, that could work. For example, LocID is never more than 4 characters long, so any combination of digits and letters that is four characters long could be stored in airPdata.locID.

After the variables are stored in the structures, I believe I have to use strtok to organize them in the list under site#, locID, etc. That's my best guess at how to approach this problem, but I'm pretty lost.

Uiop737
  • You can read the file line by line with `fgets()`, and then break each line into words with `strtok`. Keep in mind that `strtok` modifies your line by writing `\0` into it, in case you need the text for further use. – t.elazari Feb 12 '17 at 22:59
  • In your question you say you know you need to use `strtok`, which is a pretty good start to solving the problem. So what are you stuck on? Just saying you're lost doesn't make for much of a question. – Carey Gregory Feb 13 '17 at 00:06
  • I'm not sure if "assignment" means this is an exercise. The practical answer is to import it into SQLite and use SQL. – Schwern Feb 13 '17 at 00:11
  • If you're going to reject anything but `orlando5.txt` you might as well hard code the filename. Don't make the user play a guessing game. – Schwern Feb 13 '17 at 00:14
  • How is the data in each line formatted? – Code-Apprentice Feb 13 '17 at 00:18
  • What is the format? Since there's spaces in some of the fields ("AIR ORLANDO") it cannot be space separated. It looks like a fixed width? Maybe it's tab separated? – Schwern Feb 13 '17 at 00:19
  • It is tab separated, I think? The words are separated by varying amounts of whitespace, and the only two fields whose words are separated by single spaces are "Airport name" and "City". It could be something like Air Orlando, as pointed out, or a city name like Fort Myers. At first I was going to use strtok and split at a space. That works for the first two fields, but I run into an issue with the airport name. – Uiop737 Feb 13 '17 at 00:39
  • @Uiop737 Open it in a text editor that will show you tabs, `less -U` will do that, tabs will show up as `^I`. Or ask for clarification from whomever gave you the assignment. – Schwern Feb 13 '17 at 01:16
  • Confirmed, it's tab delimited, currently trying to use the method listed below – Uiop737 Feb 13 '17 at 03:01
  • the posted code is including `strings.h` however, the correct header is `string.h` – user3629249 Feb 13 '17 at 18:33
  • NEVER access beyond `argv[0]` without first checking `argc` to assure the parameter actually exists (and if it does not exist, display a `usage` message to `stderr` and exit the program). – user3629249 Feb 13 '17 at 18:34
  • for ease of readability and understanding: 1) consistently indent the code. indent after every opening brace '{'. unindent before every closing brace '}'. – user3629249 Feb 13 '17 at 18:36
  • the lack of a consistent delimiter in the source file will be a problem. The lack of a consistent number of fields in the airport name will be a problem. The lack of a consistent number of fields in the city name will be a problem. The detail of not always displaying the full airport name will be a problem. The function `strtok()` can extract field by field, however the lack of consistency in the number of fields for airport name and city name will be very tricky. Strongly suggest modifying the input file to use consistent field delimiters, like a comma or colon, to make extraction easy. – user3629249 Feb 13 '17 at 18:47
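
The comments about `string.h` and checking `argc` can be sketched as follows. This is only an illustration: `open_input` is a hypothetical helper name that does not appear in the posted code, and the wording of the usage message is an assumption.

```c
#include <stdio.h>
#include <string.h>   /* the standard header; <strings.h> is a different, non-standard one */

/* Hypothetical helper: opens the file named by argv[1] for reading.
 * Prints a usage message to stderr and returns NULL if the argument
 * is missing; prints the system error and returns NULL if fopen fails. */
static FILE *open_input(int argc, char *argv[])
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <airport-file>\n",
                argc > 0 ? argv[0] : "program");
        return NULL;
    }

    FILE *fp = fopen(argv[1], "r");
    if (fp == NULL)
        perror(argv[1]);
    return fp;
}
```

In `main`, the caller would return a failure status when `open_input` returns NULL and call `fclose` only on a pointer that was actually opened.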

1 Answer

I don't know what the format is. It can't be space-separated, since some of the fields have spaces in them. It doesn't look fixed-width. Because you mentioned strtok, I'm going to assume it's tab-separated.

You can use strsep for that. strtok has a lot of problems that strsep solves, but strsep isn't standard C. I'm going to assume this is some assignment requiring standard C, so I'll begrudgingly use strtok.

The basic thing to do is to read each line, and then split it into columns with strtok or strsep.

char line[1024];
while (fgets(line, sizeof(line), fp) != NULL) {
    char *column;
    int col_num = 0;
    for( column = strtok(line, "\t");
         column;
         column = strtok(NULL, "\t") )
    {
        col_num++;

        printf("%d: %s\n", col_num, column);
    }
}
fclose(fp);

strtok is funny. It keeps its own internal state of where it is in the string. The first time you call it, you pass it the string you're looking at. To get the rest of the fields, you call it with NULL and it will keep reading through that string. That's why there's that funny for loop that looks like it's repeating itself.

Global state is dangerous and very error prone. strsep and strtok_r fix this. If you're being told to use strtok, find a better resource to learn from.
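
If neither strsep nor strtok_r is allowed, the same idea can be sketched in standard C by keeping the position in a variable the caller owns. The names `next_field` and `split_fields` are hypothetical helpers, not library functions. Unlike strtok, which treats a run of delimiters as a single separator, this version returns empty fields as empty strings, so column numbers stay stable when a field is blank.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the next field starting at *cursor and
 * advances *cursor past the delimiter that ended it. Returns NULL once
 * the input is exhausted. The string is modified in place. */
static char *next_field(char **cursor, const char *delims)
{
    char *start = *cursor;
    if (start == NULL)
        return NULL;

    size_t len = strcspn(start, delims);  /* characters before the first delimiter */
    if (start[len] == '\0') {
        *cursor = NULL;                   /* this was the last field */
    } else {
        start[len] = '\0';                /* terminate the field */
        *cursor = start + len + 1;        /* continue after the delimiter */
    }
    return start;
}

/* Splits line in place, storing at most max field pointers in fields.
 * Returns the number of fields stored. */
static int split_fields(char *line, const char *delims, char **fields, int max)
{
    int n = 0;
    char *cursor = line;
    char *field;

    while (n < max && (field = next_field(&cursor, delims)) != NULL)
        fields[n++] = field;
    return n;
}
```

Because all state lives in `cursor`, two lines can be split at the same time without interfering with each other.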

Now that we have each column and its position, we can do what we like with it. I'm going to use a switch to choose only the columns we want.

    for( column = strtok(line, "\t");
         column;
         column = strtok(NULL, "\t") )
    {
        col_num++;

        switch( col_num ) {
            case 1:
            case 2:
            case 3:
            case 4:
            case 5:
            case 9:
            case 10:
            case 13:
                printf("%s\t", column);
                break;
            default:
                break;
        }
    }

    puts("");

You can do whatever you like with the columns at this point. You can print them immediately, or put them in a list, or a structure.
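
For printing, one option is to let printf pad each column to a fixed width. The sketch below is an assumption about the desired layout: `format_row` is a hypothetical helper, and the widths (10, 5, 12, 7, 2, 14, 15) were picked to fit the sample rows in the question rather than taken from the assignment.

```c
#include <stdio.h>

/* Hypothetical helper: writes one left-aligned, fixed-width output row
 * into buf. Returns what snprintf returns: the number of characters the
 * full row needs, not counting the terminating '\0'. */
static int format_row(char *buf, size_t size,
                      const char *site, const char *loc_id,
                      const char *name, const char *city,
                      const char *state, const char *latitude,
                      const char *longitude, char tower)
{
    /* "%-12s" pads on the right to at least 12 characters; longer
     * strings are printed in full and push later columns to the right. */
    return snprintf(buf, size, "%-10s %-5s %-12s %-7s %-2s %-14s %-15s %c",
                    site, loc_id, name, city, state,
                    latitude, longitude, tower);
}
```

A precision such as `%-12.12s` would additionally truncate long airport names to 12 characters if the columns must never shift.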

Just remember that column is pointing to memory in line and line will be overwritten. If you want to store column, you'll have to copy it first. You can do that with strdup but *sigh* that isn't standard C. strcpy is really easy to use wrong. If you're stuck with standard C, write your own strdup.

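/* Returns a heap-allocated copy of src, or NULL if malloc fails.
   The caller must free() the result. */
char *mystrdup( const char *src ) {
    /* strlen, not sizeof: sizeof(src) is only the size of a pointer */
    char *dst = malloc( strlen(src) + 1 );
    if( dst != NULL ) {
        strcpy( dst, src );
    }
    return dst;
}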
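
Putting the pieces together, a minimal sketch of filling the question's airPdata structure from one line might look like this. It assumes the file really is tab-separated, that the wanted columns are 1, 2, 3, 4, 5, 9, 10 and 13 as in the switch above, and that no field before column 13 is empty (strtok skips empty fields, which would shift the numbering). `copy_string`, `parse_airport` and `free_airport` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

typedef struct airPdata {
    char *siteNumber;
    char *locID;
    char *fieldName;
    char *city;
    char *state;
    char *latitude;
    char *longitude;
    char controlTower;
} airPdata;

/* Standard-C stand-in for strdup; returns NULL if malloc fails. */
static char *copy_string(const char *src)
{
    char *dst = malloc(strlen(src) + 1);
    if (dst != NULL)
        strcpy(dst, src);
    return dst;
}

/* Frees the strings owned by *a. Safe on a partially filled record. */
static void free_airport(airPdata *a)
{
    free(a->siteNumber); free(a->locID); free(a->fieldName);
    free(a->city); free(a->state); free(a->latitude); free(a->longitude);
}

/* Fills *out from one tab-separated line, which is modified in place.
 * Returns 0 on success, -1 if a column is missing or malloc fails. */
static int parse_airport(char *line, airPdata *out)
{
    airPdata empty = {0};
    int col_num = 0, stored = 0;
    char *column;

    *out = empty;
    line[strcspn(line, "\r\n")] = '\0';   /* drop the newline fgets keeps */

    for (column = strtok(line, "\t"); column != NULL;
         column = strtok(NULL, "\t")) {
        char **slot = NULL;
        switch (++col_num) {
        case 1:  slot = &out->siteNumber; break;
        case 2:  slot = &out->locID;      break;
        case 3:  slot = &out->fieldName;  break;
        case 4:  slot = &out->city;       break;
        case 5:  slot = &out->state;      break;
        case 9:  slot = &out->latitude;   break;
        case 10: slot = &out->longitude;  break;
        case 13: out->controlTower = column[0]; stored++; break;
        default: break;                   /* ignored columns */
        }
        if (slot != NULL) {
            *slot = copy_string(column);  /* column points into line, so copy it */
            if (*slot == NULL) {
                free_airport(out);
                return -1;
            }
            stored++;
        }
    }
    if (stored != 8) {                    /* a wanted column was missing */
        free_airport(out);
        return -1;
    }
    return 0;
}
```

Inside the fgets loop you would call `parse_airport(line, &record)`, use the record, and then call `free_airport(&record)` before reading the next line.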
Schwern
  • and remember to check each call to `malloc()` to assure the operation was successful. – user3629249 Feb 13 '17 at 18:50
  • the function `strdup()` should always be available by `#define`ing the appropriate macro name in your source code (or, if using `gcc`, by passing the parameter `-std=gnu99` (or similar) in your compile statement). – user3629249 Feb 13 '17 at 18:54
  • this answer will not be robust, because several of the fields (city name, airport name) can be 1 or 2 fields. – user3629249 Feb 13 '17 at 18:57