
I need to open a file and count the number of times a certain sequence appears in it, ignoring spaces. The file name and the sequence are entered by the user on the command line. Here's my approach: I open the file, store its content in an array, then remove all the spaces from that array and store the result in another array. Then I search for the sequence and count the number of times it appears. This is my code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void main (int argc, char *argv[])
{
char *tempRaw;
char *temp;
int size;
//Input check
if(argc != 3) 
{
fprintf(stderr, "Usage: %s Input Search\n", argv[0]);
exit(1);
}
//Open files
FILE *input = fopen(argv[1],"r");
//Check for file
if(input == NULL) 
{
    fprintf(stderr, "Unable to open file: %s\n", argv[1]);
    exit(1);
}
//Get the file size
fseek (input,0,SEEK_END);
size = ftell(input);
rewind(input);
//Allocate memory for the strings
tempRaw = (char*) malloc(sizeof(char)*size);
temp = (char*) malloc(sizeof(char)*size);

//Copy the file's content to the string
int result =0;
int i;
fread(tempRaw,sizeof(char),size,input);
//Remove the blanks
removeBlanks(temp,tempRaw);
fclose(input);

char *pointer;
//Search for the sequence
pointer = strchr(pointer,argv[2]);
// If the sequence is not found
if (pointer == NULL)
{
    printf("%s appears 0 time",argv[2]);
    return;
}
else if (pointer != NULL)
{
    //Increment result if found
    result ++;
}
while (pointer != NULL)
{
    //Search the next character
    pointer = strchr(pointer+1,argv[2]);
    //Increment result if the sequence is found
    if (pointer != NULL)
    {
        result ++;
    }
    //If the sequence is not found, pointer becomes NULL and the loop ends
}

printf(" Sequence : %s\n",temp);
printf("%s appears %d time(s)\n",argv[2],result);
}

void removeBlanks( char *dest, const char *src)
{
//Copy source to destination
strcpy(dest,src);
char *old = dest;
char *new = old;
//Remove all the space from destination
while (*old != '\0') 
{
    // If it's not a space, transfer and increment new.

    if (*old != ' ')
    {
        *new++ = *old;
    }
    // Increment old no matter what.

    old++;
}

// Terminate the new string.

*new = '\0';

}

I tested it, and I'm having a problem getting the content from the file. Sometimes it works, but most of the time all I get is an empty string.

Nguyễn Duy
  • That indentation doesn't help readability... also, [don't cast the return value of `malloc()`](http://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc/605858#605858), and `sizeof(char)` is always 1, so it's redundant. –  Nov 10 '13 at 07:53

1 Answer


There are a few problems with your code, and the compiler should have warned you about them (don't ignore compiler warnings):

First, functions should be declared before they are used, not just defined afterwards, so add:

void removeBlanks( char *dest, const char *src);

before `main`. According to the C99 standard (5.1.2.2.1 Program startup), `main` should be declared with an `int` return type, as in `int main(int argc, char *argv[])`, and you should add the appropriate `return` statements.

And, as pointed out in the comment above, casting the result of `malloc` is not needed.

The problems above aren't why it isn't working, however. It's because you call the `strchr` function on the wrong variable and in the wrong way:

pointer = strchr(pointer,argv[2]);

should be

pointer = strchr(temp, *argv[2]);

because `temp` is the pointer to the contents you read from the file, and `strchr` needs a single character as the second argument, not a `char *`. If you want to search for a string, you have to use `strstr`, which takes a `char *`, like:

pointer = strstr(temp, argv[2]);
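
As a minimal sketch of the counting loop built on `strstr`: the helper name `count_occurrences` is hypothetical (it is not in the original code), and it assumes overlapping matches should be counted, so `AA` occurs twice in `AAA`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the original program): counts how many
   times needle occurs in haystack, including overlapping occurrences. */
size_t count_occurrences(const char *haystack, const char *needle)
{
    size_t count = 0;
    const char *p = haystack;

    if (*needle == '\0')
        return 0;   /* an empty pattern would match at every position */

    while ((p = strstr(p, needle)) != NULL) {
        count++;
        p++;        /* advance one character so overlapping matches count */
    }
    return count;
}
```

In the program above this would be called as `count_occurrences(temp, argv[2])`. Advancing by `strlen(needle)` instead of one character would count only non-overlapping matches.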

Also, since you remove the blanks from `tempRaw` and store the new string in `temp`, the second string will be shorter and will get garbage at the end, so you should initialize the memory like:

tempRaw = calloc(1, size);

There might be other errors too, but these changes made it work for me...
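
One way to avoid garbage at the end of the buffer is to allocate one extra byte and null-terminate explicitly after `fread`, so that `strcpy` and `strstr` know where the data stops. The following is only a sketch under that assumption; `read_whole_file` is a hypothetical helper name, not something from the original code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole file into a freshly allocated,
   null-terminated buffer. Returns NULL on failure; the caller frees it. */
char *read_whole_file(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return NULL;

    char *buf = NULL;
    if (fseek(fp, 0, SEEK_END) == 0) {
        long size = ftell(fp);
        if (size >= 0) {
            rewind(fp);
            buf = malloc((size_t)size + 1);     /* +1 for the terminator */
            if (buf != NULL) {
                size_t n = fread(buf, 1, (size_t)size, fp);
                buf[n] = '\0';  /* terminate after the bytes actually read */
            }
        }
    }
    fclose(fp);
    return buf;
}
```

Terminating at the count returned by `fread`, rather than at `size`, also covers text-mode streams where fewer bytes are read than `ftell` reported.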

jpw
  • the `calloc` is not necessary, `removeBlanks()` is null-terminating the output array. – Edward Clements Nov 10 '13 at 08:35
  • @EdwardClements Hmm, it should, but I get garbage at the end of `printf("Sequence:\n%s\n\n",temp);` if I don't use calloc. – jpw Nov 10 '13 at 08:41
  • Still not working in some cases though. For example, if I'm searching for GT in a file containing "GAGAGAGAGAGAAAAAAGGGGGTTAATATATTTTGATAC", I get 12 instead of 1. I'm going through my code to see what's wrong. Any suggestions? – Nguyễn Duy Nov 10 '13 at 08:44
  • also need the changes in the other original answer: `size = ftell(input) + 1;` and `tempRaw[size - 1] = '\0';` to null-terminate the contents of the file – Edward Clements Nov 10 '13 at 08:45
  • @NguyễnDuy Read what I added about `strchr` and `strstr`. You want to use `strstr`. When using `strchr` you count the number of `G`, not `GT`; to count `GT`, change it to use `strstr`. – jpw Nov 10 '13 at 08:46
  • @jpw : It still doesn't work in some cases. I tried to search for AA in: " TTCCTCTTTCTCGACTCCATCTTCGCGGTAGCTGGGACCGCCGTTCAGTC GCCAATATGCAGCTCTTTGTCCGCGCCCAGGAGCTACACACCTTCGAGGT GACCGGCCAGGAAACGGTCGCCCAGATCAAGGCTCATGTAGCCTCACTGG AGGGCATTGCCCCGGAAGATCAAGTCGTGCTCCTGGCAGGCGCGCCCCTG GAGGATGAGGCCACTCTGGGCCAGTGCGGGGTGGAGGCCCTGACTACCCT GGAAGTAGCAGGCCGCATGCTTGGAGGTAAAGTTCATGGTTCCCTGGCCC GTGCTGGAAAAGTGAGAGGTCAGACTCCTAAGGTGGCCAAACAGGAGAAG AAGAAGAAGAAGACAGGTCGGGCTAAGCGGCGGATGCAGTACAACCGGCG CTTTGTCAACGTTGTGCCCACCTTTGGCAAGAAGAAGGGCCCCAATGCCA ACTCTTAAGTCTTTTGTAATTCTGGCTTTCTCTAATAAAAAAGCCACTTA GTTCAGTCAAAAAAAAAA" – Nguyễn Duy Nov 10 '13 at 09:48
  • And got 44 instead of 45 :( – Nguyễn Duy Nov 10 '13 at 09:49
  • Do you think it's because of cases where the sequence is like AAA, which should count as 2 instead of 1? – Nguyễn Duy Nov 10 '13 at 09:53
  • @NguyễnDuy No, I think it's because the file string `temp` is one char too short. Try adding a white space at the end of the file and it will be correct. To fix this I think you need to include the changes suggested by Edward Clements in the comment above. – jpw Nov 10 '13 at 09:55
  • @jpw Would you mind checking the file out? It's still not working though :( http://www.mediafire.com/?43sgdo4i1q8qu5b – Nguyễn Duy Nov 10 '13 at 10:11
  • @NguyễnDuy When I ran the code you linked to with the data above I got 45 hits for AA; I don't know why you get 44 :\ – jpw Nov 10 '13 at 10:21
  • @jpw whattttttttttttt :( Can you try the same sequence but searching for GA then? I got 29, it should be 31. – Nguyễn Duy Nov 10 '13 at 10:23
  • @jpw I know what the problem is now, but I don't know how to fix it. In my test file, the sequence is laid out like this: "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAA(end line) AAAAAAAAAAAAAAAAAA(end line) " And when I put it all on one line, I get 45. Any idea how to fix that? Try using this file and you'll see what I mean. http://www.mediafire.com/download/tcjsy15qy199za8/1.txt – Nguyễn Duy Nov 10 '13 at 10:38
  • @NguyễnDuy You need to remove the newlines. You could change `(*old != ' ')` to `(isalpha(*old))` in the removeBlanks function. – jpw Nov 10 '13 at 12:27
  • @jpw +1 for your patience! – Edward Clements Nov 10 '13 at 12:47
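
The `isalpha` suggestion from the comments can be sketched as a small filter that drops spaces, newlines and anything else that is not a letter. `keep_letters` is a hypothetical name for such a variant of `removeBlanks`; it assumes `dest` is at least as large as `src`, including the terminator.

```c
#include <ctype.h>

/* Hypothetical variant of removeBlanks: copies only alphabetic characters
   from src to dest, so spaces, '\n' and '\r' are all dropped.
   dest must have room for strlen(src) + 1 bytes. */
void keep_letters(char *dest, const char *src)
{
    while (*src != '\0') {
        /* the cast avoids undefined behaviour for negative char values */
        if (isalpha((unsigned char)*src))
            *dest++ = *src;
        src++;
    }
    *dest = '\0';   /* terminate the filtered string */
}
```

Because line breaks are removed before searching, a match that straddles two lines in the file is then found like any other.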