1

I'm new here. I'm trying to do something I think should be easy, but I can't get it to work. I have two files that contain just simple data:

FileA

KIC
757137  
892010  
892107  
892738  
892760  
893214  
1026084
1435467
1026180
1026309
1026326
1026473
1027337
1160789
1161447
1161618
1162036
3112152
1163359
1163453
1163621
3123191
1164590

and File B

KICID
1430163
1435467
1725815
2309595
2450729
2837475
2849125
2852862
2865774
2991448
2998253
3112152
3112889
3115178
3123191

I'd like to read both files and print out the values that appear in both, ignoring the titles. In this case, 1435467, 3112152 and 3123191 are in both files, and just these would be written to a new file. So far I have:

#include <cmath>
#include <cstdlib>
#include <string>
#include <iomanip>
#include <iostream>
#include <fstream>
#include <ctime>

using namespace std;

// Globals, to allow being called from several functions

// main program

int main() {
    float A, B;

    ifstream inA("FileA"); // input stream
    ifstream inB("FileB"); // second instream
    ofstream outA("OutA.txt"); // output stream

    while (inA >> A) {
        while (inB >> B) {

            if (A == B) {
                outA << A << "\t" << B << endl;
            }
        }
    }
    return 0;
}

This just produces an empty file, OutA.txt. I thought it would read a line of FileA, cycle through FileB until it found a match, send it to OutA, and then move on to the next line of FileA. Any help would be appreciated.

  • How large are the files? Would it be an option to read both of them completely into memory? What do you mean by "ignoring titles"? – Codor Sep 11 '14 at 13:25
  • You need to reset `inB` to the start of the file for each `A`, and skip over the titles before you start reading numbers. – molbdnilo Sep 11 '14 at 13:27
  • Use `inB.seekg(0, std::ios_base::beg);` to reset the file pointer to the beginning of the file every time you want to match a number. Or, much better, read the data of one file into a structure (e.g. `std::set`), then read the second file and check whether each value exists in it. That way you only need to read each file once. Disk access is a really expensive operation. – NetVipeC Sep 11 '14 at 13:32

4 Answers

1

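You need to put

inB.clear();
inB.seekg(0, inB.beg);

at the end of the outer while loop, after the inner loop. Otherwise inB stays at the end of the file and nothing more is read once the first entry of inA has been processed. The clear() call is needed because running into the end of the file leaves the stream in a failed state, and seekg does nothing while the stream is in that state.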

  • OK I tried this and currently still doesn't produce a result. – Thomas North Sep 11 '14 at 13:46
  • @ThomasNorth Maybe I worded it unclearly. You should place it inside the outer while loop, after the inner while loop. –  Sep 11 '14 at 14:04
  • :) I guessed that part. I've also changed the type to strings as that's actually what I need for the project. – Thomas North Sep 11 '14 at 14:10
1

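Another problem may be that you are using float for A and B. Try int (or string), as float may not behave as you expect with ==. Refer to this question for details: What is the most effective way for float and double comparison? Note also that a numeric extraction such as inA >> A fails on a non-numeric title line like KIC, so the titles have to be skipped or read as strings.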
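To illustrate the point about titles, here is a minimal sketch of how extraction behaves on data that starts with a header. The helper name count_extracted is made up for this example, and it reads from a string instead of a file so it can be tried without FileA on disk.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not from the question): counts how many values of
// type T can be extracted from `text` with operator>> before the stream fails.
template <typename T>
std::size_t count_extracted(const std::string& text) {
    std::istringstream in(text);
    T value;
    std::size_t n = 0;
    while (in >> value) {  // stops at the first token that cannot be parsed as T
        ++n;
    }
    return n;
}
```

Called as count_extracted<float> on text that begins with KIC, the loop stops at the title before reading any number, which matches the empty output file in the question. With std::string every token is read, including the title.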

This code worked on my platform:

...
while (inA >> A) {
  inB.clear();
  inB.seekg(0, inB.beg);
  while (inB >> B) {
    if (A == B) {
      outA << A << "\t" << B << endl;
    }
  }
}

Note the inB.clear() and inB.seekg(...) calls. A and B are now strings.
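The effect of clear() can be checked in isolation. The following is only a sketch: read_twice is a hypothetical helper, and it works on any std::istream so that a std::istringstream can stand in for FileB.

```cpp
#include <ios>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every whitespace-separated token from `in`,
// tries to rewind to the start, and reads the tokens a second time.
// When `clear_first` is false the rewind is attempted without clear().
std::vector<std::string> read_twice(std::istream& in, bool clear_first) {
    std::vector<std::string> tokens;
    std::string token;
    while (in >> token) {            // first pass; ends with the stream in a failed state
        tokens.push_back(token);
    }
    if (clear_first) {
        in.clear();                  // reset eofbit/failbit so seekg can work
    }
    in.seekg(0, std::ios_base::beg); // has no effect while failbit is still set
    while (in >> token) {            // second pass
        tokens.push_back(token);
    }
    return tokens;
}
```

With clear_first set to true the tokens come back twice. With false, the second pass reads nothing, which is the situation in the question's inner loop after the first value of A.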

By the way, this method only good for quick-and-dirty implementation, it's not optimal for big files, as you get N * M complexity (N - size of FileA, M - size of FileB). By using hash set you may get to nearly linear (N + M) complexity.

Example of hash set implementation (C++11):

#include <string>
#include <iostream>
#include <fstream>
#include <unordered_set>

using namespace std;

int main() {
  string A, B;

  ifstream inA("FileA"); // input stream
  ifstream inB("FileB"); // second instream
  ofstream outA("OutA.txt"); // output stream

  unordered_set<string> setA;

  while (inA >> A) {
    setA.insert(A);
  }

  while (inB >> B) {
    if (setA.count(B)) {
      // B is the value found in both files; A only holds the last token read from FileA
      outA << B << endl;
    }
  }

  return 0;
}
dragn
1

Are both the files small enough to read into memory?

You could try something similar to the following:

#include <algorithm>
#include <fstream>
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> a;
    std::vector<std::string> b;

    std::ofstream outA("OutA.txt"); // output stream
    std::ifstream inA("FileA");     // input stream
    std::ifstream inB("FileB");     // second instream

    std::string value;

    inA >> value;                                 // read first line (header) and discard it
    while (inA >> value) { a.push_back(value); }  // populate first vector
    inB >> value;                                 // read first line (header) and discard it
    while (inB >> value) { b.push_back(value); }  // populate second vector

    // std::sort will perform a pretty efficient sort
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    // now that both are sorted, walk them in step
    std::vector<std::string>::iterator ita = a.begin();
    std::vector<std::string>::iterator itb = b.begin();
    while (ita != a.end() && itb != b.end())
    {
        if (*ita > *itb)
            ++itb;
        else if (*ita < *itb)
            ++ita;
        else
        {
            outA << *ita << '\n'; // match: write it and advance both iterators
            ++ita;
            ++itb;
        }
    }
    return 0;
}

This reads both files into memory, sorts them, and then compares them. The comparison only has to pass through each list once, which reduces its complexity from O(a*b) to O(a+b). Sorting adds overhead, but this should still be more efficient for larger files, and sufficiently fast for shorter ones (unless you are comparing a very large number of small files). With std::sort, I believe the overall cost is O(a log a + b log b), which is still better than O(a*b).
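The same sort-then-merge idea is also available in the standard library as std::set_intersection, which could replace the hand-written loop. A minimal sketch, assuming the values are already in two vectors (the function name common_values is made up for this example):

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: returns the values present in both `a` and `b`.
// The vectors are taken by value so the caller's data is left unsorted.
std::vector<std::string> common_values(std::vector<std::string> a,
                                       std::vector<std::string> b) {
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());

    std::vector<std::string> result;
    // Requires both ranges to be sorted with the same ordering.
    std::set_intersection(a.begin(), a.end(),
                          b.begin(), b.end(),
                          std::back_inserter(result));
    return result;
}
```

Because the values are strings, the result comes back in lexicographic order, not numeric order. That makes no difference for finding matches, but it can affect the order of lines in the output file.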

Baldrickk
  • @Thomas North Due to your comments in response to [this answer](http://stackoverflow.com/a/25789277/4022608) I've updated this to handle `std::string` instead of `int` – Baldrickk Sep 11 '14 at 14:45
0

In the end I fixed it like so

#include <cmath>
#include <cstdlib>
#include <string>
#include <iomanip>
#include <iostream>
#include <fstream>
#include <ctime>

using namespace std;

//Globals, to allow being called from several functions


//main program

int main() {
    string A, B;

    ifstream inA("FileA.txt"); // input stream
    ifstream inB("FileB.txt"); // second instream
    ofstream outA("OutA.txt"); // output stream

    while (inA >> A) { // take in first stream
        while (inB >> B) { // while that's happening, take in second stream
            if (A == B) { // do they match? If so, send out the value
                outA << A << "\t" << B << endl; // this just shows that A does equal B
            }
        } // end of B loop
        inB.clear(); // now clear the second stream (B)
        inB.seekg(0, inB.beg); // return to start of stream B
    } // move on to the next input in stream A, and repeat
    return 0;
}