1

I have N number of text files. I am trying to read from those files parallelly, so I have forked N threads and each thread gets one text file from those N files(Thread 0 gets file0.txt, thread 1 gets file1.txt, thread 2 gets file2.txt and so on ). The next step I am trying to do is read first 1000 lines from the files parallelly and put in a vector say V1, then I will sort V1. All the resources are private to each tread except V1. But after sorting I am getting different output every time. My code is given below. The value of N is 395. Thanks in advance.

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <vector>
#include <iostream>
#include <stdlib.h>
#include <stdint.h>
#include <limits.h>
#include <algorithm>
#include <fstream>
#include <string.h>
#include <math.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/io.h>
#include <sys/mman.h>
#include <bits/stdc++.h>
#include <ctype.h>      
#include <algorithm>
#include <omp.h>

typedef struct Data
{

    int  POS;
    char Record[600];
};

using namespace std;

std::vector<Data> sortDataOnDevice(std::vector<Data>,int);

struct comparator_sort
{
    __device__

    bool operator() (const Data &x, const Data& y)
    {
        return x.POS < y.POS;
    }
};

int sortedFilePos;

#pragma omp threadprivate (sortedFilePos)

void externalSorting(void)
{
    string tempString;
    int count = 0;
    stringstream s;
    std::vector<string> result;
    std::vector<Data>   V1;
    std::vector<Data>  finalSorted;
    std::vector<Data> temp1;
    string line;
    string word;
    
    int fp1 = 0;
    omp_set_num_threads(395);
    outfile.open("/ExtrSort/Sorted.txt");
    int firstTrav = 0;
        #pragma omp parallel for private(tempString,line,word,firstTrav) shared(V1)
            for(int i=0;i<395;i++ )
            {
                    int vecIdx = 0;
                    int beg = 0, end = 0;
                    int chrNo;
                    int tid;
                    
                    int fileReadIdx = 0;
                    stringstream intToString;
                    intToString << i;
                    intToString >> tempString;
                    std::string name = "/ExtrSort/Chr_Iteration" + tempString + ".txt";
    

                    cout<<name<<"     "<<omp_get_thread_num()<<endl;
                    
                    std::ifstream sortedFile(name.c_str());

                    if(!sortedFile)
                        cout<<"Fie openning error"<<endl;

                    
                        for(int n=0;n<1000;n++)
                        {

                            int wordCount = 0;

                            int chrFlag = 0;


                            Data *sd = (Data*)malloc(sizeof(Data));
                            
                            getline(sortedFile, line);
                        
                            stringstream s(line);
                            while(s>>word)
                            {
                                wordCount++;
                                if (wordCount == 4)
                                {

                                    strcpy(sd->wholeRecord,line.c_str());
                                    s >> word;
                                    stringstream geek(word);
                                    geek>>sd->POS;
                                    geek.clear();

                                    if(n==999)
                                    {

                                        int fp = sortedFile.tellg();


                                        sortedFilePos = fp;

                                    }

                                    break;
                                }
                            }
                            #pragma omp critical//(fill)
                            {
                                V1.push_back(*sd);
                                free(sd);
                                sd = NULL;
                            }


                        }//for(int n=0;n<1000;n++)
            }
            cout<<file1.size()<<endl;

            finalSorted = sortDataOnDevice(V1,0);

            cout<<finalSorted[0].wholeRecord<<endl;

            V1.clear();

            outfile.close();
        
        
}

std::vector<Data> sortDataOnDevice(std::vector<SAMData> dataX, int composite)
{

    thrust::device_vector<Data> dDataX(dataX.size());
    thrust::copy(dataX.begin(),dataX.end(),dDataX.begin());



    cout<<"Copied"<<endl;

    thrust::stable_sort(dDataX.begin(),dDataX.end(),comparator_sort());

    cout<<"Sorted"<<endl;

    std::vector<Data> sortedDataHost(dDataX.size());
    thrust::copy(dDataX.begin(),dDataX.end(),sortedDataHost.begin());

    dataX.clear();
    dDataX.clear();

    cout<<"After sort"<<endl<<endl<<endl;

    return sortedDataHost;
    
}
somu_mtech
  • 107
  • 1
  • 3
  • 10
  • 1
    First of all, don't use `malloc` in a C++ program. In fact, every time you must use a C-style cast in C++, you should take that as a sign that you're doing something wrong. In the case of `sd` there's not even any need to allocate the object dynamically. Or use a character array instead of `std::string`. You also include a lot of header files are aren't really needed. And one header file [which you should never include](https://stackoverflow.com/questions/31816095/why-should-i-not-include-bits-stdc-h). – Some programmer dude May 11 '22 at 11:04
  • 1
    @Someprogrammerdude I don't agree with the statement: "First of all, don't use malloc in a C++ program." For performance purposes malloc usually outperforms operator new! Also it supports realloc but there is no "renew". But I agree that malloc can be really dangerous in a C++ program (ie. array of objects with nontrivial destructors) I think that statement should be "Only use malloc if you really need it, and then be careful!". Malloc is not an evil!! And since it returns a void*, you can use static_cast on that, and not a C-style stupidly dangerous cast which is a NEVER! in C++... – simre May 11 '22 at 11:28
  • 3
    Reading files concurrently is rarely beneficial; the filesystem is non-parallel by nature. And with 395 threads, they would be spending most of their time waiting even if they weren't synchronized. – molbdnilo May 11 '22 at 11:28
  • @molbdnilo In fact the program is reading the files parallelly, but after sorting the V1 vector, every time I am getting a different result. Is there any race condition in the code , but only one resource that is V1 is shared and it is under #pragma omp critical. I am not getting why I am getting different results in every run . – somu_mtech May 11 '22 at 11:37
  • @simre You'd have to prove to me that `malloc` is more efficient. – Victor Eijkhout May 11 '22 at 12:55
  • 1
    New uses malloc under the hood. You just get rid of some of the extra boilerplate... Which is usually really thin, but you can create your own benchmark and you will see that there is a difference. :) Also consider realloc which is amazing!!! – simre May 11 '22 at 13:04
  • @simre First this code does not need memory allocation at all a simple automatic variable would serve the purpose. 2nd even when allocation is needed, using `malloc` in C++ code is starting to introduce bugs. It does not construct the allocated object and is noway faster or superior to `new`. It just boosts your chances for getting a UB. 3rd An implementation is free to call `malloc` inside `operator new` or not. But it can always be overloaded in different ways. – Red.Wave May 11 '22 at 14:30
  • @Red.Wave Most of the implementations uses malloc under the hood!!! Btw I hate when people are: 1. can't read and understand the other, 2. when people are too lazy to benchmark and tells the bullshit as the only truth... https://quick-bench.com/q/YT8XAgoRciC98jaKtFhBTawmwws You can do it on your system, and probably you will see similar results! And if you need realloc on trivially copyable types, new-delete will be just garbage compared to realloc in most of the times if you have enough space after the current block in the memory... If not, it'll more-or less the same. – simre May 11 '22 at 15:07
  • @Red.Wave Sry, ignore that quick-bench link (out of 5 mins edit), i forgot to overwrite the array sizes. This is more impressive: https://quick-bench.com/q/xTWaL7dprE4z0-TbTjVIREXAi1k ;) – simre May 11 '22 at 15:15
  • I can use new instead of malloc. But that will not solve the problem that I am dealing with. Please give some light on the problem that I am dealing with. – somu_mtech May 11 '22 at 15:52
  • Do you *have* a problem? You have never said you have a problem, all you do is describe the code you have, it seems. There's no description of the expected and actual behavior in the question itself. There's no actual question inside the question. And seemingly no attempt at *debugging*. Please take some time to refresh [the help pages](http://stackoverflow.com/help), take the SO [tour], read [ask], as well as [this question checklist](https://codeblog.jonskeet.uk/2012/11/24/stack-overflow-question-checklist/). And don't forget how to [edit] your question. – Some programmer dude May 12 '22 at 05:22
  • @Someprogrammerdude "Why I am getting different output in every run in openmp" This was the question. "I have N number of text files. I am trying to read from those files parallelly...." this was the description of what I have done and the problem I am facing. Hope you can read now. – somu_mtech May 12 '22 at 05:50
  • The title should be short summary of the problem or question. The main description and question should be inside the question body itself. – Some programmer dude May 12 '22 at 05:55
  • @Someprogrammerdude To be precise, it seems like that the code is having some race conditions which I am unable to figure out and that is may be the reason why I am getting a different output in every run. Can you please try to figure out that. – somu_mtech May 12 '22 at 07:00
  • The code seems to be rather complex. Adding threading adds another layer of complexity. To help debugging it might be a good idea to attempt to simplify the code, perhaps even start over. As a general tip, don't write large parts of code without testing, and don't write large complex functions. Write small and simple functions, one by one, where each can be easily tested. Build with lots of extra warnings enabled, and treat them as errors. Only continue with the next function or feature once the whole program builds cleanly and all tests passes. – Some programmer dude May 12 '22 at 07:11
  • [Continued] If a test fails, then it should be easy to roll back since you only added little code, and it should also be easier to debug that little piece of code, and possibly also break it out into a separate test program for further testing and debugging. – Some programmer dude May 12 '22 at 07:12

1 Answers1

1

I'm not sure what the original code looks like, because the sample presented has several dozen errors in it from missing variables, missing includes, and references to non-existent struct fields. Here's a simplified version with what I think is the fix in it. The OpenMP pragmas seem to be basically correct, but there's no check to see if the record actually got populated in the while loop or not before adding it to the vector. Due to the way uninitialized data behaves, note that the use of malloc while certainly ok in C++ means the object is created in an inconsistent uninitialized state, you are probably getting different slices of garbage data sent into the vector as a result.

The version below has these main differences:

  1. Data is declared correctly for C++ (typedef requires a name after the declaration, C++ provides the typedef by default).
  2. POS is initialized to -1 on construction of a Data struct, this syntax requires C++11 and the change in declaration of sd below.
  3. Rather than privatizing variables, I've moved their declarations to the appropriate scopes so they are naturally scoped to each thread. This is better for performance, and in my opinion for readability, but is not a requirement.
  4. sd is declared on the stack rather than as a pointer to a malloced buffer. If it must be allocated with malloc, then it must be initialized with placement new to get correct behavior.
  5. The various stringstreams have been removed in favor of directly using std::to_string on ints and directly reading values as integers where appropriate.
  6. The key change, the record is only added to the vector if POS has been set to a new value after the while loop.

Since this is an incomplete example, I can't be sure this will fix the problem, but the code should be easier to reason about with these changes applied.

#include <ctype.h>
#include <fstream>
#include <iostream>
#include <limits.h>
#include <math.h>
#include <omp.h>
#include <sstream>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <vector>

struct Data {

  int POS = -1;
  string Record;
};

using namespace std;

std::vector<Data> sortDataOnDevice(std::vector<Data>, int);

struct comparator_sort {
  bool operator()(const Data& x, const Data& y) { return x.POS < y.POS; }
};

int sortedFilePos;

#pragma omp threadprivate(sortedFilePos)

void externalSorting(void)
{
  std::ofstream outfile;
  std::vector<string> result;
  std::vector<Data> V1;
  std::vector<Data> finalSorted;
  std::vector<Data> temp1;

  int fp1 = 0;
  omp_set_num_threads(395);
  outfile.open("/ExtrSort/Sorted.txt");
#pragma omp parallel for shared(V1) default(none)
  for (int i = 0; i < 395; i++) {

    std::string name = "/ExtrSort/Chr_Iteration" + std::to_string(i) + ".txt";

    cout << name << "     " << omp_get_thread_num() << endl;

    std::ifstream sortedFile(name);

    if (!sortedFile) {
      cout << "Fie openning error" << endl;
    }

    for (int n = 0; n < 1000; n++) {
      Data sd {}; // create and initialize on the stack
      int wordCount = 0;
      int chrFlag = 0;
      string line;
      getline(sortedFile, line);

      string word;
      istringstream s(line);
      while (s >> word) {
        wordCount++;
        if (wordCount == 4) {
          sd.Record = line;
          s >> sd.POS;
          if (n == 999) {
            int fp = sortedFile.tellg();
            sortedFilePos = fp;
          }
          break;
        }
      }
      if (POS != -1) { // KEY CHANGE
#pragma omp critical //(fill)
        {
          V1.push_back(sd);
        }
      }
    } // for(int n=0;n<1000;n++)
  }
  /* cout << file1.size() << endl; */

  finalSorted = sortDataOnDevice(V1, 0);

  cout << finalSorted[0].Record << endl;

  V1.clear();

  outfile.close();
}

If for some reason you need the original structure, the only required changes should be a check for whether the record is valid, and either replacing the malloc with new or after allocating the memory using placement new to initialize the record before using it.

Tom Scogland
  • 937
  • 5
  • 12
  • Thank you so much for the detailed and clear answer. I am trying as suggested. But I could see the same code as I posted. – somu_mtech May 13 '22 at 18:41
  • Right you are, that’s embarrassing as I must have made a mistake pasting it somehow. I’ll find the original and edit the post momentarily. – Tom Scogland May 14 '22 at 15:25
  • There, this is the final version, apologies for the misfire. – Tom Scogland May 14 '22 at 15:31
  • Thank you so much. Just a question , will the line V1.push_back(sd); be thread safe ? I just gone through this link https://stackoverflow.com/questions/9269097/openmp-and-stl-vector where it is saying that "It becomes not-thread-safe when you start calling methods like push_back(), pop_back(), insert(), etc... from multiple threads." – somu_mtech May 14 '22 at 19:38
  • It itself is not thread safe, but the critical around that operation makes it thread safe because only one thread can execute that region at a time (which is why the critical is inside the conditional, so threads that don’t need to block don’t block). If you need higher throughput, you could re-factor this as a reduction, but that’s more complicated and another question. – Tom Scogland May 15 '22 at 17:09
  • Thank you so much, my problem has been solved. – somu_mtech May 17 '22 at 06:18