
I'm working on a C++ project in Visual Studio. I have a CSV file which looks like this:

"0","12312415"," my whole body is tired"
"0","12365448","I just want to rest my ears because of I do not see"
"0","123156984","I want to go to cinema with my girls friend. I am so tired"

So, I want to parse this data without using std::vector and store it in an array. Then I want to find the common words in the last field of each row. My expected output looks like:

<I> <four times count>
<my> <three times count>
<to> <three times count>

Is there any way to do that? I use the code below for sorting, but I don't know how to adapt it to read the data and put it into an array.

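#include <iostream>
#include <utility>  // swap
#include <ctime>    // clock, CLOCKS_PER_SEC
#include <cstdlib>  // system
using namespace std;

void heapify(int arr[], int n, int i)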
{
    int largest = i; // Initialize largest as root
    int l = 2 * i + 1; // left = 2*i + 1
    int r = 2 * i + 2; // right = 2*i + 2

    // If left child is larger than root
    if (l < n && arr[l] > arr[largest])
        largest = l;

    // If right child is larger than largest so far
    if (r < n && arr[r] > arr[largest])
        largest = r;

    //If largest is not root
    if (largest != i)
    {
        swap(arr[i], arr[largest]);

        // Recursively heapify the affected sub-tree
        heapify(arr, n, largest);
    }
}

// main function to do heap sort

void heapSort(int arr[], int n)
{

    // Build heap (rearrange array)
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(arr, n, i);

    //One by one extract an element from heap
    for (int i = n - 1; i >= 0; i--)
    {
        // Move current root to end
        swap(arr[0], arr[i]);

        // Call max heapify on the reduced heap
        heapify(arr, i, 0);
    }
}

// A utility function to print array of size n

void printArray(int arr[], int n)
{
    for (int i = 0; i < n; ++i)
        cout << arr[i] << " ";
    cout << "\n";
}
int main()
{
    clock_t begin = clock();
    int arr[] = { 12,11,13,5,6,7,62,25,27 };
    int n = sizeof(arr) / sizeof(arr[0]);

    heapSort(arr, n);
    cout << "Sorted array is \n";

    printArray(arr, n);

    clock_t end = clock();
    float elapsed_secs = float(end - begin) / CLOCKS_PER_SEC;
    cout << "Elapsed time: " << elapsed_secs << " seconds" << endl;

    system("PAUSE");
    return 0;
}
gsamaras
Daymnn
  • Umm OK, sounds cool, so, what *is your single, specific, question*? Moreover, what have you tried? :) – gsamaras Dec 07 '18 at 08:07
  • First of all, thank you for the answer @gsamaras. I edited the question. I hope there is no mistake in my question. – Daymnn Dec 07 '18 at 08:16
  • Thank you for improving the question Daymnn. It seems that your code got mis-indented when you pasted it here, please *indent* it. *If* you can, mention why you are not happy with your code... For example, edit your question mentioning something along the lines of "it doesn't provide the correct output, but I am getting this and this...", or "it doesn't compile, and the error is ". Good luck! – gsamaras Dec 07 '18 at 08:19
  • @Daymnn well the obvious first problem is that all the code you've posted is for integer arrays but your actual problem is about string data, what are you intending to do about that? Frankly I'm not sure you wrote the code you have posted, it seems quite advanced for someone who is asking the questions you are. We expect to see posters make at least some effort before they ask for help. – john Dec 07 '18 at 09:37

1 Answer


Since you don't want to use std::vector (which is the recommended way), you can use a 2D array to hold the contents of the CSV file. The first dimension of the array is the number of lines and the second is the number of fields per line; in your case, both are equal to 3. See "Read csv file using 2d array" for this.
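
As a complement, here is a minimal sketch of splitting one line of the file into a fixed-size array of fields. It assumes the file is read line by line (for example with std::getline(csvFile, line)) and that quotes never appear escaped inside a field. The helper name splitCsvLine is hypothetical, not part of any library. Unlike splitting on every comma, it keeps commas that appear inside a quoted sentence.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: splits one CSV line into at most 'maxFields' fields.
// Commas inside double quotes are kept as part of the field, and the
// surrounding quotes are dropped. Escaped quotes ("") are NOT handled.
// Returns the number of fields written to 'fields'.
int splitCsvLine(const std::string& line, std::string fields[], int maxFields)
{
    assert(maxFields > 0);
    int n = 0;
    bool inQuotes = false;
    std::string current;
    for (char c : line)
    {
        if (c == '"')
            inQuotes = !inQuotes;        // toggle quoted state, drop the quote
        else if (c == ',' && !inQuotes)
        {
            if (n < maxFields)
                fields[n++] = current;   // a field just ended
            current.clear();
        }
        else if (c != '\r')              // ignore stray carriage returns
            current += c;
    }
    if (n < maxFields)
        fields[n++] = current;           // last field has no trailing comma
    return n;
}
```

Calling it once per line with lines[i] as the destination (and M as maxFields) would fill row i of the 2D array; fields beyond maxFields are silently dropped.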


Once your 2D array is ready, you need to count the frequency of every word. To do that, you could use a 1D array of pairs, where the first member is the word and the second is its frequency. Loop over the 2D array, take the last field of each line (the sentence), and split that string by spaces. Then, for every word, check whether it is already present in the array of pairs. If it is, increase its frequency by one (you had already seen that word, and now you saw it again). If not, insert it into the array with a frequency of 1, since you are seeing that word for the first time.

What is the size of the array of pairs? An std::vector would grow automatically as you insert elements, but since you don't want to use one, you have to choose that size yourself.

Since the number of words in the CSV file is not known in advance, you need to estimate the maximum number of distinct words the file will contain. Make it large enough to store all the words you'll see, but not so large that most of the allocated memory goes to waste.

After setting the size, use a counter that holds the actual number of words stored. That counter is the meaningful size of the array, and it is the bound you use whenever you loop over the array, for example to search it or print it.
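
The counting and capacity steps described above can be sketched as a single helper. This is only an illustration: the names MAX_WORDS and countWord are hypothetical, and the capacity of 100 is an assumed upper bound that you would adjust to your data.

```cpp
#include <string>
#include <utility>

// Assumed upper bound on the number of distinct words (hypothetical value).
const int MAX_WORDS = 100;

// Records one occurrence of 'word' in 'table'.
// 'used' is the counter of slots currently in use; only those are searched.
// Returns false if the word is new and the table is already full.
bool countWord(std::pair<std::string, int> table[], int& used,
               const std::string& word)
{
    for (int i = 0; i < used; ++i)
    {
        if (table[i].first == word)
        {
            ++table[i].second;                   // seen before: bump frequency
            return true;
        }
    }
    if (used == MAX_WORDS)
        return false;                            // no room for a new word
    table[used++] = std::make_pair(word, 1);     // first time we see this word
    return true;
}
```

Checking the return value lets the caller notice that the chosen capacity was too small instead of writing past the end of the array.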


Then, you'd sort the array. std::sort is perfect for this; you only need to define a function that tells it how to compare two elements of the array.
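
If you would rather reuse the heap sort from the question instead of std::sort, it could be adapted to pairs roughly as follows. This is a sketch, not the code used in the rest of this answer: the names WordCount, heapifyByCount and heapSortByCountDesc are made up for illustration. It builds a min-heap on the frequency, so that repeatedly moving the root to the end leaves the array in descending order.

```cpp
#include <string>
#include <utility>

typedef std::pair<std::string, int> WordCount;

// Sift element i down in a min-heap of size n, keyed on the frequency.
void heapifyByCount(WordCount arr[], int n, int i)
{
    int smallest = i;
    int l = 2 * i + 1;
    int r = 2 * i + 2;
    if (l < n && arr[l].second < arr[smallest].second)
        smallest = l;
    if (r < n && arr[r].second < arr[smallest].second)
        smallest = r;
    if (smallest != i)
    {
        std::swap(arr[i], arr[smallest]);
        heapifyByCount(arr, n, smallest);
    }
}

// Heap sort that leaves the most frequent words first.
void heapSortByCountDesc(WordCount arr[], int n)
{
    // Build the min-heap.
    for (int i = n / 2 - 1; i >= 0; --i)
        heapifyByCount(arr, n, i);
    // Move the current minimum to the end and shrink the heap.
    for (int i = n - 1; i > 0; --i)
    {
        std::swap(arr[0], arr[i]);
        heapifyByCount(arr, i, 0);
    }
}
```

As with std::sort, the relative order of words that have the same frequency is not guaranteed.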

Finally, you would print only the words with a frequency greater than one; these are the common words.


Putting everything together, we get:

#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <algorithm>
#include <utility>
using namespace std;

// search in the array of pairs for the 'word'. Check only the first 'count' pairs.
int search_pair_in_array(const pair<string, int> wordFr[], const int count, const string& word)
{
  for(int i = 0; i < count; ++i)
    if(wordFr[i].first == word)
      return i;
  return -1;
}

// compare function to be used by std::sort
bool pairCmp(const pair<string, int>& wordFr1, const pair<string, int>& wordFr2)
{ 
  return (wordFr1.second > wordFr2.second);
}

int main()
{
  // number of lines and number of tokens per line
  const int N = 3, M = 3;
  // token read from the file, 2D array of the fields of every line
  string line, lines[N][M];
  ifstream csvFile("myFile.csv");

  if(csvFile.is_open())
  {
    cout << "Successfully opened file"<<endl;

    int i = 0, j = 0;
    // read from 'csvFile', store to 'line', and use comma as the delimiter
    // (stop once the fixed-size array is full)
    while(i < N && j < M && getline(csvFile, line, ','))
    {
      //cout << "|" << line << "|" << endl;
      size_t found = line.find("\n");
      // a newline means this token holds the last field of one line
      // followed by the first field of the next line
      if (found != std::string::npos)
      {
        string lastToken = line.substr(0, found);
        string nextLineFirstToken = line.substr(found + 1);
        lines[i++][j] = lastToken.substr(1, lastToken.size() - 2);
        j = 0;
        // empty when the file ends with a newline after the last line
        if(nextLineFirstToken != "" && i < N)
          lines[i][j++] = nextLineFirstToken.substr(1, nextLineFirstToken.size() - 2);
      }
      else
      {
        // to not copy the double quotes from first and last character
        lines[i][j++] = line.substr(1, line.size() - 2);
      }
    }

    // for(int i = 0; i < N; ++i)
    // {
    //   for(int j = 0; j < M; ++j)
    //   {
    //     cout << lines[i][j] << " ";
    //   }
    //   cout << endl;
    // }

    // max number of words
    const int W = 100;
    // array of pairs that stores a word and its frequency per cell
    pair<string, int> wordFr[W];
    // number of distinct words stored so far (updated inside the for loop)
    int count = 0;
    // for every line of the 2D array
    for(int i = 0; i < N; ++i)
    {
      string word;
      // get the last field (the sentence) of the i-th line
      stringstream ss(lines[i][M - 1]);
      // split sentence to words (implicit delimiter: whitespace)
      // for every word in the sentence, do:
      while (ss >> word)
      {
        //cout << word << " " << search_pair_in_array(wordFr, count, word) << endl;

        // check if word already in array of pairs (only the first 'count' cells are in use)
        int idx = search_pair_in_array(wordFr, count, word);
        // not found, insert the word in array of pairs, set its frequency to 1 (seen that word for first time)
        if(idx == -1)
        {
          if(count < W) // do not write past the end of the fixed-size array
            wordFr[count++] = make_pair(word, 1);
        }
        // word found in array of pairs, increase its frequency by one
        else
          wordFr[idx].second++;
      }
    }

    // sort the array 'wordFr', by using 'pairCmp' as the compare function. Notice that we care only for the first 'count' elements of the array.
    sort (wordFr, wordFr + count, pairCmp);

    cout << "Word, Frequency\n";
    for(int i = 0; i < count; ++i)
      if(wordFr[i].second > 1) // print only common words (assuming that a word with frequency > 1 is present in another sentence too)
        cout << wordFr[i].first << ", " << wordFr[i].second << endl;
  }
  return 0;
}

Output:

Successfully opened file
Word, Frequency
I, 4
my, 3
to, 3
want, 2
tired, 2
gsamaras