
We are parsing a CSV file in C++ and would like to know how to extract only the lines that contain a certain location.

Here is a screenshot of how the CSV file is structured.

void readFileVector(std::string fileName, std::vector<std::string>& fileVector)
{

    // search key

    std::string key = "Back Bay";

    std::ifstream file(fileName.c_str()); 

    if (!file)
    {
    
        std::cerr<< "File could not be opened: "
        <<fileName<<std::endl;

    }

    std::string line;


    if (file.is_open()&& file.good()){
    
         while(getline(file,line)){
            /*
             Need to implement a condition to only push data pertaining to Back Bay
            */
   
            fileVector.push_back(line);
        }
        
    file.close();
    }
}

So far we have only been able to store every line in a vector; we still need to work out how to keep only the lines for the "Back Bay" location.

Essentially, we would like to print all occurrences of that location and push them into vectors. If there is an easier way to do this, we would appreciate hearing about it.

kiner_shah
Anon
  • Please edit your code into the question as text. The answer is that you read the csv file like you would any other and apply logic to decide if you keep or discard the data. It isn't clear which part of this you're having trouble with. – Retired Ninja Nov 08 '21 at 06:42
  • If location is always a 5th value on a line, then just check it as you parse – Alexey S. Larionov Nov 08 '21 at 06:42
  • Does this answer your question? [Parsing simple csv table with boost-spirit](https://stackoverflow.com/questions/44042635/parsing-simple-csv-table-with-boost-spirit) – francesco Nov 08 '21 at 06:42
  • Since you just want to check if "Back Bay" is present on the line, just use `std::string::find` method for each line. – kiner_shah Nov 08 '21 at 07:08
  • [Why not upload images of text when asking a question?](https://meta.stackoverflow.com/q/285551/995714) – phuclv Nov 08 '21 at 07:16
  • Just use [std::basic_string::find](https://en.cppreference.com/w/cpp/string/basic_string/find) to check whether `"Back Bay"` is contained in the line, and if so, add it to your vector. – David C. Rankin Nov 08 '21 at 08:27
  • @francesco: No complicated boost spirit needed. Splitting one line into parts can easily be done with the `std::sregex_token_iterator` in one statement. But even better: no splitting needed. A simple `find` will do the job. – A M Nov 08 '21 at 13:54

1 Answer


Basically, you already have everything you need.

You want to push_back a line into the std::vector only if a certain condition is met: the key string can be found in the line.

We can do this with a simple if statement:

if(line.find(key) != std::string::npos)
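
As a minimal sketch of that condition in isolation: `std::string::find` returns the index of the first match, or `std::string::npos` when the key does not occur at all. The helper name `containsKey` is made up for illustration and is not part of the question's code.

```cpp
#include <string>

// Hypothetical helper: true if `key` occurs anywhere in `line`.
bool containsKey(const std::string& line, const std::string& key)
{
    // find() returns std::string::npos when there is no match
    return line.find(key) != std::string::npos;
}
```

Keep in mind that this is a substring test, so it also matches when the key appears in a column other than the location column.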

The new function would then look like this (I added some comments):

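void readFileVector(const std::string& fileName, std::vector<std::string>& fileVector)
{
    // Search key (a local variable, as in the question's code)
    const std::string key{ "Back Bay" };
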
    // Open input file
    std::ifstream file(fileName);

    // Check, if it could be opened.
    if (!file) {
        //if not, then show error message
        std::cerr << "File could not be opened: " << fileName << std::endl;
    }
    else {
        // File could be opened. Now we want to read line by line
        std::string line{};

        // Read the complete file
        while (getline(file, line)) {

            // Search for the key. If we find it, then we can add the line to the resulting vector
            if(line.find(key) != std::string::npos)
                fileVector.push_back(line);
        }
    }
}
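
To try the filtering loop without a file on disk, the same logic can be written against a generic input stream. This is only a sketch: the function name `filterLines` and the idea of passing the key as a parameter are assumptions of mine, not something from the question.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant of readFileVector that reads from any input stream
// and returns the matching lines instead of filling an output parameter.
std::vector<std::string> filterLines(std::istream& in, const std::string& key)
{
    std::vector<std::string> matches;
    std::string line;

    // Read line by line and keep only the lines that contain the key
    while (std::getline(in, line)) {
        if (line.find(key) != std::string::npos)
            matches.push_back(line);
    }
    return matches;
}
```

A `std::ifstream` can be passed in the same way as a `std::istringstream`, because both derive from `std::istream`.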

But my guess is that you want to split the lines into parts and then do some comparisons.

There are many ways to split a line of text into its parts, also called tokens. I will show you 4 examples:

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <regex>
#include <algorithm>
#include <iterator>
#include <cstring>
#include <forward_list>
#include <deque>
#include <vector>
#include <type_traits>

using Container = std::vector<std::string>;
std::regex delimiter{ "," };


int main() {

    // Some function to print the contents of an STL container
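    auto print = [](const auto& container) -> void {
        // The element type is a dependent type here, so use std::decay_t instead of a bare ::type
        using Value = std::decay_t<decltype(*container.begin())>;
        std::copy(container.begin(), container.end(), std::ostream_iterator<Value>(std::cout, " "));
        std::cout << '\n';
    };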

    // Example 1:   Handcrafted -------------------------------------------------------------------------
    {
        // Our string that we want to split
        std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
        Container c{};

        // Search for comma, then take the part and add to the result
        for (size_t i{ 0U }, startpos{ 0U }; i <= stringToSplit.size(); ++i) {

            // So, if there is a comma or the end of the string
            if ((stringToSplit[i] == ',') || (i == (stringToSplit.size()))) {

                // Copy substring
                c.push_back(stringToSplit.substr(startpos, i - startpos));
                startpos = i + 1;
            }
        }
        print(c);
    }

    // Example 2:   Using very old strtok function ----------------------------------------------------------
    {
        // Our string that we want to split
        std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
        Container c{};

        // Split string into parts in a simple for loop
#pragma warning(suppress : 4996)
        for (char* token = std::strtok(const_cast<char*>(stringToSplit.data()), ","); token != nullptr; token = std::strtok(nullptr, ",")) {
            c.push_back(token);
        }

        print(c);
    }

    // Example 3:   Very often used std::getline with additional istringstream ------------------------------------------------
    {
        // Our string that we want to split
        std::string stringToSplit{ "aaa,bbb,ccc,ddd" };
        Container c{};

        // Put string in an std::istringstream
        std::istringstream iss{ stringToSplit };

        // Extract string parts in simple for loop
        for (std::string part{}; std::getline(iss, part, ','); c.push_back(part))
            ;

        print(c);
    }

    // Example 4:   Most flexible iterator solution  ------------------------------------------------

    {
        // Our string that we want to split
        std::string stringToSplit{ "aaa,bbb,ccc,ddd" };


        Container c(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {});
        //
        // Everything done already with range constructor. No additional code needed.
        //

        print(c);


        // Works also with other containers in the same way
        std::forward_list<std::string> c2(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {});

        print(c2);

        // And works with algorithms
        std::deque<std::string> c3{};
        std::copy(std::sregex_token_iterator(stringToSplit.begin(), stringToSplit.end(), delimiter, -1), {}, std::back_inserter(c3));

        print(c3);
    }
    return 0;
}

With this additional know-how, we can come up with a solution that first splits the line and then compares a specific part of it to the key.

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <regex>

// Save some typing work and create an alias
using Iter = std::sregex_token_iterator;

const std::string key{ "Back Bay" };
const std::regex separator{ "," };

void readFileVector(const std::string& fileName, std::vector<std::string>& fileVector)
{
    // Open input file
    std::ifstream file(fileName);

    // Check, if it could be opened.
    if (!file) {
        //if not, then show error message
        std::cerr << "File could not be opened: " << fileName << std::endl;
    }
    else {
        // File could be opened. Now we want to read line by line
        std::string line{};

        // Read the complete file
        while (getline(file, line)) {

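            // Split string into parts (class template argument deduction, requires C++17;
            // the deduced element type is std::ssub_match, which compares with std::string)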
            std::vector part(Iter(line.begin(), line.end(), separator, -1), {});

            // Now the condition
            if ((part.size() > 4) and (part[4] == key))
                fileVector.push_back(line);
        }
    }
}
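
If you would rather avoid the regex, the same column check can be sketched with `std::getline` and a `std::istringstream`, as in Example 3 above. The helper name `fieldEquals` is hypothetical, and the sketch assumes plain comma-separated fields without quoted commas.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: true if the field at zero-based `index` equals `key`.
// Assumes simple CSV: fields are separated by ',' and never contain one.
bool fieldEquals(const std::string& line, std::size_t index, const std::string& key)
{
    std::istringstream iss{ line };
    std::string field;

    // Walk over the fields until we reach the requested column
    for (std::size_t i = 0; std::getline(iss, field, ','); ++i) {
        if (i == index)
            return field == key;
    }
    return false; // the line does not have a field at that index
}
```

Inside the reading loop, the condition would then become `if (fieldEquals(line, 4, key))`, where index 4 corresponds to the `part[4]` used above.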
A M