
I am working on one of my assignments, which involves a HashTable. I ran into an issue when reading data from a .csv file. The file contains COVID-19 data from the WHO website (300K+ lines). My original code took around 2 minutes to read the file, so I decided to use threading in order to speed up the process. I set num_threads to std::thread::hardware_concurrency(), which is 64 on my laptop (MacBook Pro 2021, M2 chip). After executing the new code I can still see some delay (about 6 seconds), which is definitely faster than my first solution.

Is there a way to speed it up even more? Is there a different way to approach this problem?

I don't want to assign a hardcoded value to num_threads because, in the end, the TAs will be running the program and I don't know what laptops they'll be using.

Code:

void process_chunk(std::vector<std::string> chunk, CovidDB* db, std::mutex* mtx) {
    std::string latest_date_str = "01/01/00"; // initialize to an old date
    std::tm latest_date = {};
    std::istringstream iss(latest_date_str);
    iss >> std::get_time(&latest_date, "%m/%d/%y");
    for (auto line : chunk) {
        std::stringstream ss(line);
        std::string country, date_str, cases_str, deaths_str;
        std::getline(ss, date_str, ',');
        std::getline(ss, country, ',');
        std::getline(ss, cases_str, ',');
        std::getline(ss, deaths_str, ',');

        int cases = std::stoi(cases_str);
        int deaths = std::stoi(deaths_str);

        std::tm entry_date = {};
        std::istringstream iss2(date_str);
        iss2 >> std::get_time(&entry_date, "%m/%d/%y");

        if (mktime(&entry_date) > mktime(&latest_date)) {
            latest_date_str = date_str;
            latest_date = entry_date;
        } 

        DataEntry* entry = new DataEntry();
        entry->set_country(country);
        entry->set_date(latest_date_str);
        entry->set_c_cases(cases);
        entry->set_c_deaths(deaths);

        std::lock_guard<std::mutex> lock(*mtx);
        db->add(entry);
    }
}
void CovidDB::add_covid_data(std::string const COVID_FILE) {
    std::ifstream file(COVID_FILE);

    if (!file) {
        std::cout << "\n[File ERROR]\n " << COVID_FILE << std::endl;
        std::exit(EXIT_FAILURE);
    }

    std::string line;
    std::getline(file, line); // skip header line

    std::string latest_date_str = "01/01/00"; // initialize to an old date
    std::tm latest_date = {};
    std::istringstream iss(latest_date_str);
    iss >> std::get_time(&latest_date, "%m/%d/%y");

    const int num_threads = std::thread::hardware_concurrency();
    std::vector<std::vector<std::string>> chunks(num_threads);

    int i = 0;
    while (std::getline(file, line)) {
        chunks[i % num_threads].push_back(line);
        i++;
    }

    file.close();

    std::vector<std::thread> threads;
    std::mutex mtx;

    for (auto chunk : chunks) {
        threads.emplace_back(process_chunk, chunk, this, &mtx);
    }

    for (auto& thread : threads) {
        thread.join();
    }
}

Makefile:

CXX = g++
CXXFLAGS = -std=c++11 -pthread -g -Wall -Wextra -Werror -pedantic -Wno-unused-parameter -Wno-return-type -Wno-unused-variable
LDFLAGS = -pthread

all: main

main: CovidDB.o main.o
    $(CXX) $(CXXFLAGS) -o $@ $^

CovidDB.o: CovidDB.cpp CovidDB.h
    $(CXX) $(CXXFLAGS) -c $<

main.o: main.cpp CovidDB.h
    $(CXX) $(CXXFLAGS) -c $<

clean:
    rm -f main *.o

I tried increasing the number of threads. I expected it to run faster, but nothing really changed.

  • 64 threads on an M1 seems really high. Doesn't that chip top out at 20 threads? Setting this too high creates a lot of thread contention. – tadman May 09 '23 at 00:13
  • Sorry for the typo, I believe it's M2. I printed out `std::cout << num_threads << std::endl` and it printed 64 –  May 09 '23 at 00:15
  • Parsing a file with 300K lines in C++ with a decent **CSV library** should not even take measurable time. You don't need threads here, you need a better CSV parsing strategy. – tadman May 09 '23 at 00:15
  • Even so, that thread count seems way too high. – tadman May 09 '23 at 00:16
  • If you're looking for performance, do pay close attention to how you're passing arguments. You should be using `const` references whenever possible to avoid potentially expensive copies. Also, are you absolutely sure you're using an optimized build? I do not see `-O3` in your build arguments, so this could be an absolutely brutal debug build (a minimal Makefile tweak is sketched after these comments). Remember, debug builds are like *having the parking brake firmly applied*. – tadman May 09 '23 at 00:18
  • But doesn't `hardware_concurrency()` return the number of hardware threads that can be executed concurrently on the current system? –  May 09 '23 at 00:19
  • A) Use a CSV library with excellent performance characteristics. B) **`-O3`**. C) Get rid of threads. The IO performance of your drive should be more than adequate; an M2 should have >1GB/s, potentially 4GB/s. – tadman May 09 '23 at 00:19
  • It's supposed to, but it might be way off base on the M2 hardware. It likely has a lot of baked-in assumptions about HyperThreading that simply do not apply on ARM. I'm not sure why it would give such a weird number. For what it's worth, it does give the correct number on my M1 machine. – tadman May 09 '23 at 00:20
  • I'd also like to point out that your `DataEntry` needs a constructor that accepts those fields, as you could be creating objects that you immediately throw in the trash, wasting time. Even better, use `emplace_back` into some kind of container. – tadman May 09 '23 at 00:24
  • Using a library like Boost [link](https://stackoverflow.com/questions/1120140/how-can-i-read-and-parse-csv-files-in-c) is tough since all my work is done on the school's server, where not all libraries are preinstalled. If I run `sudo` I get kicked off, since it's considered a hacking attempt. –  May 09 '23 at 00:25
  • The `stringstream` and `getline` trick to split up a line is used because it's easy, not because it's particularly fast. That said, pull out a profiler and take a look at where the program is really wasting time, then attack that. Picking targets and shooting at them because you think they're slow has a surprisingly bad success rate. – user4581301 May 09 '23 at 00:28
  • Note also it looks like you're building a debug executable. They're great for debugging, but they can be whole orders of magnitude slower. Profiling debug code is not a good idea; first let `-O2` or `-O3` take a crack at it. – user4581301 May 09 '23 at 00:30
  • You can add libraries to your project without needing to root install them, though. It's all about the build options. – tadman May 09 '23 at 01:22
  • With many tasks there's a point where adding more threads provides diminishing returns or even takes longer. You might try with fewer threads just to see if there's a difference. Do that after you turn optimization on though. – Retired Ninja May 09 '23 at 01:24
  • I just Googled and the Apple M2 Pro is supposed to have [12C / 12T](https://www.notebookcheck.net/Apple-M2-Pro-Processor-Benchmarks-and-Specs.682450.0.html) and not 64T. I am not sure why you get 64 for `std::cout << num_threads << std::endl` – drescherjm May 09 '23 at 01:40
  • You should definitely pay attention to every bit of advice offered so far. There are lots of tiny inefficiencies in your program which can be addressed, but should be profiled to test if they're even relevant. Use references, avoid copies, move I/O-bound tasks into a service thread, etc... Also be aware that things like parsing integers and dates can also be inefficient. Sometimes your requirements are simpler and you can [hand-roll](https://stackoverflow.com/q/16826422/1553090) something less general-purpose that is faster. Consider OS-level I/O interfaces (e.g. unbuffered data streams). – paddy May 09 '23 at 02:34
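
Following up on the `-O2`/`-O3` comments above, a minimal tweak to the posted Makefile, keeping every original flag and only adding an optimization level (`-O2` here is one reasonable choice, not the only one):

CXX = g++
# -O2 turns on optimizations; an unoptimized debug build can be orders of magnitude slower
CXXFLAGS = -std=c++11 -pthread -O2 -g -Wall -Wextra -Werror -pedantic -Wno-unused-parameter -Wno-return-type -Wno-unused-variable
LDFLAGS = -pthread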

1 Answer


Your processing in process_chunk looks like it is bound by the speed of reading the file, not by the amount of CPU time you give it.

So, CPU parallelism simply doesn't address what's limiting your speed.

Aside from faster storage devices and maybe slightly smarter parsing, there's nothing you can do to change that.
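
For the "slightly smarter parsing" part, a minimal single-threaded sketch is below. It reuses the CovidDB and DataEntry names from the question; the find()-based splitting and the const-reference parameters are the illustrative bits, and the latest-date tracking from the original is left out for brevity:

#include <cstdlib>
#include <fstream>
#include <string>

// Split one CSV line with find() instead of building a stringstream per line.
// Assumes the four-column layout from the question: date,country,cases,deaths
static void parse_line(const std::string& line, CovidDB& db) {
    std::size_t p1 = line.find(',');
    std::size_t p2 = line.find(',', p1 + 1);
    std::size_t p3 = line.find(',', p2 + 1);
    if (p1 == std::string::npos || p2 == std::string::npos ||
        p3 == std::string::npos) return; // skip malformed rows

    DataEntry* entry = new DataEntry();
    entry->set_country(line.substr(p1 + 1, p2 - p1 - 1));
    entry->set_date(line.substr(0, p1));
    entry->set_c_cases(std::atoi(line.c_str() + p2 + 1));  // stops at the next comma
    entry->set_c_deaths(std::atoi(line.c_str() + p3 + 1));
    db.add(entry); // single-threaded, so no mutex needed
}

void add_covid_data_fast(const std::string& covid_file, CovidDB& db) {
    std::ifstream file(covid_file);
    std::string line;
    std::getline(file, line); // skip header line
    while (std::getline(file, line)) {
        parse_line(line, db);
    }
}

Compiled with -O2, a loop like this usually gets through a few hundred thousand short lines without measurable delay, which is what the comments on the question are getting at.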

Marcus Müller
  • So what you are saying is that I can't really do anything in order to speed it up? –  May 09 '23 at 00:16
  • Is there a faster way to read files in C++? –  May 09 '23 at 00:16
  • I really don't like repeating myself. I answered that question in the last paragraph of my answer. – Marcus Müller May 09 '23 at 00:17
  • You can do your best to keep the data flowing, such as having a thread that reads from the file into buffers. Use multiple buffers to adjust the speed / efficiency. The more data that can be read per transaction, the more efficient the read (a rough sketch of this follows). – Thomas Matthews May 09 '23 at 00:44
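
A minimal sketch of that idea in its simplest form: one large read instead of 300K small getline() calls against the file, then parsing from memory. (parse_line is the hypothetical helper from the sketch in the answer above; everything here is illustrative, not a drop-in replacement.)

#include <fstream>
#include <string>

// Pull the whole file into memory in one bulk transaction.
std::string slurp(const std::string& path) {
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) return std::string();
    std::string data(static_cast<std::size_t>(file.tellg()), '\0');
    file.seekg(0);
    file.read(&data[0], static_cast<std::streamsize>(data.size()));
    return data;
}

void add_covid_data_from_buffer(const std::string& data, CovidDB& db) {
    std::size_t pos = data.find('\n'); // end of the header line
    while (pos != std::string::npos) {
        std::size_t end = data.find('\n', pos + 1);
        std::string line = data.substr(pos + 1,
            end == std::string::npos ? std::string::npos : end - pos - 1);
        if (!line.empty()) parse_line(line, db);
        pos = end;
    }
}

The full version of the multiple-buffer suggestion would hand filled buffers from a reader thread to a parser thread; for a 300K-line file, a single bulk read is usually already enough.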