0

I want to inline the function MyClass:at(), but performance isn't as I expect.

MyClass.cpp

#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>
#include <string>

// Making this a lot shorter than in my actual program
std::vector<std::vector<int>> arrarr = 
{
    { 1, 70, 54, 71, 83, 51, 54, 69, 16, 92, 33, 48, 61, 43, 52,  1, 89, 19, 67, 48},
    {24, 47, 32, 60, 99,  3, 45,  2, 44, 75, 33, 53, 78, 36, 84, 20, 35, 17, 12, 50},
    {32, 98, 81, 28, 64, 23, 67, 10, 26, 38, 40, 67, 59, 54, 70, 66, 18, 38, 64, 70},
    {67, 26, 20, 68,  2, 62, 12, 20, 95, 63, 94, 39, 63,  8, 40, 91, 66, 49, 94, 21},
    {24, 55, 58,  5, 66, 73, 99, 26, 97, 17, 78, 78, 96, 83, 14, 88, 34, 89, 63, 72},
    {21, 36, 23,  9, 75,  0, 76, 44, 20, 45, 35, 14,  0, 61, 33, 97, 34, 31, 33, 95},
    {78, 17, 53, 28, 22, 75, 31, 67, 15, 94,  3, 80,  4, 62, 16, 14,  9, 53, 56, 92},
    {16, 39,  5, 42, 96, 35, 31, 47, 55, 58, 88, 24,  0, 17, 54, 24, 36, 29, 85, 57},
    {86, 56,  0, 48, 35, 71, 89,  7,  5, 44, 44, 37, 44, 60, 21, 58, 51, 54, 17, 58},
    {19, 80, 81, 68,  5, 94, 47, 69, 28, 73, 92, 13, 86, 52, 17, 77,  4, 89, 55, 40},
    { 4, 52,  8, 83, 97, 35, 99, 16,  7, 97, 57, 32, 16, 26, 26, 79, 33, 27, 98, 66},
    {88, 36, 68, 87, 57, 62, 20, 72,  3, 46, 33, 67, 46, 55, 12, 32, 63, 93, 53, 69},
    { 4, 42, 16, 73, 38, 25, 39, 11, 24, 94, 72, 18,  8, 46, 29, 32, 40, 62, 76, 36},
    {20, 69, 36, 41, 72, 30, 23, 88, 34, 62, 99, 69, 82, 67, 59, 85, 74,  4, 36, 16},
    {20, 73, 35, 29, 78, 31, 90,  1, 74, 31, 49, 71, 48, 86, 81, 16, 23, 57,  5, 54},
    { 1, 70, 54, 71, 83, 51, 54, 69, 16, 92, 33, 48, 61, 43, 52,  1, 89, 19, 67, 48},
};

class MyClass
{
public:
    MyClass(std::vector<std::vector<int>> arr) : arr(arr)
    {
        rows = arr.size();
        cols = arr.at(0).size();
    }
    inline auto at(int row, int col) const { return arr[row][col]; }
    void arithmetic(int n) const;
private:
    std::vector<std::vector<int>> arr;
    int rows;
    int cols;
};

MyClass.cpp:

void MyClass::arithmetic(int n) const
{
    using std::chrono::high_resolution_clock;
    using std::chrono::duration_cast;
    using std::chrono::duration;
    using std::chrono::milliseconds;

    auto t1 = high_resolution_clock::now();
    int highest_product = 0;
    for (auto y = 0; y < rows; ++y)
    {
        for (auto x = 0; x < cols; ++x)
        {
            // Horizontal product
            if (x + n < cols)
            {
                auto product = 1;
                for (auto i = 0; i < n; ++i)
                {
                    product *= at(y, x + i);
                }
                highest_product = std::max(highest_product, product);
            }
        }
    }
    auto t2 = high_resolution_clock::now();
    duration<double, std::milli> ms_double = t2 - t1;
    std::cout << ms_double.count() << "ms\n";

    return highestProduct;
};

Now what I want know is why do I get better performance when I replace product *= at(y, x + i); with product *= arr[y][x+i];? When I test it with the first case, the timing on my large array takes roughly 6.7ms, and the second case takes 5.3ms. I thought when I inlined the function, it should be the same implementation as the second case.

B. Lee
  • 191
  • 6
  • 4
    The compiler is in no way inclined to actually inline your code. Did you specify certain optimization levels? – πάντα ῥεῖ Oct 09 '22 at 10:39
  • 4
    Try to use -O3 flag – TheRyuTeam Oct 09 '22 at 10:41
  • Ok, now it performs a lot better. I tried with -O2 flag and it went down to `0.5ms`. But in your opinion is my thought process correct in that inlining should perform the same as when I directly access the array? – B. Lee Oct 09 '22 at 10:44
  • @B.Lee "Inline function is a function that is expanded in line when it is called. When the inline function is called whole code of the inline function gets inserted or substituted at the point of inline function call. This substitution is performed by the C++ compiler at compile time." – TheRyuTeam Oct 09 '22 at 11:15
  • A function implemented in the class declaration (like yours) is already implicitly `inline`, specifying the keyword explicitly is redundant. Furthermore, `inline` doesn't really have anything to do with whether a compiler will actually inline a function (at best it's a hint in that direction, often it's just ignored). For the actual meaning of `inline` see https://en.cppreference.com/w/cpp/language/inline – Jesper Juhl Oct 09 '22 at 12:33

1 Answers1

1

Member function directly defined in the class definition (typically in header files) are implicitly inlined so using inline is useless in this case. inline do not guarantee the function is inlined. It is just an hint for the compiler. The keyword is also an important during the link to avoid the multiple-definition issue. Function that are not make inline can still be inlined if the compiler can see the code of the target function (ie. it is in the same translation unit or link time optimization are applied). For more information about this, please read Why are class member functions inlined?

Note that the inlining is typically performed in the optimization step of compilers (eg. -O1//O1). Thus without optimizations, most compilers will not inline the function.


Using std::vector<std::vector<int>> is not efficient since it is not a contiguous data structure and it require 2 indirection to access an item. Two sub-vectors next to each other can be stored far away in memory likely causing more cache misses (and/or thrashing due to the alignment). Please consider using one big flatten array and access items using y*cols+x where cols is the size of the sub-vectors (20 here). Alternatively a int[16][20] data type should do the job well if the size if fixed at compile-time.

MyClass(std::vector<std::vector<int>> arr) cause the input parameter to be copied (and so all the sub-vectors). Please consider using a const std::vector<std::vector<int>>& type.

While at is convenient for checking bounds at runtime, this feature can strongly decrease performance. Consider using the operator [] if you do not need that. You can use assertions combined with flatten arrays so to get a fast code in release and a safe code in debug (you can enable/disable them by defining the NDEBUG macro).

Jérôme Richard
  • 41,678
  • 6
  • 29
  • 59