There are two cases that need to be distinguished. Do you need to know the cache sizes at compile time or at runtime?
Determining the cache-size at compile-time
For some applications, you know the exact architecture that your code will run on, for example, if you can compile the code directly on the host machine. In that case, simplify looking up the size and hard-coding it is an option (could be automated in the build system). On most machines today, the L1 cache line should be 64 bytes.
If you want to avoid that complexity or if you need to support compilation on unknown architectures, you can use the C++17 feature std::hardware_constructive_interference_size as a good fallback. It will provide a compile-time estimation for the cache line, but be aware of its limitations. Note that the compiler cannot guess perfectly when it creates the binary, as the size of the cache-line is, in general, architecture dependent.
Determining the cache-size at runtime
At runtime, you have the advantage that you know the exact machine, but you will need platform specific code to read the information from the OS. A good starting point is the code snippet from this answer, which supports the major platforms (Windows, Linux, MacOS). In a similar fashion, you can also read the L2 cache size at runtime.
I would advise against trying to guess the cache line by running benchmarks at startup and measuring which one performed best. It might well work, but it is also error-prone if the CPU is used by other processes.
Combining both approaches
If you have to ship one binary and the machines that it will later run on features a range of different architectures with varying cache sizes, you could create specialized code parts for each cache size, and then dynamically (at application startup) choose the best fitting one.