First of all, please note that Kubernetes CPU is an absolute unit:
Limits and requests for CPU resources are measured in cpu units. One
cpu, in Kubernetes, is equivalent to 1 vCPU/Core for cloud providers
and 1 hyperthread on bare-metal Intel processors.
CPU is always requested as an absolute quantity, never as a relative
quantity; 0.1 is the same amount of CPU on a single-core, dual-core,
or 48-core machine
In other words, a CPU value of 1 corresponds to using a single core continiously over time.
The value of resources.requests.cpu
is used during scheduling and ensures that the sum of all requests on a single node is less than the node capacity.
When you create a Pod, the Kubernetes scheduler selects a node for the
Pod to run on. Each node has a maximum capacity for each of the
resource types: the amount of CPU and memory it can provide for Pods.
The scheduler ensures that, for each resource type, the sum of the
resource requests of the scheduled Containers is less than the
capacity of the node. Note that although actual memory or CPU resource
usage on nodes is very low, the scheduler still refuses to place a Pod
on a node if the capacity check fails. This protects against a
resource shortage on a node when resource usage later increases, for
example, during a daily peak in request rate.
The value of resources.limits.cpu
is used to determine how much CPU can be used given that it is available, see How pods with limist are run
The spec.containers[].resources.limits.cpu is converted to its
millicore value and multiplied by 100. The resulting value is the
total amount of CPU time in microseconds that a container can use
every 100ms. A container cannot use more than its share of CPU time
during this interval.
In other words, the requests is what the container is guaranteed in terms of CPU time, and the limit is what it can use given that it is not used by someone else.
The concept of multithreading does not change the above, the requests and limits apply to the container as a whole, regardless of how many threads run inside. The Linux scheduler do scheduling decisions based on waiting time, and with containers Cgroups is used to limit the CPU bandwidth. Please see this answer for a detailed walkthrough: https://stackoverflow.com/a/61856689/7146596
To finally answer the question
Your on premises VM has 4 cores, operating on 2,5 GHz, and if we assume that the CPU capacity is a function of clock speed and number of cores, you currently have 10 GHz "available"
The CPU's used in standard_D16ds_v4 has a base speed of 2.5GHz and can run up to 3.4GHz or shorter periods according to the documentation
The D v4 and Dd v4 virtual machines are based on a custom Intel® Xeon®
Platinum 8272CL processor, which runs at a base speed of 2.5Ghz and
can achieve up to 3.4Ghz all core turbo frequency.
Based on this specifying 4 cores should be enough ti give you the same capacity as onpremises.
However number of cores and clock speed is not everything (caches etc also impacts performance), so to optimize the CPU requests and limits you may have to do some testing and fine tuning.