It's important to understand what scale() is doing to your data: by default it subtracts each column's mean and divides by its standard deviation. I've pulled an example from https://stackoverflow.com/a/20256272/11167644 to explain:
set.seed(1)
x <- runif(6)
x
#> [1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819 0.8983897
(x - mean(x)) / sd(x)
#> [1] -0.8717643 -0.5287394 0.1170895 1.1960620 -1.0771210 1.1644732
scale(x)[1:6]
#> [1] -0.8717643 -0.5287394 0.1170895 1.1960620 -1.0771210 1.1644732
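As a small sketch of what else scale() gives you: the mean and standard deviation it used are stored as the attributes "scaled:center" and "scaled:scale" on the result, so the transformation can be undone. The names `z`, `mu`, `sigma` and `back` below are only for illustration.

```r
set.seed(1)
x <- runif(6)

z <- scale(x)                      # 6 x 1 matrix carrying two attributes
mu <- attr(z, "scaled:center")     # the mean that was subtracted
sigma <- attr(z, "scaled:scale")   # the standard deviation that was divided by

# reverse the scaling to recover the original values
back <- as.numeric(z) * sigma + mu
all.equal(back, x)
#> [1] TRUE
```

This is handy if you later need results on the original scale after working with the scaled values.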
Your data is being centered around zero and scaled to a standard deviation of one. We can further verify this by looking at the summary() of both the unscaled and scaled data sets:
data("USArrests")
df <- USArrests
summary(df)
#> Murder Assault UrbanPop Rape
#> Min. : 0.800 Min. : 45.0 Min. :32.00 Min. : 7.30
#> 1st Qu.: 4.075 1st Qu.:109.0 1st Qu.:54.50 1st Qu.:15.07
#> Median : 7.250 Median :159.0 Median :66.00 Median :20.10
#> Mean : 7.788 Mean :170.8 Mean :65.54 Mean :21.23
#> 3rd Qu.:11.250 3rd Qu.:249.0 3rd Qu.:77.75 3rd Qu.:26.18
#> Max. :17.400 Max. :337.0 Max. :91.00 Max. :46.00
summary(scale(df))
#> Murder Assault UrbanPop Rape
#> Min. :-1.6044 Min. :-1.5090 Min. :-2.31714 Min. :-1.4874
#> 1st Qu.:-0.8525 1st Qu.:-0.7411 1st Qu.:-0.76271 1st Qu.:-0.6574
#> Median :-0.1235 Median :-0.1411 Median : 0.03178 Median :-0.1209
#> Mean : 0.0000 Mean : 0.0000 Mean : 0.00000 Mean : 0.0000
#> 3rd Qu.: 0.7949 3rd Qu.: 0.9388 3rd Qu.: 0.84354 3rd Qu.: 0.5277
#> Max. : 2.2069 Max. : 1.9948 Max. : 1.75892 Max. : 2.6444
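Again, note the mean of zero in every column. Since the sum of a column is just its mean times the number of rows, a mean of zero is exactly why the scaled data sums to zero.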
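A quick way to check this directly, as a rough sketch (the names `scaled`, `col_sums` and `col_sds` are mine, and the 1e-10 tolerance is an arbitrary choice to absorb floating-point error):

```r
data("USArrests")
scaled <- scale(USArrests)

# column sums are zero, up to floating-point error
col_sums <- colSums(scaled)
all(abs(col_sums) < 1e-10)
#> [1] TRUE

# each column also ends up with a standard deviation of 1
col_sds <- apply(scaled, 2, sd)
all.equal(unname(col_sds), rep(1, ncol(scaled)))
#> [1] TRUE
```

The sums will rarely print as exactly 0 because of floating-point rounding, which is why the comparison uses a tolerance rather than `== 0`.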
Finally, we can compare the scaled and unscaled data visually with some histograms:
library(tidyverse)
df %>%
  select(Murder) %>%
  # scale() returns a one-column matrix; as.numeric() turns it into a plain vector
  mutate(Scaled_Murder = as.numeric(scale(Murder))) %>%
  pivot_longer(everything()) %>%
  ggplot(aes(value, fill = name)) +
  geom_histogram(alpha = 0.75, position = "identity", bins = 20)

Created on 2021-03-02 by the reprex package (v0.3.0)