I have the following dataset, and I found two ways to demean it.
library(plm)
library(dplyr)
data("EmplUK", package="plm")
EmplUK <- EmplUK %>%
  group_by(firm, year) %>%
  mutate(Vote = sample(c(0, 1), 1),
         Vote_won = ifelse(Vote == 1, sample(c(0, 1), 1), 0))
# EDIT:
EmplUK <- pdata.frame(EmplUK , index=c("firm", "year"), drop.index = FALSE)
# A tibble: 1,031 x 9
# Groups: firm, year [1,031]
    firm  year sector   emp  wage capital output  Vote Vote_won
   <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>  <dbl> <dbl>    <dbl>
 1     1  1977      7  5.04  13.2   0.589   95.7     1        0
 2     1  1978      7  5.60  12.3   0.632   97.4     0        0
 3     1  1979      7  5.01  12.8   0.677   99.6     1        1
 4     1  1980      7  4.72  13.8   0.617   101.     1        1
 5     1  1981      7  4.09  14.3   0.508   99.6     0        0
 6     1  1982      7  3.17  14.9   0.423   98.6     0        0
 7     1  1983      7  2.94  13.8   0.392   100.     0        0
 8     2  1977      7  71.3  14.8    16.9   95.7     1        0
 9     2  1978      7  70.6  14.1    17.2   97.4     1        1
10     2  1979      7  70.9  15.0    17.5   99.6     1        1
The first is from DaveArmstrong's answer to "Visualise the relation between two variables in panel data":
demeaned_data <- EmplUK %>%
  group_by(firm) %>%
  mutate(across(c(output, wage), function(x) x - mean(x)))
The second is from "Demean R data frame":
library(plyr)
demean <- colwise(function(x) if(is.numeric(x)) x - mean(x) else x)
demeaned_data.2 <- ddply(EmplUK, .(firm), demean)
Looking at the histograms, however, the results are very different. Does one show the difference from the mean and the other the mean minus the difference, or something like that? Or should the two be the same?
hist(demeaned_data$wage, 100)
hist(demeaned_data.2$wage, 100)
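For reference, here is a minimal sketch of what group-wise demeaning is expected to produce. It uses a small made-up data frame (`toy`, with hypothetical values, not EmplUK) and base R only, so it does not depend on either of the two approaches above. After demeaning by firm, the values should average to zero within each firm.

```r
# Toy panel (hypothetical data, not EmplUK): two firms, three years each
toy <- data.frame(
  firm = rep(c(1, 2), each = 3),
  wage = c(10, 12, 14, 20, 26, 32)
)

# Group-wise demeaning in base R: subtract each firm's own mean wage
toy$wage_demeaned <- toy$wage - ave(toy$wage, toy$firm, FUN = mean)

# Within each firm, the demeaned values should average to zero
group_means <- tapply(toy$wage_demeaned, toy$firm, mean)

print(toy)
print(group_means)
```

The same `tapply()` check could be run on `demeaned_data$wage` and `demeaned_data.2$wage` by `firm` to see which of the two results, if either, has zero mean within every firm.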