Please see 'Addendum 3'.
I am trying to perform an ANOVA test in R to see whether the voters of the 5 main political parties in the Spanish 2019 General Elections differ with respect to the variable 'age' (P20_range stands for the age intervals in my code).
My code is as follows:
CIS_data_5 <- data.frame(
CIS$RECUERDO,
CIS$P20
)
CIS_data_5$CIS.RECUERDO <- sub("\\(NO LEER\\) ", "", CIS_data_5$CIS.RECUERDO)
RecuerdoDeVoto1 <- subset(CIS_data_5, CIS.RECUERDO %in% c("Unidas Podemos"))
RecuerdoDeVoto2 <- subset(CIS_data_5, CIS.RECUERDO %in% c("PSOE"))
RecuerdoDeVoto3 <- subset(CIS_data_5, CIS.RECUERDO %in% c("Ciudadanos"))
RecuerdoDeVoto4 <- subset(CIS_data_5, CIS.RECUERDO %in% c("PP"))
RecuerdoDeVoto5 <- subset(CIS_data_5, CIS.RECUERDO %in% c("VOX"))
P20 <- as.integer(as.character(CIS_data_5$CIS.P20))
P20labs <- c("16-29", "30-44", "45-64", ">65", "N.C.")
cut_points <- c(16, 30, 45, 65, Inf)
i <- findInterval(P20, cut_points)
P20_fac <- P20labs[i]
P20_fac[is.na(P20)] <- P20labs[length(P20labs)]
P20_fac <- factor(P20_fac, levels = P20labs)
CIS_data_5$CIS.P20 <- P20
CIS_data_5$P20_range <- P20_fac
P20_range <-as.vector(CIS_data_5$P20_range)
# Computing the Analysis of Variance
CIS_data_6 <- list(RecuerdoDeVoto1=RecuerdoDeVoto1,RecuerdoDeVoto2=RecuerdoDeVoto2,RecuerdoDeVoto3=RecuerdoDeVoto3, RecuerdoDeVoto4=RecuerdoDeVoto4,RecuerdoDeVoto5=RecuerdoDeVoto5)
data.frame(RecuerdoDeVoto=unlist(CIS_data_6),
P20_range=factor(rep(names(CIS_data_6),sapply(CIS_data_6,length))))
res.aov <- aov(RecuerdoDeVoto~P20_range, data = CIS_data_6)
# Summary of the Analysis
summary(res.aov)
However, I am not sure what I am doing wrong. I looked up the question "Attempting to create anova table with unequal sizes R" and reproduced its code exactly (with the modifications needed to fit my data), but I keep getting the following error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 191, 623, 115, 387, 114
These numbers correspond to the different numbers of voters for each of the 5 main Spanish political parties (Unidas Podemos, PSOE, Ciudadanos, PP, and VOX).
I am not sure how to work around this problem in my code.
Any help would be enormously appreciated!
Many thanks in advance!
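For reference, a minimal sketch of how a one-way ANOVA with unequal group sizes is usually laid out in R. It uses simulated data, not the CIS survey: the data frame `votes` and its columns `party` and `age` are hypothetical names, and only the group sizes are taken from the error message above. It also assumes that the numeric age is the response and the party is the grouping factor, which is the reverse of the formula in my code.

```r
# Simulated stand-in for the survey (hypothetical data, one row per respondent).
set.seed(1)
sizes <- c("Unidas Podemos" = 191, "PSOE" = 623, "Ciudadanos" = 115,
           "PP" = 387, "VOX" = 114)
votes <- data.frame(
  party = factor(rep(names(sizes), times = sizes), levels = names(sizes)),
  age   = sample(18:90, sum(sizes), replace = TRUE)
)

# Long format: numeric response on the left, grouping factor on the right.
# Unequal group sizes are not a problem here, because every row carries
# its own group label instead of each group being a separate column.
res.aov <- aov(age ~ party, data = votes)
summary(res.aov)
```

In this layout `aov()` receives a single data frame, so it never has to combine the five subsets of different lengths into columns, which appears to be what triggers the "differing number of rows" error.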
Addendum 1
It has been suggested to me that I should try a Pearson Chi-Squared Test for this particular problem, but I am really not sure whether I should opt for an ANOVA or for a Pearson Chi-Squared Test in this case. Again, any comment on this is very welcome!
Addendum 2
I have tried to perform a Pearson Chi-Squared Test by running the following code:
CIS_data_5 <- data.frame(
CIS$RECUERDO,
CIS$P20
)
CIS_data_5$CIS.RECUERDO <- sub("\\(NO LEER\\) ", "", CIS_data_5$CIS.RECUERDO)
RecuerdoDeVoto1 <- subset(CIS_data_5, CIS.RECUERDO %in% c("Unidas Podemos"))
RecuerdoDeVoto2 <- subset(CIS_data_5, CIS.RECUERDO %in% c("PSOE"))
RecuerdoDeVoto3 <- subset(CIS_data_5, CIS.RECUERDO %in% c("Ciudadanos"))
RecuerdoDeVoto4 <- subset(CIS_data_5, CIS.RECUERDO %in% c("PP"))
RecuerdoDeVoto5 <- subset(CIS_data_5, CIS.RECUERDO %in% c("VOX"))
P20 <- as.integer(as.character(CIS_data_5$CIS.P20))
P20labs <- c("16-29", "30-44", "45-64", ">65", "N.C.")
cut_points <- c(16, 30, 45, 65, Inf)
i <- findInterval(P20, cut_points)
P20_fac <- P20labs[i]
P20_fac[is.na(P20)] <- P20labs[length(P20labs)]
P20_fac <- factor(P20_fac, levels = P20labs)
CIS_data_5$CIS.P20 <- P20
CIS_data_5$P20_range <- P20_fac
P20_range <-as.vector(CIS_data_5$P20_range)
RecuerdoDeVoto <- c(RecuerdoDeVoto1, RecuerdoDeVoto2, RecuerdoDeVoto3, RecuerdoDeVoto4, RecuerdoDeVoto5)
IntervalosDeEdad <- rep(P20_range, length(RecuerdoDeVoto1), length(RecuerdoDeVoto2), length(RecuerdoDeVoto3), length(RecuerdoDeVoto4), length(RecuerdoDeVoto5))
chisq.test(RecuerdoDeVoto, IntervalosDeEdad)
And I get the following error:
Error in chisq.test(RecuerdoDeVoto, IntervalosDeEdad) :
'x' and 'y' must have the same length
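For comparison, a minimal sketch of how `chisq.test()` is typically fed two categorical variables of equal length, one value per respondent. The data are simulated and the names `party`, `age`, `age_range` and `tab` are hypothetical. The age bands reuse the cut points from my code, with the "N.C." level left out because the simulated ages contain no missing values.

```r
# Simulated stand-in for the survey (hypothetical data, one row per respondent).
set.seed(1)
n <- 1430
party <- factor(sample(c("Unidas Podemos", "PSOE", "Ciudadanos", "PP", "VOX"),
                       n, replace = TRUE))
age <- sample(18:90, n, replace = TRUE)

# Bin ages into left-closed intervals: [16,30), [30,45), [45,65), [65,Inf).
age_range <- cut(age, breaks = c(16, 30, 45, 65, Inf),
                 labels = c("16-29", "30-44", "45-64", "65+"),
                 right = FALSE)

# Cross-tabulate the two factors and test for independence.
tab <- table(party, age_range)
res.chisq <- chisq.test(tab)
res.chisq
```

Both vectors come from the same rows, so they have the same length by construction; the test then works on the 5 x 4 table of counts rather than on concatenated subsets.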
Addendum 3
After much research, I've found that the best way to go seems to be a Welch's t-test, since I am dealing with 2 samples of different sizes, and hence possibly different variances. However, I am not sure how to perform it in R.
Any help is much welcome!
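As a starting point, a minimal sketch of what I understand the Welch approach to look like in base R, run on simulated data (the data frame `votes`, its columns, and the means and standard deviations are all made up for illustration). `t.test()` with `var.equal = FALSE` compares exactly two groups; for more than two groups, `oneway.test()` with `var.equal = FALSE` is, as far as I can tell, the Welch-type analogue.

```r
# Simulated ages for three parties with unequal sizes and spreads (hypothetical).
set.seed(1)
votes <- data.frame(
  party = factor(rep(c("PSOE", "PP", "VOX"), times = c(623, 387, 114))),
  age   = c(rnorm(623, 50, 15), rnorm(387, 55, 18), rnorm(114, 45, 12))
)

# Welch's t-test: two groups only, no equal-variance assumption.
two <- subset(votes, party %in% c("PSOE", "PP"))
res.welch <- t.test(age ~ party, data = two, var.equal = FALSE)
res.welch

# Welch-type one-way test: any number of groups, no equal-variance assumption.
res.oneway <- oneway.test(age ~ party, data = votes, var.equal = FALSE)
res.oneway
```

Both calls take the numeric age as the response and the party as the grouping variable, so unequal group sizes need no special handling.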