I am measuring the frequency of observations for one measure against another within each sub-category, and both measures (X and Y) share the same ranking scale. The factor levels are the same throughout, i.e. "1", "2", "3", representing "high", "medium", "low", but not every sub-category contains all of these values, since the data come from survey responses (so not all members ranked a given sub-category as high or low). I have included example data below. Essentially, I want to first plot success against education for each sub-category, and then plot these for each category (I expect to use the mean, with CI and SD to depict variation, using boxplots or another type of graph). I am not sure how to code this, and my example code for the sub-category plots runs into an error because the X and Y lengths differ, i.e. not every level is represented in each set.
Code:
# Example data frame
variable <- c("success", "education", "success", "education",
              "success", "education", "success", "education")
value <- c("high", "medium", "medium", "low", "high", "high", "low", "low")
rank <- c("1", "2", "2", "3", "1", "1", "3", "3")
subcategory <- c("USA", "USA", "CANADA", "CANADA",
                 "CHINA", "CHINA", "FINLAND", "FINLAND")
category <- c("NorthAmerica", "NorthAmerica", "NorthAmerica", "NorthAmerica",
              "Asia", "Asia", "Europe", "Europe")
df <- data.frame(variable, value, rank, subcategory, category)
print(df)

# Count how often each rank occurs, per variable, for one sub-category
count_success <- as.data.frame(table(
  df[df$variable == "success" & df$subcategory == "USA", "rank"]))
count_education <- as.data.frame(table(
  df[df$variable == "education" & df$subcategory == "USA", "rank"]))

# Plot the success counts against the education counts
usaplot <- plot(x = count_success$Freq, xlab = "Level of perceived success",
                y = count_education$Freq, ylab = "Level of education")
Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ
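
My guess is that the error comes from table() only returning counts for the values that actually occur in each subset, so the two Freq columns can end up with different lengths. Below is a minimal sketch of one possible workaround, not a definitive solution. It uses made-up ranks (success_rank and education_rank are hypothetical vectors, not my real data) and assumes the full set of levels is "1", "2", "3". I am not sure this is the right approach for the per-category plots.

```r
# Hypothetical responses for a single sub-category (illustration only):
# "success" only uses ranks 1 and 2, "education" uses ranks 1, 2 and 3.
success_rank   <- c("1", "1", "2")
education_rank <- c("1", "2", "3", "3")

# table() on a character vector only counts values that occur,
# so these two results have different lengths.
n_success_levels   <- length(table(success_rank))
n_education_levels <- length(table(education_rank))

# Assumption: every rank is one of "1", "2", "3".
rank_levels <- c("1", "2", "3")

# Converting to a factor with explicit levels makes table() report
# a zero count for any level that does not occur in the subset.
count_success <- as.data.frame(
  table(factor(success_rank, levels = rank_levels)))
count_education <- as.data.frame(
  table(factor(education_rank, levels = rank_levels)))

# Both Freq columns now have one entry per level, so plot() accepts them.
plot(x = count_success$Freq, xlab = "Level of perceived success",
     y = count_education$Freq, ylab = "Level of education")
```

With my own data frame, the same idea would presumably mean wrapping the subset of df$rank in factor(..., levels = c("1", "2", "3")) before calling table().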