
I am measuring the frequency of observations for one sub-category against another, and the ranking for both measures on X and Y is shared. The factor levels are the same throughout, i.e. "1", "2", "3", representing "high", "medium", "low", but not every sub-category has all of these values, since the data are based on survey responses (not all members ranked a given sub-category as high or low). I have included example data. Essentially, I want to first plot success against education values for each sub-category and then plot these for each category (I expect to use the mean, with CI and SD to depict variation, using boxplots or another graph). I am not sure how to code it, and my example code for plots by sub-category runs into an error because the X and Y lengths differ, i.e. the levels aren't all represented in each set.

 code:

#dataframe 
variable <- c("success", "education", "success", "education","success", "education","success", "education")
value <- c("high", "medium", "medium", "low", "high", "high", "low", "low")
rank <- c( "1", "2", "2", "3", "1", "1", "3", "3")
subcategory <- c("USA", "USA", "CANADA", "CANADA", "CHINA", "CHINA", "FINLAND", "FINLAND")
category <- c("NorthAmerica", "NorthAmerica", "NorthAmerica", "NorthAmerica", "Asia", "Asia", "Europe", "Europe")
df <- data.frame(variable, value, rank, subcategory, category)
print (df)
    
#to bin counts by frequency
count_success <- as.data.frame(table(df[df$variable== "success" &
                 df$subcategory== "USA", "rank"] ))
count_education <- as.data.frame(table(df[df$variable== "education" &
                 df$subcategory== "USA", "rank"] ))
            
#to plot
usaplot <- plot(x = count_success$Freq, xlab = "Level of perceived success",
                y = count_education$Freq, ylab = "Level of education")
    

Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Please do not post pictures of data. We do not want to have to retype everything to test the code. – MrFlick Apr 13 '22 at 20:12
  • Thank you, I modified it with a reproducible example of the data. – rspatialqs Apr 13 '22 at 22:24

1 Answer


I think you need to reshape your data into wide format so that you can plot education versus success. This is difficult to do while the two variables are labels in a single column rather than individual columns. We can do the reshaping with pivot_wider, then create a nice plot using ggplot:

library(tidyverse)

df %>% 
  mutate(value = factor(value, c("low", "medium", "high"))) %>%
  pivot_wider(names_from = variable, values_from = c("rank", "value")) %>%
  ggplot(aes(value_success, value_education)) +
  geom_point(aes(fill = category), size = 5, shape = 21) +
  geom_text(aes(label = subcategory), nudge_x = 0.05, nudge_y = -0.1,
            hjust = 0) +
  coord_equal() +
  theme_light() +
  theme(text = element_text(size = 16),
        axis.title.x = element_text(margin = margin(20, 20, 20, 20)),
        axis.title.y = element_text(margin = margin(20, 20, 20, 20))) +
  labs(title = "Value of success versus education by country",
       fill = "Region",
       x = "Value of Success", y = "Value of Education")
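
If you would rather stay in base R, the length error itself can be avoided by making rank a factor with all three levels declared, since table() then reports a zero count for any level that is absent. A minimal sketch on the example data, assuming the only possible ranks are "1", "2" and "3" (count_for is a hypothetical helper name, not part of any package):

```r
# Example data from the question
variable <- rep(c("success", "education"), times = 4)
value <- c("high", "medium", "medium", "low", "high", "high", "low", "low")
rank <- c("1", "2", "2", "3", "1", "1", "3", "3")
subcategory <- rep(c("USA", "CANADA", "CHINA", "FINLAND"), each = 2)
df <- data.frame(variable, value, rank, subcategory)

# Assumption: every rank is one of these three levels
df$rank <- factor(df$rank, levels = c("1", "2", "3"))

# Hypothetical helper: counts per rank level for one variable and one
# sub-category; absent levels are kept with a frequency of 0
count_for <- function(data, var, subcat) {
  rows <- data$variable == var & data$subcategory == subcat
  as.data.frame(table(rank = data$rank[rows]))
}

count_success <- count_for(df, "success", "USA")
count_education <- count_for(df, "education", "USA")

# Both data frames now have one row per level, so the lengths match
plot(x = count_success$Freq, xlab = "Level of perceived success",
     y = count_education$Freq, ylab = "Level of education")
```

With a single response per country the counts are all 0 or 1, so this only becomes informative on the full survey data.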


Allan Cameron
  • Thank you so much; this looks great and works on the example dataset. However, for my data it says that values are not uniquely identified. I will work on troubleshooting it. – rspatialqs Apr 14 '22 at 02:37
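
On the "values are not uniquely identified" warning: pivot_wider gives it when more than one row shares the same combination of identifying columns and variable, which is likely if the real data has several respondents per sub-category. A sketch of two possible ways around it, assuming a made-up respondent column and numeric ranks (neither is in the original data):

```r
library(dplyr)
library(tidyr)

# Hypothetical survey data: two respondents per country
survey <- data.frame(
  respondent  = c(1, 1, 2, 2, 3, 3, 4, 4),   # hypothetical id column
  variable    = rep(c("success", "education"), times = 4),
  rank        = c(1, 2, 2, 2, 3, 3, 1, 3),
  subcategory = rep(c("USA", "CANADA"), each = 4)
)

# Option 1: keep the respondent id so each output row is one respondent
wide_by_respondent <- survey %>%
  pivot_wider(names_from = variable, values_from = rank)

# Option 2: drop the id and summarise duplicates, here with the mean rank
wide_mean <- survey %>%
  select(-respondent) %>%
  pivot_wider(names_from = variable, values_from = rank, values_fn = mean)
```

Option 1 keeps the individual responses, which is what you would need for boxplots or confidence intervals; option 2 gives a single point per sub-category.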