
I am trying to create a series of box plots showing the area, radius and concavity of human cells. The variables I am trying to plot are the cell characteristics, split into two subsets, 'malignant' and 'benign'. I keep receiving the following error messages:

> Error in FUN(X[[i]], ...) : object 'Class_mean' not found

> Error in FUN(X[[i]], ...) : object 'Class_radius' not found

Please see my code:

    ggplot(wisconsin, aes(x= Class, y=Class_mean, fill="pink")) + 
      geom_boxplot(fill= "yellow")+
      ggtitle("radius of benign and malignant stage")
    
    ggplot(wisconsin, aes(x= Class, y=Class_radius))+ 
      geom_boxplot()+
      ggtitle("area of benign and malignant stage")
    
    ggplot(wisconsin, aes(x= Class, y=concavity_mean))+ 
      geom_boxplot()+
      ggtitle("concavity of benign and malignant stage")

Any ideas on how I could map the radius, mean and concavity to the y variable?

All suggestions welcome

Here is the head of the data, from `dput(head(wisconsin, 20))`:

    structure(list(Cl.thickness = c(5L, 5L, 3L, 6L, 4L, 8L, 1L, 2L, 
    2L, 4L, 1L, 2L, 5L, 1L, 8L, 7L, 4L, 4L, 10L, 6L), Cell.size = c(1L, 
    4L, 1L, 8L, 1L, 10L, 1L, 1L, 1L, 2L, 1L, 1L, 3L, 1L, 7L, 4L, 
    1L, 1L, 7L, 1L), Cell.shape = c(1L, 4L, 1L, 8L, 1L, 10L, 1L, 
    2L, 1L, 1L, 1L, 1L, 3L, 1L, 5L, 6L, 1L, 1L, 7L, 1L), Marg.adhesion = c(1L, 
    5L, 1L, 1L, 3L, 8L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 10L, 4L, 
    1L, 1L, 6L, 1L), Epith.c.size = c(2L, 7L, 2L, 3L, 2L, 7L, 2L, 
    2L, 2L, 2L, 1L, 2L, 2L, 2L, 7L, 6L, 2L, 2L, 4L, 2L), Bare.nuclei = c(1L, 
    10L, 2L, 4L, 1L, 10L, 10L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 9L, 1L, 
    1L, 1L, 10L, 1L), Bl.cromatin = c(3L, 3L, 3L, 3L, 3L, 9L, 3L, 
    3L, 1L, 2L, 3L, 2L, 4L, 3L, 5L, 4L, 2L, 3L, 4L, 3L), Normal.nucleoli = c(1L, 
    2L, 1L, 7L, 1L, 7L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 1L, 5L, 3L, 1L, 
    1L, 1L, 1L), Mitoses = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 5L, 
    1L, 1L, 1L, 1L, 1L, 4L, 1L, 1L, 1L, 2L, 1L), Class = c("benign", 
    "benign", "benign", "benign", "benign", "malignant", "benign", 
    "benign", "benign", "benign", "benign", "benign", "malignant", 
    "benign", "malignant", "malignant", "benign", "benign", "malignant", 
    "benign")), row.names = c(NA, 20L), class = "data.frame")
Phil
jay_2022
  • Is `wisconsin` a data.frame that has the `Class_radius` column? – Ric Oct 19 '22 at 19:41
  • Can you post sample data? Please edit the question with the output of `dput(wisconsin)`. Or, if it is too big, with the output of `dput(head(wisconsin, 20))`. – Rui Barradas Oct 19 '22 at 19:46
  • @RicVillalba The data file is big, so apologies for the unclear data above. However, there is no column with radius, mean or concavity. The variables relate to characteristics of cells, which are all integers. – jay_2022 Oct 19 '22 at 19:54
  • @RuiBarradas I have attempted to add the data above. It is a big dataset, so apologies for the unclear presentation. – jay_2022 Oct 19 '22 at 19:55
  • Please post in `dput` format; like this it's much more difficult for us to copy, paste and recreate the data.frame. And post how you have computed the mean, radius and concavity. – Rui Barradas Oct 19 '22 at 20:03
  • @RuiBarradas I have now posted in requested format. Kind regards – jay_2022 Oct 19 '22 at 20:09
  • It is unclear how your input data relates to the plot code. In your example data, the columns are `"Cl.thickness", "Cell.size", "Cell.shape", "Marg.adhesion", "Epith.c.size", "Bare.nuclei", "Bl.cromatin", "Normal.nucleoli", "Mitoses", "Class"`. How do you want to map that to mean, radius, concavity? – Jon Spring Oct 19 '22 at 20:34
  • Hi @JonSpring, what I am trying to achieve is visualisations that clearly distinguish the differences between malignant and benign cell characteristics. I was attempting to do it through radius, mean and concavity, as that is what I saw online, but I am open to alternative ways to distinguish the characteristics. – jay_2022 Oct 19 '22 at 21:00
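The errors come from the `aes()` mappings: ggplot2 looks up names such as `y = Class_mean` among the columns of `wisconsin` (and then in the calling environment), and none of the three `y` names exist in this data. A minimal base R check, assuming a two-row stand-in for `wisconsin` built from the first rows of the `dput()` output above (`y_columns` and `missing_cols` are illustrative names):

```r
# Stand-in for `wisconsin`: the first two rows of the dput() output above
wisconsin <- data.frame(
  Cl.thickness = c(5L, 5L), Cell.size = c(1L, 4L), Cell.shape = c(1L, 4L),
  Marg.adhesion = c(1L, 5L), Epith.c.size = c(2L, 7L), Bare.nuclei = c(1L, 10L),
  Bl.cromatin = c(3L, 3L), Normal.nucleoli = c(1L, 2L), Mitoses = c(1L, 1L),
  Class = c("benign", "benign")
)

# Columns used as `y` in the three ggplot() calls in the question
y_columns <- c("Class_mean", "Class_radius", "concavity_mean")

# Names that ggplot2 would fail to find among the data's columns
missing_cols <- setdiff(y_columns, names(wisconsin))
print(missing_cols)
```

Any name that does appear in `names(wisconsin)`, for example `Cell.size`, can be used directly, as in `ggplot(wisconsin, aes(x = Class, y = Cell.size)) + geom_boxplot()`.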



If you want to plot each characteristic per class, the code below might solve the problem.

This type of problem usually comes down to reshaping the data. To put all the characteristics in one faceted plot, the data must be in long format, but your data is in wide format. See this post on how to reshape data from wide to long format. I will use function `pivot_longer` from package `tidyr`.

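    library(ggplot2)
    
    # Reshape from wide to long format: every column except Class is stacked
    # into `characters` (the old column name) and `value` (the number).
    # The native pipe |> needs R >= 4.1.0.
    wisconsin |>
      tidyr::pivot_longer(-Class, names_to = "characters") |>
      ggplot(aes(x = Class, y = value)) + 
      geom_boxplot(fill = "lightblue") +
      facet_wrap(~ characters) +  # one panel per characteristic
      theme_bw()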

Created on 2022-10-19 with reprex v2.0.2

Rui Barradas
  • Thank you for providing a solution. For my own learning, could you explain the difference between long data and short data? Does this refer to data transformation? Kind regards – jay_2022 Oct 19 '22 at 21:30
  • Long, aka "tidy" data: https://vita.had.co.nz/papers/tidy-data.pdf and https://r4ds.had.co.nz/tidy-data.html – Jon Spring Oct 19 '22 at 21:34
  • @jay_2022 The links Jon provided are excellent. In this case, the new long-format column `characters` holds the old wide-format column names, and that is what makes the faceting possible. The new variable defines the groups used to split the plot into 9 panels. – Rui Barradas Oct 19 '22 at 22:50
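To make the wide vs. long difference concrete, here is a sketch of only the reshaping step from the answer, run on a three-row stand-in for `wisconsin` (the first three rows of the question's `dput()` output; assumes the tidyr package is installed). Each former column name becomes a value of `characters`, so there are 9 distinct groups, one per facet panel:

```r
library(tidyr)

# Stand-in for `wisconsin`: the first three rows of the dput() output.
# Wide format: one row per sample, one column per characteristic.
wisconsin <- data.frame(
  Cl.thickness = c(5L, 5L, 3L), Cell.size = c(1L, 4L, 1L), Cell.shape = c(1L, 4L, 1L),
  Marg.adhesion = c(1L, 5L, 1L), Epith.c.size = c(2L, 7L, 2L), Bare.nuclei = c(1L, 10L, 2L),
  Bl.cromatin = c(3L, 3L, 3L), Normal.nucleoli = c(1L, 2L, 1L), Mitoses = c(1L, 1L, 1L),
  Class = c("benign", "benign", "benign")
)

# Long format: one row per (sample, characteristic) pair.
# Former column names go into `characters`, the numbers into `value`.
long <- pivot_longer(wisconsin, -Class, names_to = "characters")

print(dim(wisconsin))           # 3 rows x 10 columns
print(dim(long))                # 27 rows x 3 columns
print(unique(long$characters))  # the 9 groups used by facet_wrap(~ characters)
```

With the full data the same call produces one row per sample and characteristic, and `facet_wrap(~ characters)` draws one box plot panel for each of those 9 values.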