I am trying to create a scatterplot matrix using the ggpairs function from the GGally package. In my dataset tol, I have several demographic variables that are categorical and several that are continuous. I created a data frame with the variables I wanted and tried to omit NA values because I keep getting this error:

Error in cor.test.default(x, y, method = method, use = use) : not enough finite observations

When I don't include the aesthetic mapping, the scatterplot works just fine. Even when I mess with my csv file to make sure there are no empty cells, I still get this error.

Here is the code:

cs <- tol[c("location","comp_sat_avg","burnout_avg","sec_stress_avg","burnout_ee_avg","burnout_dp_avg","burnout_pa_avg","obs_avg","desc_avg","aware_avg","nonjudg_avg","nonreac_avg","wkplre_wc_avg","Efficacy_avg","Lotr_avg","hsecontrol_avg","hsemsupport_avg","hsepsupport_avg","hserole_avg","hsedemands_avg")]
csdata <- na.omit(cs)

ggpairs(csdata,lower=list(continuous="smooth"),mapping=ggplot2::aes(color= location)) +
  theme_bw()

I have three other categorical variables I need to group by separately, so any help is extremely appreciated.

Per stefan's comment, here is a sample of my dataset:

tol  <- structure(list(location = c("Mukono Health Center IV", "Mukono Health Center IV", 
"Goma Health Center III", "Goma Health Center III", "Goma Health Center III", 
"Kawolo General Hospital", "Kawolo General Hospital", "Mukono Health Center IV", 
"Mukono Health Center IV", "Lwanyonyi VHT", "Mukono Health Center IV", 
"Goma Health Center III", "Mukono Health Center IV", "Mukono Health Center IV", 
"Goma Health Center III", "Mukono Health Center IV", "Mukono Health Center IV", 
"Mukono Health Center IV", "Mukono Health Center IV", "Lwanyonyi VHT"
), comp_sat_avg = c(4.6, 4.9, 4.4, 4.2, 3.7, 4.2, 3, 4.3, 3.8, 
4.4, 2.8, 3.9, 4.7, 4.4, 3.22, 4.6, 1.8, 4.67, 3, 4.8), burnout_avg = c(2.2, 
3.2, 2.1, 2.7, 3.4, 2.1, 3.11, 2.4, 2.6, 2.5, 2.89, 2, 1.8, 1.8, 
2.78, 2.6, 3.5, 2.7, 2.56, 2.1), sec_stress_avg = c(2.6, 1.4, 
2.44, 3.1, 3.5, 2.8, 3.1, 2.4, 3.1, 3.33, 2.56, 1.8, 2.8, 1.9, 
3.1, 2.8, 1.5, 3.8, 3.9, 2.6), burnout_ee_avg = c(2.11, 2.33, 
2.78, 2.67, 4.67, 1.22, 1, 3.33, 1.78, 4.33, 3.33, 1.78, 2.78, 
1.11, 1.67, 2.89, 5.89, 1.78, 3, 0.78), burnout_dp_avg = c(1.6, 
0.4, 1.2, 2.4, 1.8, 0.75, 1.2, 2.8, 0.6, 2.4, 4.2, 2.4, 1.2, 
0.6, 3.8, 3.2, 5.6, 1, 1.6, 0.4), burnout_pa_avg = c(5.13, 5.75, 
4.75, 2.88, 5.25, 4.67, 5.75, 5, 5.5, 5.25, 4.88, 4.5, 3.75, 
4.13, 3.13, 4, 4, 3, 4.88, 5.88), obs_avg = c(3.63, 3.25, 2, 
4.38, 2.88, 4, 3.75, 2.38, 2.13, 2.75, 4.63, 3.88, 3, 2.14, 3.83, 
3.5, 2.25, 2.63, 4.13, 3.88), desc_avg = c(3, 3.38, 4.5, 3.88, 
3.38, 3.13, 3.63, 2.63, 3.75, 4.25, 3.5, 4.38, 2.57, 3.63, 3.25, 
3.63, 3.13, 4.13, 4.25, 3.38), aware_avg = c(2.5, 4.25, 4.63, 
4.25, 4.13, 3.5, 4.13, 3.25, 3.25, 4.75, 4.13, 4.75, 3.5, 3.88, 
2.13, 4.13, 3.5, 4.13, 3.57, 3.25), nonjudg_avg = c(1.88, 3.63, 
4.38, 1.88, 2.63, 3.25, 3, 3, 3.25, 4, 2, 3, 3, 4.88, 1.86, 2.88, 
3.25, 2.5, 2.38, 1.63), nonreac_avg = c(3.71, 3.57, 2.43, 4.29, 
3, 3.43, 3.86, 3.86, 2.86, 4.29, 3.86, 3, 3, 3.14, 4.43, 3.43, 
2.8, 3.71, 3.57, 3.43), wkplre_wc_avg = c(5.07, 6.13, 5.8, 5.27, 
4.33, 6.2, 4.07, 7, 6.27, 2.29, 5.14, 4.4, 4.73, 5.47, 5.07, 
4.93, 3.07, 5.6, 5.73, 4.8), Efficacy_avg = c(4, 1.4, 3.6, 3.1, 
3.1, 2.9, 3.6, 2, 2.5, 3.3, 3.7, 3.6, 1.9, 3.7, 3.5, 3.6, 3.2, 
3.6, 3.5, 3.9), Lotr_avg = c(2.17, 2.33, 3.6, 0.5, 2.67, 1.67, 
3.2, 2.17, 2.5, 3.67, 2.33, 3.67, 1.17, 1.83, 2, 2.67, 1.83, 
2.67, 2.83, 3.5), hsecontrol_avg = c(3.67, 4.5, 3.5, 3.5, 3.17, 
3.83, 4.5, 4.33, 3.83, 3.83, 3.67, 4.67, 4.5, 3.67, 3.83, 3.17, 
3, 4.17, 3.83, 3.17), hsemsupport_avg = c(3.6, 4, 3.2, 3.6, 3.2, 
4.2, 3.6, 4, 3.8, 3.6, 3, 4.2, 3.4, 4.2, 3.8, 3.2, 2.4, 4, 4, 
3.8), hsepsupport_avg = c(3.25, 4, 3.75, 3.5, 3, 4.75, 4.25, 
4.75, 3.75, 3.5, 4.67, 4.25, 3.75, 4, 4, 3.25, 1.5, 4, 4, 4), 
    hserole_avg = c(4.8, 5, 4.4, 4.2, 5, 4, 4, 4.2, 4, 4.6, 4.6, 
    4.8, 4.2, 4.2, 3.2, 4.4, 2.8, 4, 4.2, 5), hsedemands_avg = c(2, 
    3.29, 3.29, 4, 1.86, 3.57, 3.29, 1.71, 3.14, 1.71, 3.71, 
    3.71, 3.43, 3.86, 1.86, 2.71, 4, 3.29, 3.57, 2.57)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"), na.action = structure(c(`1` = 1L, 
`5` = 5L, `11` = 11L, `15` = 15L, `19` = 19L, `24` = 24L, `27` = 27L, 
`30` = 30L, `46` = 46L, `47` = 47L), class = "omit"))
Abdessabour Mtk
Zariah
  • Welcome to SO! To help us to help you could you please make your issue reproducible by sharing a sample of your **data**? Simply type `dput(head(NAME_OF_DATASET, 20))` (which will give the first 20 rows) into the console and copy & paste the output starting with `structure(....` into your post. See also [how to make a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – stefan Oct 22 '20 at 20:28

1 Answer

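You need to take two steps to make this work. With color = location in the aesthetic mapping, the correlation is computed separately for each location. In your sample, two locations (Kawolo General Hospital and Lwanyonyi VHT) have only two observations each, and cor.test.default needs at least three finite observations for the default Pearson correlation. Subset your data to remove those observations: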

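library(dplyr)  # provides %>% and filter()

# Drop the two locations that have only two observations each
csdata <-
  csdata %>%
  filter(
    location != "Kawolo General Hospital",
    location != "Lwanyonyi VHT"
  )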
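
As a rough sketch of how to spot the problem groups, you could count the rows per location and check what cor.test does with only two pairs. The location vector below is a toy stand-in (an assumption, not your real csdata) built with the same group sizes as the posted sample; with your own data you would use csdata$location instead.

```r
# Toy stand-in for csdata$location: same group sizes as the posted sample (11, 5, 2, 2).
location <- rep(
  c("Mukono Health Center IV", "Goma Health Center III",
    "Kawolo General Hospital", "Lwanyonyi VHT"),
  times = c(11, 5, 2, 2)
)

# Count rows per location and list the groups with fewer than 3 observations.
group_sizes <- table(location)
too_small <- names(group_sizes)[as.vector(group_sizes) < 3]
print(too_small)

# Calling cor.test() on only two pairs raises an error; capture its message.
msg <- tryCatch(
  cor.test(c(4.2, 3.0), c(2.1, 3.11)),
  error = function(e) conditionMessage(e)
)
print(msg)
```

Any location listed in too_small is a candidate for filtering out before calling ggpairs with a color mapping.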

However, if location is stored as a factor, the filtered data will still retain those two levels with 0 observations each. Re-creating the factor drops the unused levels (and converts a character column to a factor):

csdata$location <- factor(csdata$location)

Now your ggpairs call with the aesthetic mapping should run without the error:

ggpairs(csdata,lower=list(continuous="smooth"),mapping=ggplot2::aes(color= location)) +
  theme_bw()
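
Since you mention three other categorical variables to group by, the same two steps could be wrapped in a small helper. This is only a sketch: drop_small_groups is a hypothetical name (not part of GGally or dplyr), the min_n = 3 default assumes the Pearson correlation, and the demo data frame is toy data with the same group sizes as your sample.

```r
# Hypothetical helper (not part of GGally or dplyr): keep only rows whose
# group has at least `min_n` observations, then rebuild the grouping column
# as a factor so that no empty levels remain.
drop_small_groups <- function(data, group_col, min_n = 3) {
  groups <- as.character(data[[group_col]])
  sizes <- table(groups)
  keep <- names(sizes)[as.vector(sizes) >= min_n]
  out <- data[groups %in% keep, , drop = FALSE]
  out[[group_col]] <- factor(out[[group_col]])
  out
}

# Toy data with the same group sizes as the posted sample (11, 5, 2, 2).
demo <- data.frame(
  location = rep(c("Mukono", "Goma", "Kawolo", "Lwanyonyi"),
                 times = c(11, 5, 2, 2)),
  score = seq_len(20)
)
filtered <- drop_small_groups(demo, "location")
print(table(filtered$location))
```

With your data, you would pass drop_small_groups(csdata, "location") to ggpairs in place of csdata, and repeat with the name of each other grouping column in turn.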
Drbeene