I have a dataset with 339 candidate independent variables and 7,700 observations. I used the Amelia package in R to visualize the missing values in my data, and this is what I got:

[Figure: missing values vs. observed values graph]

For my regressions, I want to use the variables that I have marked with the brown rectangle. But with 339 variables, the labels on the x axis are unreadable, so I can't tell which variables those are. I have already tried x.cex=0.1 and x.cex=0.01, but then the labels become too small to read. How can I identify the variables in the brown rectangle?

Werther
    If they are unreadable to you -- where you have the ability to zoom out and *actually look at the data* (to see what margin labels are being used), how do you expect us (with this pixelated blur) to infer anything? Please make this question *reproducible*. This includes sample code (including listing non-base R packages), sample data (e.g., `dput(head(x))`), and expected output. Refs: https://stackoverflow.com/questions/5963269, https://stackoverflow.com/help/mcve, and https://stackoverflow.com/tags/r/info. – r2evans Nov 17 '18 at 23:59

1 Answer

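Here's a way to do it without reading the plot at all: count the NA values in each column and sort the counts, so the variables with the fewest missing values come first.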

# Example data: 10 columns (A to J) of 1000 values, each randomly 1 or NA
data <- as.data.frame(setNames(replicate(10, sample(c(1, NA), 1000, replace = TRUE), simplify = FALSE), LETTERS[1:10]))
head(data)
#    A  B  C  D  E  F  G  H  I  J
# 1 NA NA NA  1  1 NA NA  1  1  1
# 2  1  1 NA  1  1 NA  1 NA  1 NA
# 3  1  1 NA  1  1 NA  1  1 NA  1
# 4 NA  1  1 NA  1  1  1  1 NA NA
# 5  1 NA NA NA NA  1 NA  1 NA NA
# 6  1  1  1 NA NA  1 NA NA  1  1

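# Count the NAs in each column, then stack the counts into a data frame (values, ind)
x <- stack(sapply(data, function(col) sum(is.na(col))))
# Show the six columns with the fewest missing values
head(x[order(x$values), ])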
#    values ind
# 7     476   G
# 3     478   C
# 8     481   H
# 10    489   J
# 4     499   D
# 2     500   B
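
Once the counts are available, the columns to keep for the regressions can be picked programmatically instead of by eye. The following is only a sketch under stated assumptions: the `threshold` value of 0.5 is a hypothetical cutoff (not something taken from the question), and the seed is arbitrary and only there to make the example repeatable.

```r
# Rebuild example data of the same shape as above (arbitrary seed for repeatability)
set.seed(42)
data <- as.data.frame(setNames(
  replicate(10, sample(c(1, NA), 1000, replace = TRUE), simplify = FALSE),
  LETTERS[1:10]
))

# Share of missing values per column (between 0 and 1)
na_share <- colMeans(is.na(data))

# Hypothetical cutoff: keep columns with at most 50% missing values
threshold <- 0.5
keep <- names(na_share)[na_share <= threshold]

# Subset the data to the selected columns
data_subset <- data[, keep, drop = FALSE]

# Column names ordered from least to most missing
sort(na_share)
```

With the real dataset, `keep` would hold the names of the variables that fall under whatever cutoff corresponds to the marked region of the plot.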

With the tidyverse, that would be:

library(tidyverse)
data %>%
  gather %>%
  group_by(key) %>%
  summarize(NAs = sum(is.na(value))) %>%
  arrange(NAs) %>%
  head
# # A tibble: 6 x 2
#   key     NAs
#   <chr> <int>
# 1 G       476
# 2 C       478
# 3 H       481
# 4 J       489
# 5 D       499
# 6 B       500
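
In current versions of tidyr, `gather()` is superseded by `pivot_longer()`. A rough equivalent of the pipeline above, assuming dplyr and tidyr (1.0.0 or later) are installed, might look like the sketch below. The seed is arbitrary, so the counts will not match the output shown earlier.

```r
library(dplyr)
library(tidyr)

# Example data of the same shape as above (arbitrary seed for repeatability)
set.seed(1)
data <- as.data.frame(setNames(
  replicate(10, sample(c(1, NA), 1000, replace = TRUE), simplify = FALSE),
  LETTERS[1:10]
))

na_counts <- data %>%
  # Reshape to long format: one row per (column name, value) pair
  pivot_longer(everything(), names_to = "key", values_to = "value") %>%
  group_by(key) %>%
  # Count the missing values per original column
  summarize(NAs = sum(is.na(value))) %>%
  # Fewest missing values first
  arrange(NAs)

head(na_counts)
```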
moodymudskipper