I am trying to remove all NA values from two columns in a matrix and make sure that neither column has a value that the other doesn't. code:

data <- dget(file)

dependent <- data[,"chroma"]
independent <- data[,"mass..Pantheria."]

names(independent) <- names(dependent) <- rownames(data)

for (name in rownames(data)) {

    if(is.na(dependent[name])) {
      independent$name <- NULL
      dependent$name <- NULL
  }

    if(is.na(independent[name])) {
      independent$name <- NULL
      dependent$name <- NULL
  }
}
print(dput(independent))
print(dput(dependent))

I am brand new to R and am trying to perform this task with a for loop. However, when I delete a section by assigning NULL I receive the following warning:

1: In independent$Aeretes_melanopterus <- NULL : Coercing LHS to a list
2: In dependent$name <- NULL : Coercing LHS to a list

No elements are deleted and independent and dependent retain all their original rows.

file (input):

structure(list(chroma = c(7.443501276, 10.96156313, 13.2987235, 
17.58110922, 13.4991105), mass..Pantheria. = c(NA, 126.57, NA, 
160.42, 250.57)), .Names = c("chroma", "mass..Pantheria."), class = "data.frame", row.names = c("Aeretes_melanopterus", 
"Ammospermophilus_harrisii", "Ammospermophilus_insularis", "Ammospermophilus_nelsoni", 
"Atlantoxerus_getulus"))
                              chroma mass..Pantheria.
Aeretes_melanopterus        7.443501               NA
Ammospermophilus_harrisii  10.961563           126.57
Ammospermophilus_insularis 13.298723               NA
Ammospermophilus_nelsoni   17.581109           160.42
Atlantoxerus_getulus       13.499111           250.57

desired output:

structure(list(chroma = c(10.96156313, 17.58110922, 13.4991105
), mass..Pantheria. = c(126.57, 160.42, 250.57)), .Names = c("chroma", 
"mass..Pantheria."), class = "data.frame", row.names = c("Ammospermophilus_harrisii", 
"Ammospermophilus_nelsoni", "Atlantoxerus_getulus"))
                            chroma mass..Pantheria.
Ammospermophilus_harrisii 10.96156           126.57
Ammospermophilus_nelsoni  17.58111           160.42
Atlantoxerus_getulus      13.49911           250.57
structure(c(126.57, 160.42, 250.57), .Names = c("Ammospermophilus_harrisii", 
"Ammospermophilus_nelsoni", "Atlantoxerus_getulus"))
Ammospermophilus_harrisii  Ammospermophilus_nelsoni      Atlantoxerus_getulus 
                   126.57                    160.42                    250.57 
structure(c(10.96156313, 17.58110922, 13.4991105), .Names = c("Ammospermophilus_harrisii", 
"Ammospermophilus_nelsoni", "Atlantoxerus_getulus"))
Ammospermophilus_harrisii  Ammospermophilus_nelsoni      Atlantoxerus_getulus 
                 10.96156                  17.58111                  13.49911 
asheets
  • `x$name` uses the name "name", not the string stored in the variable `name`. Instead use `x[[name]]` – Frank May 22 '18 at 15:04
  • This actually gives me another error: Error in dependent[[name]] <- NULL : more elements supplied than there are to replace – asheets May 22 '18 at 15:11
  • Ok, hmm, I don't use those packages, but I guess it's something to do with how you're supposed to interact with objects of that class. Btw, you might improve your odds of someone figuring it out by making a minimal reproducible example; guidance here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/28481250#28481250 – Frank May 22 '18 at 15:13
  • Looks like you have a `data.frame`, not a `matrix`. – Gregor Thomas May 22 '18 at 15:34
  • Overall, I think this would be much clearer if you shared a reproducible example. There's probably a better way, but without seeing either sample input or desired output, it's hard to know. Please use Frank's link as a guide and share some data reproducibly, for example using `dput()`. – Gregor Thomas May 22 '18 at 15:37
  • @Gregor I included some sample data as well as my desired output. Let me know if there's anything else I should include – asheets May 22 '18 at 17:44

1 Answer

Looks like you want to omit rows from your data where either chroma or mass..Pantheria. is NA. Here's a quick way to do it:

data = data[!is.na(data$chroma) & !is.na(data$mass..Pantheria.), ]

I'm not sure why you are breaking independent and dependent out separately, but after filtering out bad observations is a good time to do it.

Since those are your only two columns, this is equivalent to omitting rows from your data frame that have any NA values, so you can use a shortcut like this:

data = na.omit(data)
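
If the data frame later gains other columns, `na.omit()` would also drop rows that are NA only in those columns. As a minimal sketch of one way around that, `complete.cases()` can be restricted to the two columns of interest. The `notes` column below is hypothetical and was added purely for illustration; the rest is the sample data from the question.

```r
# Sample data from the question, plus a hypothetical `notes` column
# (not in the original data) that has an NA of its own.
data <- data.frame(
  chroma = c(7.443501276, 10.96156313, 13.2987235, 17.58110922, 13.4991105),
  mass..Pantheria. = c(NA, 126.57, NA, 160.42, 250.57),
  notes = c("a", NA, "c", "d", "e"),
  row.names = c("Aeretes_melanopterus", "Ammospermophilus_harrisii",
                "Ammospermophilus_insularis", "Ammospermophilus_nelsoni",
                "Atlantoxerus_getulus")
)

# TRUE for rows where both columns of interest are non-NA;
# NAs in any other column are ignored.
keep <- complete.cases(data[, c("chroma", "mass..Pantheria.")])
data_no_na <- data[keep, ]

rownames(data_no_na)
```

Here `na.omit(data)` would additionally drop Ammospermophilus_harrisii because of the NA in `notes`, whereas the `complete.cases()` version keeps it.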

If you want to keep a "pristine" copy of your raw data, simply change the name of the result:

data_no_na = na.omit(data)
# or
data_no_na = data[!is.na(data$chroma) & !is.na(data$mass..Pantheria.), ]
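
If you still need separate named vectors like the ones in your desired output, here is a rough sketch of pulling them out after filtering. It assumes the sample data from the question and uses `setNames()` to carry the row names over; the variable names mirror the ones in your code.

```r
# Sample data from the question.
data <- data.frame(
  chroma = c(7.443501276, 10.96156313, 13.2987235, 17.58110922, 13.4991105),
  mass..Pantheria. = c(NA, 126.57, NA, 160.42, 250.57),
  row.names = c("Aeretes_melanopterus", "Ammospermophilus_harrisii",
                "Ammospermophilus_insularis", "Ammospermophilus_nelsoni",
                "Atlantoxerus_getulus")
)

# Filter first, so both vectors are built from the same surviving rows.
data_no_na <- na.omit(data)

# Extract each column and attach the row names as element names.
dependent   <- setNames(data_no_na[, "chroma"], rownames(data_no_na))
independent <- setNames(data_no_na[, "mass..Pantheria."], rownames(data_no_na))

print(independent)
print(dependent)
```

Because both vectors come from the same filtered data frame, they are guaranteed to share the same names in the same order.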

As to what's wrong with your code: `$` is used for extracting columns from a data frame, but you're trying to use it on a named vector (since you've already extracted the columns), which doesn't work. Assigning with `$` there just coerces the vector to a list, hence the warning. Even on a data frame, `$` takes the name literally, so `x$name` looks for something called "name" rather than the value stored in the variable `name`. To use a column name stored in a variable, you need brackets. For example, the built-in mtcars data has a column called "mpg":

# these work:
mtcars$mpg
mtcars[, "mpg"]

my_col = "mpg"
mtcars[, my_col]
mtcars$my_col ## does not work, need to use brackets!

You can never use $ with row names in a data frame, only column names.
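
For completeness, a small sketch of how indexing by a name stored in a variable looks in practice, both for a data frame column and for a named vector. The vector `x` and the variable names are made up for illustration; elements are dropped by subsetting rather than by assigning NULL.

```r
# Data frame: brackets use the value of the variable, `$` uses the literal name.
my_col <- "mpg"
same_column <- identical(mtcars[[my_col]], mtcars$mpg)  # brackets find "mpg"
no_such_col <- is.null(mtcars$my_col)                   # no column is called "my_col"

# Named atomic vector (hypothetical toy data).
x <- c(a = 1, b = NA, c = 3)
name <- "b"

# Look up one element by the name stored in `name`.
is_missing <- is.na(x[[name]])

# Drop one element by name: keep everything whose name differs.
x_dropped <- x[names(x) != name]

# Or drop every NA at once, with no loop needed.
x_clean <- x[!is.na(x)]

print(x_dropped)
print(x_clean)
```

Subsetting with a logical vector like this is vectorized, which is why the loop in the question is not needed.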

Gregor Thomas