1

I am trying to combine two dataframes with different number of columns and column headers. However, after I combine them using rbind.fill(), the resulting file has filled the empty cells with NA.

This is very inconvenient since one of the columns has data that is also represented as "NA" (for North America), so when I import it into a csv, the spreadsheet can't tell them apart.

Is there a way for me to:

  1. Use the rbind.fill function without having it populate the empty cells with NA

or

  1. Change the column to replace the NA values*

*I've scoured the blogs, and have tried the two most popular solutions:

df$col[is.na(df$col)] <- 0, #it does not work
df$col = ifelse(is.na(df$col), "X", df$col), #it changes all the characters to numbers, and ruins the column

Let me know if you have any advice! I (unfortunately) cannot share the df, but will be willing to answer any questions!

Julius Vainora
  • 47,421
  • 9
  • 90
  • 102
David
  • 21
  • 2
  • 4
  • The string "it does not work" is not a built-in error message in R. –  Mar 13 '13 at 20:50
  • Please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). If you cannot share your data frame, make a small example that replicates your problem. – Blue Magister Mar 13 '13 at 20:54
  • 5
    have you tried altering the 'na' option of write.csv? – Jesse Anderson Mar 13 '13 at 20:55
  • Jack - sorry, i meant to say that it had no impact on my spreadsheet. There was no error when I ran it, but a warning message: In `[<-.factor`(`*tmp*`, (is.na(df$col)), : invalid factor level, NAs generated – David Mar 13 '13 at 21:33
  • blind Jesse- Your solution works, but it replaces all the NA's in the spreadsheet, but I only want to replace it for one particular column. Any ideas? – David Mar 13 '13 at 21:33

2 Answers2

3

NA is not the same as "NA" to R, but might be interpreted as such by your favourite spreadsheet program. NA is a special value in R just like NaN (not a number). If I understand correctly, one of your solutions is to replace the "NA" values in the column representing North America with something else, in which case you should just be able to do...

df$col[ df$col == "NA" ] <- "NorthAmerica"

This is assuming that your "NA" values are actually character strings. is.na() won't return any values if they are character strings which is why df$col[ is.na(df$col) ] <- 0 won't work.

An example of the difference between NA and "NA":

x <- c( 1, 2, 3 , "NA" , 4 , 5 , NA )

> x[ !is.na(x) ]
[1] "1"  "2"  "3"  "NA" "4"  "5"

> x[ x == "NA" & !is.na(x) ]
[1] "NA"

Method to resolve this

I think you want to leave "NA" and any NAs as they are in the first df, but make all NA in the second df formed from rbind.fill() change to something like "NotAvailable". You can accomplish this like so...

df1 <- data.frame( col = rep( "NA" , 6 ) , x = 1:6 , z = rep( 1 , 6 ) )
df2 <- data.frame( col = rep( "SA" , 2 ) , x = 1:2 , y = 5:6 )
df <- rbind.fill( df1 , df2 )
temp <- df [ (colnames(df) %in% colnames(df2)) ]
temp[ is.na( temp ) ] <- "NotAvailable"
res <- cbind( temp , df[ !( colnames(df) %in% colnames(df2) ) ] )

#df has real NA values in column z and column y. We just want to get rid of y's
df

#     col x  z  y
#   1  NA 1  1 NA
#   2  NA 2  1 NA
#   3  NA 3  1 NA
#   4  NA 4  1 NA
#   5  NA 5  1 NA
#   6  NA 6  1 NA
#   7  SA 1 NA  5
#   8  SA 2 NA  6

#res has "NA" strings in col representing "North America" and NA values in z, whilst those in y have been removed
#More generally, any NA in df1 will be left 'as-is', whilst NA from df2 formed using rbind.fill will be converted to character string "NotAvilable"
res

#     col x            y  z
#   1  NA 1 NotAvailable  1
#   2  NA 2 NotAvailable  1
#   3  NA 3 NotAvailable  1
#   4  NA 4 NotAvailable  1
#   5  NA 5 NotAvailable  1
#   6  NA 6 NotAvailable  1
#   7  SA 1            5 NA
#   8  SA 2            6 NA
Community
  • 1
  • 1
Simon O'Hanlon
  • 58,647
  • 14
  • 142
  • 184
  • Actually, the "NA" values for North America is fixed. I want to change the "null" NA value to something else, such as a "-" – David Mar 13 '13 at 21:30
  • @David did the solution below work for you? Does it need adjusting more? – Simon O'Hanlon Mar 14 '13 at 12:24
  • Thanks for the help help Simon. I actually couldn't get the x[is.na(x)] <- "notavailable" function to work, and I had been trying for awhile. However, I took @blindJesse advice above, and used the na functoin in write.csv. Only problem now is, all the column's NA changes, and I had only wanted to change one column....any thoughts? – David Mar 14 '13 at 12:34
  • Hi @David I'll edit my answer so you can more clearly see the steps. I think we are probably miscommunicating around the rbind.fill stage. – Simon O'Hanlon Mar 14 '13 at 12:40
  • @David edited- let me know if this is what you were expecting. – Simon O'Hanlon Mar 14 '13 at 12:49
  • That is the basic gist of it, except imagine if df1 has another column (Column Z) that has all values of 1. Then, when you rbind df1 and df2, in the resulting df, both Columns Y and Z will have "NA" (notAvailable) values. However, I would like to only change the NA's in column Y to "NotAvailable". – David Mar 14 '13 at 14:15
  • @David Hmmm, ok. So you have both "NA" for North America and `NA` for missing values in df1. You want missing values (`NA`) in df1 to be retained as missing values, but missing values in df2 when it is bound, to be called something else?! – Simon O'Hanlon Mar 14 '13 at 14:20
0

If you have a dataframe that contains NA's and you want to replace them all you can do something like:

df[is.na(df)] <- -999

This will take care of all NA's in one shot

If you only want to act on a single column you can do something like

df$col[which(is.na(df$col))] <- -999
kith
  • 5,486
  • 1
  • 21
  • 21
  • I tried that in my original question, but unfortunately, it had no impact on my spreadsheet. Thanks though! – David Mar 13 '13 at 21:30