1

I am importing data from an xlsx (https://www.dropbox.com/s/r5sn5pio5rnprdq/gesammelte%20Daten_1707.xlsx) file with read.xlsx

setwd("C:/***//Kultivierungen//1707_ADH//")
pH <- read.xlsx("gesammelte Daten_1707.xlsx", sheetName="pH")
OD <- read.xlsx("gesammelte Daten_1707.xlsx", sheetName="OD")
Glc <- read.xlsx("gesammelte Daten_1707.xlsx", sheetName="Glucose")
Ac <- read.xlsx("gesammelte Daten_1707.xlsx", sheetName="Acetate")

I want to delete the NA Values with

OD <- OD[rowSums(is.na(OD))==0,]
Glc <- Glc[rowSums(is.na(Glc))==0,]
Ac <- Ac[rowSums(is.na(Ac))==0,]
pH <- pH[rowSums(is.na(pH))==0,]

..which works fine for the OD and pH data, but not for Ac and Glc. The result before deleting the NA Values looks like this:

  time.in.h               SPL1 SPL1_Error               SPL2 SPL2_Error               SPL3 SPL3_Error
1  0.000000               <NA>       <NA>               <NA>       <NA>               <NA>       <NA>
2  1.502222               <NA>       <NA>               <NA>       <NA>               <NA>       <NA>
3  3.687778 0.0602636534839925       0.06 0.0502197112366604       0.09 0.0301318267419962       0.03
4 10.248889                                                                                          
5 16.248333  0.118460019743337       0.06 0.0829220138203356       0.12  0.106614017769003       0.18
6 21.653056 0.0644511581067472       0.03 0.0161127895266868       0.15 0.0483383685800604       0.12
7 29.653333                                                                                          
8 37.652778                                                                                          
9 43.391667  0.342347696879643       0.18  0.271025260029718       0.18  0.727488855869242       0.24

And after deleting the NA Values..:

  time.in.h               SPL1 SPL1_Error               SPL2 SPL2_Error               SPL3 SPL3_Error
3  3.687778 0.0602636534839925       0.06 0.0502197112366604       0.09 0.0301318267419962       0.03
4 10.248889                                                                                          
5 16.248333  0.118460019743337       0.06 0.0829220138203356       0.12  0.106614017769003       0.18
6 21.653056 0.0644511581067472       0.03 0.0161127895266868       0.15 0.0483383685800604       0.12
7 29.653333                                                                                          
8 37.652778                                                                                          
9 43.391667  0.342347696879643       0.18  0.271025260029718       0.18  0.727488855869242       0.24

str() returns the following:

> str(Glc)
'data.frame':   9 obs. of  17 variables:
 $ time.in.h : num  0 1.5 3.69 10.25 16.25 ...
 $ SPL1      : Factor w/ 5 levels "","0.0602636534839925",..: NA NA 2 1 4 3 1 1 5
 $ SPL1_Error: Factor w/ 4 levels "","0.03","0.06",..: NA NA 3 1 3 2 1 1 4
 $ SPL2      : Factor w/ 5 levels "","0.0161127895266868",..: NA NA 3 1 4 2 1 1 5
 $ SPL2_Error: Factor w/ 5 levels "","0.09","0.12",..: NA NA 2 1 3 4 1 1 5

It has worked fine before with a different set of data/xlsx file, I tried to rule out all format-issues in the xlsx file as well, but couldn´t find anything....anyone had this before?

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
plastikdusche
  • 235
  • 2
  • 10
  • There seem to be empty cells in your data (e.g. SPL1, row 4,7,8), which I don't understand. Usually when there are empty cells in the xlsx-file that is imported, read.xlsx defines those as `NA`. Can you provide a reproducible example (of both your data in xlsx-file and data.frame in R)? Also see: http://stackoverflow.com/questions/5963269 – Rob Aug 20 '13 at 13:19
  • This is what a sheet in my xlsx file looks like: `> d1 <- rbind(c("time", "Sample", "Error"), c("","",""), c(1,3.4,0.5),c("","",""), c(1,3.4,0.5)) > d1 [,1] [,2] [,3] [1,] "time" "Sample" "Error" [2,] "" "" "" [3,] "1" "3.4" "0.5" [4,] "" "" "" [5,] "1" "3.4" "0.5" ` – plastikdusche Aug 20 '13 at 13:44
  • I want to delete the rows with empty cells. Normally R assigns NA to those cells, for whatever reason, read.xlsx only imports it this way: `> d1_import <- rbind(c("time", "Sample", "Error"), c("","",""), c(1,3.4,0.5),c("","",""), c(1,3.4,0.5)) > d1_import [,1] [,2] [,3] [1,] "time" "Sample" "Error" [2,] "" "" "" [3,] "1" "3.4" "0.5" [4,] "" "" "" [5,] "1" "3.4" "0.5" ` – plastikdusche Aug 20 '13 at 13:46
  • Are any of the cells in excel text-formatted? (read.xlsx imports those as factor) If so you first have to convert those to numbers (see http://office.microsoft.com/en-001/excel-help/convert-numbers-stored-as-text-to-numbers-HP010342308.aspx). And do you use "" in Excel? (that will also be recognized as text and transformed to factor) – Rob Aug 20 '13 at 13:56
  • that was one of the first things I did, I also removed all of the formulas etc, just pasted the values back into the sheet and afterwards I checked the correct format again.. https://www.dropbox.com/s/r5sn5pio5rnprdq/gesammelte%20Daten_1707.xlsx – plastikdusche Aug 20 '13 at 14:10
  • Thanks for providing the example data. See my answer below. – Rob Aug 20 '13 at 14:26

3 Answers3

0

I'm not completely sure what's your problem but here's some fixes:

Do you want to replace NA values with zeros? In that case you could try:

Glc[is.na(Glc)] <- 0

Or do you want to delete the observations with NAs? In that case it seems to be working correctly for the first two observations. But e.g. the fourth observation seems to have something else (zero?) than NA in the xlsx-file.

Antti
  • 1,263
  • 2
  • 16
  • 28
  • Replacing the NA-Values (or better deleting the whole row containing NA-values) works perfectly fine. The Problem is that read.xlsx only writes NA-values for the first two empty rows in the Glc dataset but not for the following empty rows. It works fine in a different dataset that is in the same xlsx file :/ – plastikdusche Aug 20 '13 at 13:35
0

It seems that the empty cells in the Glucose and Acetate-sheet are recognized as text, although I am not sure why (Excel is not really my expertise..).

When I replace the empty cells in a column in the xlsx-file with 0 and then I delete those 0's again read.xlsx does import it as numeric vector instead of a factor and assigns NA to the empty cells. Then, you can use data <- data[rowSums(is.na(data))==0,] to remove the rows that contain NA's.

Can't tell you what exactly is going on here, but the above solution seems to work.

Rob
  • 1,460
  • 2
  • 16
  • 23
  • Thanks a lot for your help. I did a small manual workaround in the meantime as well...I deleted the whole Glucose and Acetate sheet and re-pasted only the cells containing numbers (or text for the first row) and it worked too... funky – plastikdusche Aug 20 '13 at 14:33
0

I had the same problem and worked it out. In Excel, instead of delete use "Clear All" then save your Excel file and import it R. This way R does not show any N/A.

Heerbod
  • 53
  • 9