
I am building an app using shiny and openair to analyze wind data.
Right now the user has to "clean" the data before uploading it, and I would like to do this automatically. Some of the data is empty and some of it is not numeric, so it is not possible to build a wind rose. I want to:

    1. Estimate how much of the data is not numeric
    2. Cut it out and leave only numeric data

In my data, the `NO2.mg` column is read as a factor rather than an integer because it does not consist only of numbers. Here is a reproducible example:

> no2 <- factor(c(5,4,"c1",54,"c5",seq(2:50)))
> no2
[1] 5  4  c1 54 c5 1  2  3  4  5  6  7  8  9  10 11 12 13 14
[20] 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
[39] 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
52 Levels: 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 ... c5
> as.numeric(no2)
[1] 45 34 51 46 52  1 12 23 34 45 47 48 49 50  2  3  4  5  6
[20]  7  8  9 10 11 13 14 15 16 17 18 19 20 21 22 24 25 26 27
[39] 28 29 30 31 32 33 35 36 37 38 39 40 41 42 43 44
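The second call does not return the original numbers: a factor stores each entry as an integer code pointing into its `levels`, which are sorted as character strings, and `as.numeric()` returns those codes. A small sketch on a shorter, made-up factor `f` illustrating this:

```r
# A short factor mixing numbers and non-numeric labels (made-up values)
f <- factor(c("5", "4", "c1", "54"))

# Levels are sorted as strings: "4" "5" "54" "c1"
levels(f)

# as.numeric() on a factor returns the integer codes into levels(f): 2 1 4 3
codes <- as.numeric(f)
codes

# Indexing the levels by those codes recovers the original labels
levels(f)[codes]
```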
Roland
eliavs
  • `library(fortunes);fortune(206)`. You will need to provide an example of your `data`. Even then... – mnel Aug 07 '13 at 06:08
  • As a general rule, we are not a help desk. We appreciate if users ask clear, specific questions and show what they've tried and where they got stuck. – Roman Luštrik Aug 07 '13 at 06:23

3 Answers


Worst R haiku ever:

Some of the data is empty, 
some of is not numeric, 
so it is not possible to build a wind rose.
IRTFM
  • Being mocked by a super geek programmer group --> check – eliavs Aug 07 '13 at 06:30
  • @eliavs - well, you could provide some more relevant information as requested by Roman. A bunch of seemingly random figures that aren't reproducible doesn't go very far to allowing us to help. E.g. - `dput(head(ranana.analysed.no2))` might be a good start, or better still, a complete example showing a troublesome section of your input data and an expected output dataset would be helpful. – thelatemail Aug 07 '13 at 06:35
  • @thelatemail thank you, reproducible data is important for help – eliavs Aug 07 '13 at 07:06

To convert a factor to numeric, you need to convert it to character first, because `as.numeric()` on a factor returns the internal level codes rather than the values:

no2<-factor(c(5,4,"c1",54,"c5",seq(2:50)))
no2_num <- as.numeric(as.character(no2)) 
#Warning message:
#  NAs introduced by coercion 
no2_clean <- na.omit(no2_num) # remove the NAs resulting from the bad data
no2_clean
# [1]  5  4 54  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
# [40] 37 38 39 40 41 42 43 44 45 46 47 48 49
# attr(,"na.action")
# [1] 3 5
# attr(,"class")
# [1] "omit"

# percentage of entries that were not numeric
length(attr(no2_clean,"na.action"))/length(no2)*100
#[1] 3.703704
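An equivalent idiom, recommended in the R FAQ (7.10), converts only the distinct levels to numbers and then indexes them by the factor, so each level is parsed once instead of every element. A minimal sketch on the same `no2` data; the variable name `pct_bad` is illustrative, and `suppressWarnings()` is used because the NA coercion warning is expected here:

```r
no2 <- factor(c(5, 4, "c1", 54, "c5", seq(2:50)))

# Parse each level once ("c1" and "c5" become NA), then index by the factor's codes
no2_num <- suppressWarnings(as.numeric(levels(no2)))[no2]

# Share of entries that could not be read as numbers, in percent
pct_bad <- mean(is.na(no2_num)) * 100

# Keep only the numeric values
no2_clean <- no2_num[!is.na(no2_num)]
```

Unlike `na.omit()`, logical indexing returns a plain numeric vector without the `na.action` attribute, so the number of removed entries comes from `sum(is.na(no2_num))` instead.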
Roland

OK, this is how I did it. I am sure someone has a better way, and I'd love it if you shared it with me.
This is my data:
no2<-factor(c(5,4,"c1",54,"c5",seq(2:50)))
To count the "bad data":

sum(is.na(as.numeric(as.vector(no2))))

And to estimate the percentage of bad data:
sum(is.na(as.numeric(as.vector(no2))))/length(no2)*100
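In the app, the same counting and filtering can be applied to the uploaded data frame before it reaches openair. A sketch under stated assumptions: `clean_numeric_cols()` and the toy data frame `raw` are hypothetical names for illustration, not part of shiny or openair, and values that were already `NA` are counted as bad along with empty and non-numeric entries:

```r
# Hypothetical helper (not part of shiny or openair): coerce the given columns
# to numeric, report the share of entries in each that were empty or not
# numeric, and keep only the rows where every listed column is numeric.
clean_numeric_cols <- function(df, cols) {
  pct_bad <- numeric(0)
  for (col in cols) {
    # Go through character so factor columns yield values, not level codes;
    # empty strings and text such as "c1" both become NA
    values <- suppressWarnings(as.numeric(as.character(df[[col]])))
    pct_bad[col] <- mean(is.na(values)) * 100
    df[[col]] <- values
  }
  keep <- complete.cases(df[cols])
  list(data = df[keep, , drop = FALSE], pct_not_numeric = pct_bad)
}

# Toy input standing in for an uploaded file (made-up values)
raw <- data.frame(
  id = 1:6,
  NO2.mg = factor(c("5", "4", "c1", "54", "", "12"))
)

res <- clean_numeric_cols(raw, "NO2.mg")
res$pct_not_numeric   # about 33.3 percent for NO2.mg ("c1" and "" out of 6)
res$data              # the 4 rows whose NO2.mg value is numeric
```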

eliavs
  • The `as.vector` is superfluous, but `sum()`-ing `is.na()` is fairly standard. Did you have any interest in "recovering" data by converting "c5" to "5"? – IRTFM Aug 07 '13 at 19:22
  • @DWin Factors are not vectors and `as.vector` coerces them to character. It's not superfluous here. – Roland Aug 07 '13 at 23:00
  • Interesting ... didn't realize that `as.vector` would do the same as `as.character`. But that doesn't change the fact that it's superfluous, because it's getting passed to `is.na` which doesn't care whether it's "numeric" or "character". Consider: `sum(is.na(factor(c(letters, NA))))`. The `as.vector.factor` function with its default arguments removes the levels attributes and converts to `levels(fac)[fac]`. – IRTFM Aug 07 '13 at 23:10
  • @DWin But `as.numeric` won't create `NA`s when used on a factor, only when used on a character. – Roland Aug 08 '13 at 06:33
  • It's easy to disprove that claim: `as.numeric( factor(c(1:3, NA))) [1] 1 2 3 NA` – IRTFM Aug 08 '13 at 06:35
  • @DWin Of course `as.numeric` propagates `NA`. But that's not creating `NA`. The relevant cases are `as.numeric(factor(c(1:3,"a")))` vs. `as.numeric(as.character(factor(c(1:3,"a"))))` – Roland Aug 08 '13 at 13:19
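The two cases from the last comment can be run side by side. This sketch shows why counting bad entries with `is.na()` only works after going through character: on the factor itself, the non-numeric entry `"a"` silently becomes the code 4.

```r
f <- factor(c(1:3, "a"))

# On a factor, as.numeric() returns level codes: 1 2 3 4 (no NA for "a")
as.numeric(f)

# Via character, the non-numeric entry becomes NA: 1 2 3 NA
suppressWarnings(as.numeric(as.character(f)))

# So only the second form lets is.na() detect the bad entry
sum(is.na(as.numeric(f)))                                  # 0
sum(is.na(suppressWarnings(as.numeric(as.character(f)))))  # 1
```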