
R beginner here. I have a data.frame that contains information on trotting horses (their wins, earnings, time records and such). I have a subsetted data.frame organised in a way that every row contains information for every specific year the horse competed. I have a variable called Competition.age that states what age the horses were each year they competed.

I'm writing down my summary statistics stratified by age and sex of the horse using both the summary() function and describe() from package psych. For example:

summary(Data_year[Data_year$Competition.age>="3"& 
Data_year$Competition.age<="6"& Data_year$Sex=="Mare", ])

This works perfectly fine. But when I try to get a range between 7 and 10 years (instead of 3 and 6), it only returns NAs. The str() function with this line of code returns a blank list of variables; for some reason it won't read the data.

I've even created separate subsetted data.frames with only these years (7, 8, 9 and 10 respectively) and there are no problems with those, individually. I created subsetted data frames with ranges 7-8, 7-9 and they were fine! But 7-10 created an empty data.frame.

Any help will be greatly appreciated!!

jogo
  • 3
    This is a question about data, not code per se. Can you provide a small, [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? – Roman Luštrik Jun 09 '17 at 12:48
  • 2
    Is `Data_year$Competition.age` a character vector? See/explore: `"3" < "10"` (the result is `FALSE`) But `"7" < "8"` gives `TRUE` – jogo Jun 09 '17 at 12:48
  • @jogo I see where you're heading, however subsetting on ranges 7-8 (although Laura didn't show how she does this) works. – Roman Luštrik Jun 09 '17 at 12:50
  • Competition.age is defined as an integer in my data. I did the subsetting of data like: `data7<-Data_year[[Data_year$Competition.age>="7", ]` And so forth. I don't know what kind of example you want...I'm just so confused since the data hasnt given me any problems with any other age range! – Laura Bas Jun 09 '17 at 12:59
  • So you want `data7 <- Data_year[Data_year$Competition.age>= 7, ]` – jogo Jun 09 '17 at 13:02
  • 1
    The point is that `"7"` is not numeric. If you compare a numeric value with a non-numeric value (e.g. character), the numeric value is coerced to character and the comparison is done on characters (alphabetical order). In alphabetical order `"3"` is greater than (sorts after) `"10"` – jogo Jun 09 '17 at 13:08
  • and the reason you didn't have problems before that is that you hadn't hit a two-digit number yet ... – Ben Bolker Jun 09 '17 at 13:09
  • 1
    example: `age <- 1:15; sort(as.character(age))` – jogo Jun 09 '17 at 13:15
  • Thank you all for your comments! I understand now what went wrong. Sorry I took a long time to look at the answers too! I really appreciate how helpful everyone is here :) :) – Laura Bas Jun 13 '17 at 10:17

1 Answer


In your comment you wrote that Data_year$Competition.age is an integer. The problem is that "7" is not numeric. If you compare a numeric value with a non-numeric value (e.g. a character string), the numeric value is coerced to character and the comparison is done on strings (alphabetical order). In alphabetical order, "3" is greater than (sorts after) "10", so no value can be both >= "7" and <= "10", and the subset comes back empty.
See this example:

age <- 1:15
sort(as.character(age))

You want Data_year$Competition.age >= 3 and Data_year$Competition.age <= 6 (unquoted numbers), and so on.
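
To make the difference concrete, here is a minimal sketch on made-up data. The data frame `toy` and its values are hypothetical stand-ins for Data_year; only the column names Competition.age and Sex come from the question. It compares the quoted and unquoted versions of the 7-10 filter:

```r
# Hypothetical toy data standing in for Data_year (values are invented)
toy <- data.frame(
  Competition.age = c(3L, 6L, 7L, 8L, 9L, 10L, 12L),
  Sex = c("Mare", "Mare", "Mare", "Stallion", "Mare", "Mare", "Mare"),
  stringsAsFactors = FALSE
)

# Quoted bounds: the integer column is coerced to character, so the
# comparison is alphabetical and no row satisfies both conditions
chr_rows <- toy[toy$Competition.age >= "7" &
                toy$Competition.age <= "10" &
                toy$Sex == "Mare", ]
nrow(chr_rows)

# Unquoted bounds: the comparison is numeric, as intended
num_rows <- toy[toy$Competition.age >= 7 &
                toy$Competition.age <= 10 &
                toy$Sex == "Mare", ]
num_rows$Competition.age
```

With this toy data the quoted version selects no rows, while the unquoted version keeps the mares aged 7, 9 and 10.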

jogo
  • Alright, thanks! I'm such a newbie in all this that I don't know these basic things yet. Thank you so much for your help! – Laura Bas Jun 13 '17 at 10:18