1

I am new to the group and R. I posted a question earlier, and am very grateful for the help provided, but realised that I was not very clear. So here is hopefully a better attempt to clarify my problem. Thanks in advance to anyone that can assist.

I have a dataset which contains racehorse names along with each racehorses performance rating, 'DaH', (an integer) and the type of race the horses ran in 'Code',(character).

The dataset has 7 horses with performance ratings (DaH) for the last 3 races, DaH1 to DaH3, where DaH1 is the performance rating for the last/most recent race and DaH3 was the horses performance rating from 3 races ago. Likewise the Codes, Code1 to Code3, provide the Code that the horse ran in.

So looking at the dataset below you can see that horse2 ran in a code C race in its last race and achieved a performance rating DaH1 of 124.

I import the horserace data from a provider of horse racing data.

> racehorse_data
  Code1 DaH1 Code2 DaH2 Code3 DaH3
1     C    0     C  124     C  127
2     C  124     C  117     C    0
3     C  121     C  125     C    0
4     C  123     C  120     C  124
5     C    0     C  125     H  122
6     H  122     C    0     C  137
7     C  110     H  115     C  127  
structure(list(Code1 = c("C", "C", "C", "C", "C", "H", "C"), 
    DaH1 = c(0L, 124L, 121L, 123L, 0L, 122L, 110L), Code2 = c("C", 
    "C", "C", "C", "C", "C", "H"), DaH2 = c(124L, 117L, 125L, 
    120L, 125L, 0L, 115L), Code3 = c("C", "C", "C", "C", "H", 
    "C", "C"), DaH6 = c(127L, 0L, 0L, 124L, 122L, 137L, 127L)), class = "data.frame", row.names = c(NA, 
-7L))

I need to calculate the mean DaH rating for each horse once I have stripped out any performance rating that was achieved in a code that was not a 'C' code. To clarify: The problem i'm having is that I need to remove any performance ratings, DaH, where the Code of the race was not a 'C' code race.

So for example you can see that horse 6 ran in a 'H' code race in its last race. This needs to be removed from the dataset so that just the two races where horse 6 achieved DaH2=0 and DaH3=137 remain to give a mean DaH rating of 68.5 for its two code C races.

I have been playing with the MELT command trying to achieve a reconfigured dataframe that will enable me to filter out the DaH ratings where the code of the race was not equal to 'C'. But i'm not managed to achieve this.

It occured to me that I could isolate each horses DaH and Code as a seperate dataframe and work through each horse seperately but this seems more complicated than i'd like.

My ideal output would be a dataframe such as:

> racehorse_data
   DaH1   DaH2  DaH3    Mean
1     0    124   127    83.7
2   124    117     0    80.3
3   121    125     0    82
4   123    120   124    122.3
5     0    125    na    62.5
6    na      0   137    68.5
7   110     na   127    118.5

Or perhaps a seperate vector for each horse

horse 1     0   124 127 83.7
horse 2     124 117 0   80.3
etc

Many Thanks Graham

Bookie
  • 81
  • 3
  • Your example `dput` data has `DaH6` instead of `DaH3`. Is that intentional? – thelatemail May 27 '20 at 11:24
  • You are 'melt'-ing in two groups here - something like `reshape(dat, idvar="horse", varying=TRUE, sep="", direction="long")` should give you what you want I think in base R. Similar examples shown here - https://stackoverflow.com/questions/12466493/reshaping-multiple-sets-of-measurement-columns-wide-format-into-single-columns and here - https://stackoverflow.com/questions/23945350/reshaping-wide-to-long-with-multiple-values-columns – thelatemail May 27 '20 at 11:34

1 Answers1

0

In Base-R:

first we change the corresponding non-C values to NA

racehorse_data[,c(2,4,6)][racehorse_data[,c(1,3,5)]!="C"] <- NA

Then we trim the dataframe and get row means ignoring the NA values

racehorse_data <- racehorse_data[,c(2,4,6)]
racehorse_data$Mean <- apply(racehorse_data,1,mean,na.rm=T)

output:

> racehorse_data
  DaH1 DaH2 DaH6      Mean
1    0  124  127  83.66667
2  124  117    0  80.33333
3  121  125    0  82.00000
4  123  120  124 122.33333
5    0  125   NA  62.50000
6   NA    0  137  68.50000
7  110   NA  127 118.50000
Daniel O
  • 4,258
  • 6
  • 20