I have this loop to compute the mean per column, which works.

for (i in 1:length(DF1)) {     
    tempA <- DF1[i]                                 # save column of DF1 onto temp variable 
    names(tempA) <- 'word'                          # label temp variable for inner_join function
    DF2 <- inner_join(tempA, DF0, by='word')        # match words with numeric value from look-up DF0
    tempB <- as.data.frame(t(colMeans(DF2[-1])))    # compute mean of column
    DF3 <- rbind(tempB, DF3)                        # save results together
}

The script uses the dplyr package for inner_join.

  • DF0 is the look-up database with 4 columns (word, value1, value2, value3).
  • DF1 is the text data with one word per cell.
  • DF3 is the output.

Now I want to compute the median instead of the mean. It seemed easy enough with the colMedians function from 'robustbase', but I can't get the below to work.

library(robustbase)

for (i in 1:length(DF1)) {     
    tempA <- DF1[i]
    names(tempA) <- 'word'
    DF2 <- inner_join(tempA, DF0, by='word')
    tempB <- as.data.frame(t(colMedians(DF2[-1])))
    DF3<- rbind(tempB, DF3) 
}

The error message reads:

Error in colMedians(tog[-1]) : Argument 'x' must be a matrix.

I've tried to format DF2 as a matrix prior to the colMedians function, but still get the error message:

Error in colMedians(tog[-1]) : Argument 'x' must be a matrix.

I don't understand what is going on here. Thanks for the help!

Happy to provide sample data and error traceback, but trying to keep it as crisp and simple as possible.

Simone
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick May 21 '18 at 15:21
  • Have you tried to use the median function from the stats package? – Dave Rosenman May 21 '18 at 15:22
  • Try `colMedians(data.matrix(DF2[-1]))`. – Rui Barradas May 21 '18 at 15:24
  • Change colMedians for apply: tempB <- as.data.frame(apply(DF2[-1], 2, median)) – Juan Antonio Roldán Díaz May 21 '18 at 15:38
  • Why `*_all*` approach wouldn't work here as in: `mtcars %>% summarise_all(funs(median))`? I reckon that some sample data would help. – Konrad May 21 '18 at 16:04
  • @MrFlick anticipating this comment I said I'd provide data if required :-) guess that is redundant now. – Simone May 22 '18 at 15:53
  • @RuiBarradas in the nerd spirit had to try your solution as well. worked perfectly. Want to post it so I can accept it as answer? – Simone May 22 '18 at 16:02
  • @JuanAntonioRoldánDíaz the apply family did help. However, I don't need to get rid of the first row, instead need to specify the columns as the joined DF2 inherits the data structure from the look-up DF0. – Simone May 22 '18 at 16:07

2 Answers

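According to the comment by the OP, the following solved the problem.
DF2[-1] drops the first column but is still a data frame, while colMedians expects a matrix, so the fix is to convert it with data.matrix first.
I have added a call to library(dplyr).
My contribution was colMedians(data.matrix(DF2[-1]), na.rm = TRUE).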

library(robustbase)
library(dplyr)

for (i in 1:length(DF1)) {     
    tempA <- DF1[i]
    names(tempA) <- 'word'
    DF2 <- inner_join(tempA, DF0, by='word')
    tempB <- colMedians(data.matrix(DF2[-1]), na.rm = TRUE)
    DF3 <- rbind(tempB, DF3) 
}
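
As a small illustration of why the conversion matters, the sketch below checks the types involved. The toy DF2 is invented for this example (only its column names follow the question), and base R's apply is used as a cross-check so the snippet runs without robustbase or dplyr installed.

```r
# Toy stand-in for the joined data frame; the values are invented for illustration.
DF2 <- data.frame(
  word   = c("good", "bad", "fine"),
  value1 = c(1, 5, 3),
  value2 = c(2, 8, 4),
  value3 = c(9, 7, NA),
  stringsAsFactors = FALSE
)

num_df  <- DF2[-1]               # drops the 'word' column, but is still a data frame
num_mat <- data.matrix(num_df)   # numeric matrix with the same column names

is.matrix(num_df)    # FALSE: the likely trigger for "Argument 'x' must be a matrix"
is.matrix(num_mat)   # TRUE

# Base-R cross-check of the column medians, ignoring missing values
medians <- apply(num_mat, 2, median, na.rm = TRUE)
```

With a real DF2, colMedians(num_mat, na.rm = TRUE) should give the same numbers as the apply call.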
Rui Barradas

Stumbled on this answer, which helped me fix the loop as follows:

DF3Mean <- data.frame()                         # instantiate dataframe
DF4Median <- data.frame()                       # instantiate dataframe

for (i in 1:length(DF1)) {
    tempA <- DF1[i]                                 # save column of DF1 onto temp variable
    names(tempA) <- 'word'                          # label temp variable for inner_join function
    DF2 <- inner_join(tempA, DF0, by='word')        # match words with numeric value from look-up DF0
    tempMean <- as.data.frame(t(colMeans(DF2[-1]))) # compute mean of each value column
    DF3Mean <- rbind(tempMean, DF3Mean)             # save results together
    tempMedian <- apply(DF2[ ,2:4], 2, median)      # compute median for columns 2, 3, and 4
    DF4Median <- rbind(tempMedian, DF4Median)       # save results together
}

I guess I was too fixated on the colMedians function.
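
For what it's worth, the same result can probably be built without growing a data frame inside a loop. The following is only a sketch under assumptions: the toy DF0 and DF1 are invented, column_medians is a hypothetical helper name, and base R's merge stands in for dplyr's inner_join so the snippet runs without extra packages.

```r
# Hypothetical toy data in the shape described in the question.
DF0 <- data.frame(
  word   = c("good", "bad", "fine", "poor"),
  value1 = c(1, 5, 3, 7),
  value2 = c(2, 8, 4, 6),
  value3 = c(9, 7, 5, 3),
  stringsAsFactors = FALSE
)
DF1 <- data.frame(
  text1 = c("good", "bad", "fine"),
  text2 = c("poor", "good", "unknown"),
  stringsAsFactors = FALSE
)

# Medians of the look-up values for the words in one column of DF1.
column_medians <- function(words, lookup) {
  # merge() with default settings keeps only matching rows, like an inner join
  joined <- merge(data.frame(word = words, stringsAsFactors = FALSE),
                  lookup, by = "word")
  vapply(joined[-1], median, numeric(1), na.rm = TRUE)
}

# One row per column of DF1, in the same order as the columns of DF1
rows <- lapply(DF1, column_medians, lookup = DF0)
DF4Median <- as.data.frame(do.call(rbind, rows))
```

Note that the loop above prepends each new row with rbind(tempMedian, DF4Median), so its output ends up in reverse column order, whereas this sketch keeps the order of DF1's columns.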

Simone