The function extractFeatures from the NMF package can select features using the following method, in which only the features that fulfill both of the following criteria are retained:

score greater than \hat{\mu} + 3 \hat{\sigma}, where \hat{\mu} and \hat{\sigma} are the median and the median absolute deviation (MAD) of the scores respectively;

the maximum contribution to a basis component is greater than the median of all contributions (i.e. of all elements of W).

How can I write a function in R that applies only the first criterion to a data matrix?

Kim H and Park H (2007). "Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis." Bioinformatics (Oxford, England), 23(12), pp. 1495-502. ISSN 1460-2059.

Seymoo
  • So given a vector `scores`, you want to check which of its components are greater than `median(scores) + MAD(scores)`? If not, be more specific and see https://stackoverflow.com/q/5963269/1320535 – Julius Vainora Mar 05 '18 at 16:51
  • Yes @Julius. It should be + 3 times MAD. – Seymoo Mar 05 '18 at 17:11

1 Answer


Given a vector scores, the condition for each score can be checked as follows:

scores <- rnorm(5)
scores > (median(scores) + 3 * mad(scores))
# [1] FALSE FALSE FALSE FALSE FALSE

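Here there is no need to look for a separate MAD function, since mad from the stats package does exactly that. If you now want to select the corresponding columns from some matrix M, you could simply write the following (drop = FALSE keeps the result a matrix even when only one column is selected):

M[, scores > (median(scores) + 3 * mad(scores)), drop = FALSE]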
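As a side note on what mad actually returns: by default stats::mad multiplies the raw median absolute deviation by the constant 1.4826, so that it is comparable to a standard deviation for normal data. The question does not say which variant the criterion intends, so the sketch below only shows how to get either one; the vector x is made-up illustration data.

```r
# Made-up illustration data (not from the question)
x <- c(1, 2, 3, 4, 100)

# Default mad(): 1.4826 * median(|x - median(x)|)
manual_mad <- 1.4826 * median(abs(x - median(x)))
same_as_default <- isTRUE(all.equal(manual_mad, mad(x)))

# Unscaled MAD: pass constant = 1
raw_mad <- mad(x, constant = 1)
```

If the unscaled version is what you need, replace mad(scores) with mad(scores, constant = 1) in the condition.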

And if you prefer a function for that, then you may use

featureCriterion <- function(M, scores) {
  # Keep the columns of M whose score exceeds median + 3 * MAD
  M[, scores > (median(scores) + 3 * mad(scores)), drop = FALSE]
}
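To see the criterion select something, here is a small worked sketch with made-up data, assuming one score per column of M. The names M, scores, threshold, keep and selected are only for illustration. With these scores the median is 3 and mad gives 1.4826, so the threshold is 3 + 3 * 1.4826 = 7.4478 and only the last score passes.

```r
# Made-up data: 2 rows, 5 columns, one score per column
M <- matrix(1:10, nrow = 2)
scores <- c(1, 2, 3, 4, 100)

# First criterion only: score > median + 3 * MAD
threshold <- median(scores) + 3 * mad(scores)
keep <- scores > threshold

# drop = FALSE keeps a one-column result as a matrix
selected <- M[, keep, drop = FALSE]
```

If your features are stored in rows rather than columns, the same logical vector can index rows instead: M[keep, , drop = FALSE].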
Julius Vainora