
This is my example dataset:

x <- c("A1", "A1", "A1", "A2", "A2", "A2", "A2", "A3")
y <- c(5347, 5347, 5347, 1819, 1758, 1212, 1212, 1456)

I can't reproduce this dataset exactly as it arrives from the MapReduce query after the "map | sort" stage. There, the fields are separated by \t, so I have to split each row (a necessary step in MapReduce):

fields <- unlist(strsplit(line, "\t"))

where line is my input. After the split I get two fields:

  • fields[[1]] = all column x
  • fields[[2]] = all column y
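For illustration, here is a minimal sketch of how the sample data might look as tab-separated lines and how each line splits into two fields. The `lines` vector is a hypothetical stand-in for the real input, and the layout of one "x<TAB>y" pair per line is an assumption, not something confirmed above.

```r
# Sample data from the question
x <- c("A1", "A1", "A1", "A2", "A2", "A2", "A2", "A3")
y <- c(5347, 5347, 5347, 1819, 1758, 1212, 1212, 1456)

# Hypothetical stand-in for the sorted mapper output:
# one "x<TAB>y" pair per line (assumed layout)
lines <- paste(x, y, sep = "\t")

# Split every line on the tab character
parts <- strsplit(lines, "\t", fixed = TRUE)

# First field of each line is the key, second is the number
keys <- vapply(parts, `[`, character(1), 1)
values <- as.numeric(vapply(parts, `[`, character(1), 2))
```

After this step `keys` and `values` hold the same content as `x` and `y`, which makes it easy to test any counting logic locally before running it on the real input.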

I want to get this result:

ID   Number of unique values
A1   1 (only 5347)
A2   3 (1819, 1758, 1212)
A3   1 (only 1456)

How can I compute this with a loop that walks through columns x and y and, for each distinct value in column x, counts the unique numbers in column y?

Roger

1 Answer


The question is not entirely clear, but judging from the expected result, I think you are looking for something like:

tapply(y, x, function(t) length(unique(t)))

A1 A2 A3 
 1  3  1 

In plain English: compute the number of unique y values for each x.
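If the data arrives line by line and whole columns are not available, the same count can also be written as an explicit loop. The following is only a sketch under assumptions: the input is taken to be sorted by key with one "x<TAB>y" pair per line, and `count_unique_per_key` is a hypothetical helper name, not part of any package.

```r
# Sample data, turned into sorted "key<TAB>value" lines (assumed input layout)
x <- c("A1", "A1", "A1", "A2", "A2", "A2", "A2", "A3")
y <- c(5347, 5347, 5347, 1819, 1758, 1212, 1212, 1456)
lines <- paste(x, y, sep = "\t")

# Hypothetical helper: count distinct values per key, one line at a time
count_unique_per_key <- function(lines) {
  counts <- integer(0)      # named result vector: key -> number of distinct values
  current_key <- NULL       # key of the group being processed
  seen <- character(0)      # distinct values seen so far for current_key

  for (line in lines) {
    fields <- unlist(strsplit(line, "\t", fixed = TRUE))
    key <- fields[[1]]
    value <- fields[[2]]

    if (is.null(current_key) || key != current_key) {
      # Key changed: store the finished group and start a new one
      if (!is.null(current_key)) counts[current_key] <- length(seen)
      current_key <- key
      seen <- character(0)
    }
    # Remember the value only if it is new for this key
    if (!(value %in% seen)) seen <- c(seen, value)
  }

  # Store the last group
  if (!is.null(current_key)) counts[current_key] <- length(seen)
  counts
}

result <- count_unique_per_key(lines)
print(result)
```

On the sample data this gives the same counts as the tapply call. In a real streaming job the lines would presumably come from standard input, for example via readLines(file("stdin")), instead of the in-memory `lines` vector. Because only the values of the current key are kept, the loop relies on the input being sorted by key.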

agstudy
  • I know this function, but it doesn't work in MapReduce... this probably needs a loop with an if condition. – Roger Jun 18 '14 at 11:01