MapReduce in R - how can I implement a "loop if" in reduce?

Question

This is my example dataset:

x <- c("A1", "A1", "A1", "A2", "A2", "A2", "A2", "A3")
y <- c(5347, 5347, 5347, 1819, 1758, 1212, 1212, 1456)

I can't reproduce the exact input my reducer receives after "map | sort", because there the columns are separated by \t and each row has to be split first (a necessary step in MapReduce):

fields <- unlist(strsplit(line, "\t"))

where line is my input. After the split I get two fields:

fields[[1]] = all column x
fields[[2]] = all column y
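
For illustration, here is a minimal sketch of that split. It assumes each input line carries a single x/y pair separated by a tab; the sample line and the names key and value are made up for the example.

```r
# Hypothetical reducer input line: one x value and one y value, tab-separated.
line <- "A2\t1819"

# Split the line on the tab character.
fields <- unlist(strsplit(line, "\t"))

key <- fields[[1]]                # the x part, still a string
value <- as.numeric(fields[[2]])  # the y part, converted to a number
```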

I want to get this result:

ID   Count of unique numbers
A1   1 (only 5347)
A2   3 (1819, 1758, 1212)
A3   1 (only 1456)

How can I compute this with a loop that walks through columns x and y, detects when a new value appears in column x, and counts the unique numbers in column y for each unique value in column x?

@Roger, we are in similar boat as agstudy, you need to post expected sample output for your question to make sense — Silence Dogood, Jun 18 '14 at 10:30
@Roger I see right now you are new at SO. Please read [this](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to know how to ask a question. — agstudy, Jun 18 '14 at 10:33

score 1 · Answer 1 · answered Jun 18 '14 at 10:40

The question is not clear (maybe because of a language problem). But from the expected result, I think you are looking for something like:

tapply(y, x, function(t) length(unique(t)))

A1 A2 A3 
 1  3  1

In plain English: compute the number of unique y values for each x.

agstudy


I know this function, but it doesn't work in MapReduce... it probably needs to be an "if loop". – Roger Jun 18 '14 at 11:01
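
One way to get the same counts inside a reducer is the usual "key change" loop. The following is only a sketch, under the assumption that the reducer sees lines of the form x<TAB>y already sorted by x. The function name count_unique_per_key is hypothetical, and the sample data from the question stands in for the real input.

```r
# Sketch of a streaming-style reducer: lines are "key<TAB>value", sorted by key.
count_unique_per_key <- function(lines) {
  result <- character(0)   # output lines, one per key
  current_key <- NULL      # key of the group being processed
  seen <- character(0)     # distinct values seen so far for current_key

  for (line in lines) {
    fields <- unlist(strsplit(line, "\t"))
    key <- fields[1]
    value <- fields[2]

    if (!is.null(current_key) && key != current_key) {
      # The key changed: emit the finished group and reset the state.
      result <- c(result, paste(current_key, length(seen), sep = "\t"))
      seen <- character(0)
    }
    current_key <- key

    # Remember the value only if it is new for this key.
    if (!(value %in% seen)) {
      seen <- c(seen, value)
    }
  }

  # Emit the last group, which no later key change will flush.
  if (!is.null(current_key)) {
    result <- c(result, paste(current_key, length(seen), sep = "\t"))
  }
  result
}

# Sample data from the question, already sorted by x.
x <- c("A1", "A1", "A1", "A2", "A2", "A2", "A2", "A3")
y <- c(5347, 5347, 5347, 1819, 1758, 1212, 1212, 1456)
lines <- paste(x, y, sep = "\t")

out <- count_unique_per_key(lines)
cat(out, sep = "\n")
```

In an actual streaming job the lines would presumably come from standard input, for example via readLines(file("stdin")), instead of being built from x and y. The loop relies on the input being sorted by key, because a group is only emitted when the key changes.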
