This is my example dataset:
x <- c("A1", "A1", "A1", "A2", "A2", "A2", "A2", "A3")
y <- c(5347, 5347, 5347, 1819, 1758, 1212, 1212, 1456)
I can't reproduce this dataset exactly as it arrives from the MapReduce job after the "map | sort" step, because there the columns are separated by \t and each row has to be split first (a necessary step in MapReduce):
fields <- unlist(strsplit(line, "\t"))
where line is my input. After the split I get two fields:
- fields[[1]] = column x
- fields[[2]] = column y
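To make the split step concrete, here is a minimal sketch on a single line. The sample line "A1\t5347" is an assumption about what one row looks like after "map | sort" (key, tab, value); the real input may differ.

```r
# Hypothetical input line: key and value separated by a tab (assumed layout)
line <- "A1\t5347"

# Split the line on the tab character
fields <- unlist(strsplit(line, "\t"))

id    <- fields[[1]]              # key from column x
value <- as.numeric(fields[[2]])  # number from column y, converted from text
```

Note that strsplit returns character strings, so the second field has to be converted with as.numeric before it can be compared as a number.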
I want to get this result:
ID   Count   Unique numbers
A1   1       (only 5347)
A2   3       (1819, 1758, 1212)
A3   1       (only 1456)
How can I compute this with a loop that walks through columns x and y row by row, detects when a new value appears in column x, and counts the unique values in column y for each unique value in column x?
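For reference, a minimal sketch of one possible way to do this, not a definitive solution. It shows a vectorized version with tapply and a row-by-row loop; the loop assumes the rows arrive sorted by x, as they would after "map | sort". The function name count_unique_sorted is made up for illustration.

```r
x <- c("A1", "A1", "A1", "A2", "A2", "A2", "A2", "A3")
y <- c(5347, 5347, 5347, 1819, 1758, 1212, 1212, 1456)

# Vectorized: number of distinct y values for each x
by_id <- tapply(y, x, function(v) length(unique(v)))

# Row-by-row loop (hypothetical helper); assumes x is already sorted
count_unique_sorted <- function(x, y) {
  ids <- character(0)
  counts <- integer(0)
  current <- NULL        # key of the group being processed
  seen <- numeric(0)     # distinct y values seen in the current group
  for (i in seq_along(x)) {
    if (is.null(current) || x[i] != current) {
      # A new key starts: store the finished group, then reset
      if (!is.null(current)) {
        ids <- c(ids, current)
        counts <- c(counts, length(seen))
      }
      current <- x[i]
      seen <- numeric(0)
    }
    if (!(y[i] %in% seen)) seen <- c(seen, y[i])
  }
  # Store the last group
  if (!is.null(current)) {
    ids <- c(ids, current)
    counts <- c(counts, length(seen))
  }
  data.frame(ID = ids, Count = counts, stringsAsFactors = FALSE)
}

result <- count_unique_sorted(x, y)
# result has rows: A1 1, A2 3, A3 1
```

The loop only keeps the current key and its distinct values in memory, which fits a streaming reducer; the tapply version is shorter but needs all rows loaded at once.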