I'm trying to read a text or gz file from HDFS and run a simple mapreduce job (actually only the map step), but I get an error that suggests the readLines call is failing. Can the readLines function be used inside a mapreduce map function? P.S. There is no problem when I use readLines to parse HDFS files outside of a mapreduce job. Thanks.
counts <- function(path) {
  ct.map <- function(., lines) {
    line <- readLines(lines)
    word <- unlist(strsplit(line, pattern = " "))
    keyval(word, 1)
  }
  mapreduce(
    input = path,
    input.format = "text",
    map = ct.map
  )
}
counts("/user/ychen/100.txt")
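For comparison, here is a minimal base-R sketch of the same map step without readLines. It assumes (not confirmed in this post) that with input.format = "text" the map function is already handed the input lines as a character vector, so there is no connection or path left to read from. It also uses `split`, which is the name of the separator argument in base R's strsplit. The list returned here is a hypothetical stand-in so the snippet runs without Hadoop; inside an actual job the map would return keyval(words, 1) as in the code above.

```r
# Sketch only: runs in plain R, no rmr2 or Hadoop required.
ct.map <- function(., lines) {
  # Assumption: `lines` is already a character vector, one element per input line,
  # so it is split directly instead of being passed to readLines().
  words <- unlist(strsplit(lines, split = " "))
  # Hypothetical stand-in for keyval(words, 1): each word paired with a count of 1.
  list(key = words, val = rep(1, length(words)))
}

# Simulate one chunk of input lines as the map function might receive them.
result <- ct.map(NULL, c("a b", "c a"))
print(result$key)
```

If this assumption holds, readLines(lines) would be treating each line of text as a file name or connection to open, which could explain why it works outside the job but not inside the map.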