I have searched the web extensively for this and found nothing, even though I feel it must be a fairly common problem. In the past I have used Mahout's seqdirectory command to convert a folder of text files (each file being a separate document). In this case, though, there are so many documents (hundreds of thousands) that I have one very large text file in which each line is a document. How can I convert this large file to SequenceFile format so that Mahout treats each line as a separate document? Thank you very much for any help.
possible duplicate of [Converting CSV to SequenceFile](http://stackoverflow.com/questions/11994930/converting-csv-to-sequencefile) – Sean Owen Oct 31 '12 at 13:20
1 Answer
Yeah, it isn't obvious or very intuitive how to do this, although (lucky for you :P) I have answered that exact question several times here on Stack Overflow, for instance here. Have a look ;)
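For a rough idea of what such a conversion can look like, here is a minimal sketch, not necessarily the approach from the linked answers. It assumes Hadoop 2.x or later on the classpath (older releases use the `SequenceFile.createWriter(fs, conf, path, keyClass, valueClass)` overload instead), a UTF-8 input file on the local filesystem, and `Text` keys and values, which is what seqdirectory writes. The class name `LinesToSequenceFile` and the `/doc-<line number>` key scheme are made up for illustration; any unique ID per line should do.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical helper: writes one (Text, Text) record per non-empty input line.
public class LinesToSequenceFile {

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: LinesToSequenceFile <input.txt> <output.seq>");
            System.exit(1);
        }

        Configuration conf = new Configuration();
        // The output path is resolved against the default filesystem in conf
        // (local disk or HDFS, depending on your Hadoop configuration).
        Path output = new Path(args[1]);

        try (BufferedReader reader =
                     Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
             SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                     SequenceFile.Writer.file(output),
                     SequenceFile.Writer.keyClass(Text.class),
                     SequenceFile.Writer.valueClass(Text.class))) {

            Text key = new Text();
            Text value = new Text();
            long lineNumber = 0;
            String line;

            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines so no empty documents are written
                }
                key.set("/doc-" + lineNumber); // assumed ID scheme: one key per line
                value.set(line);               // the whole line is the document body
                writer.append(key, value);
            }
        }
    }
}
```

The resulting file should then be usable wherever you would otherwise pass seqdirectory output, for example as the input to seq2sparse.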
Julian Ortega