
I have searched the web extensively for this and found nothing, even though it seems like it should be a fairly common need. In the past I have used Mahout's seqdirectory command to convert a folder of text files (each file a separate document) to SequenceFile format. In this case, though, there are so many documents (hundreds of thousands) that I have one very large text file in which each line is a document. How can I convert this large file to SequenceFile format so that Mahout treats each line as a separate document? Thank you very much for any help.

Nick
  • possible duplicate of [Converting CSV to SequenceFile](http://stackoverflow.com/questions/11994930/converting-csv-to-sequencefile) – Sean Owen Oct 31 '12 at 13:20

1 Answer


Yeah, it is not very apparent or intuitive how to do this, although (lucky for you :P) I have answered that exact question several times here on Stack Overflow, for instance here. Have a look ;)
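For readers who cannot follow the link, here is a rough sketch of one possible workaround. It is not necessarily the approach taken in the linked answer: it simply splits the large file into one small file per line so that the seqdirectory command mentioned in the question can be used as before. The function name `split_lines_to_files`, the `part-`/`doc-` naming scheme, and the `files_per_dir` bucket size are all hypothetical choices made for illustration, and the sketch assumes the input is UTF-8 text.

```python
import os


def split_lines_to_files(input_path, output_dir, files_per_dir=1000):
    """Write each non-blank line of input_path to its own file under output_dir.

    Files are grouped into numbered subdirectories so that no single
    directory ends up holding hundreds of thousands of entries.
    Returns the number of documents written.
    """
    count = 0
    with open(input_path, encoding="utf-8") as src:
        for line in src:
            text = line.rstrip("\r\n")
            if not text.strip():
                continue  # skip blank lines rather than emit empty documents
            # Hypothetical layout: output_dir/part-00000/doc-00000000.txt, ...
            bucket = os.path.join(output_dir, f"part-{count // files_per_dir:05d}")
            os.makedirs(bucket, exist_ok=True)
            doc_path = os.path.join(bucket, f"doc-{count:08d}.txt")
            with open(doc_path, "w", encoding="utf-8") as dst:
                dst.write(text + "\n")
            count += 1
    return count
```

After something like `split_lines_to_files("docs.txt", "docs_split")`, the resulting folder can be passed to seqdirectory as its input directory (the `-i` option, with `-o` for the output), the same way as the folders of text files described in the question. Writing the SequenceFile directly from Java with Hadoop's `SequenceFile.Writer` would avoid the intermediate files, but that route is not sketched here.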

Julian Ortega