Mahout has a command for creating sequence files: `bin/mahout seqdirectory -c UTF-8 -i <input address> -o <output address>`. I want to use this command from code, through the API.

Sean Owen

1 Answer

You can do something like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;


Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);

Path outputPath = new Path("c:\\temp");

Text key = new Text(); // Example, this can be another type of class
Text value = new Text(); // Example, this can be another type of class

SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, outputPath, key.getClass(), value.getClass());

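while (condition) { // placeholder: loop while there are records left to write

    key.set("Some text");   // reuse the Text objects instead of reassigning them
    value.set("Some text");

    writer.append(key, value);
}

writer.close();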

You can find more information here and here

Additionally, you could call the exact same functionality you described from Mahout by using the org.apache.mahout.text.SequenceFilesFromDirectory class.

Then the call looks something like this:

ToolRunner.run(new SequenceFilesFromDirectory(), args); // args is a String[] holding your parameters

ToolRunner here is the class org.apache.hadoop.util.ToolRunner.
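As a rough sketch of how the command-line flags map onto that `String[]`, something like the following could work. The class name `SeqDirectoryArgs` and the method `build` are made up for illustration and are not part of Mahout, and the paths are only examples. The actual `ToolRunner.run` call is left as a comment because it needs Hadoop and Mahout on the classpath.

```java
import java.util.Arrays;

// Hypothetical helper (not part of Mahout): turns the three values used by
// "bin/mahout seqdirectory -c <charset> -i <input> -o <output>"
// into the argument array that a Hadoop Tool expects.
class SeqDirectoryArgs {

    static String[] build(String charset, String input, String output) {
        return new String[] {"-c", charset, "-i", input, "-o", output};
    }

    public static void main(String[] args) {
        // Example paths only; replace them with your own input and output locations.
        String[] toolArgs = build("UTF-8", "c:\\inputPath", "c:\\outputPath");
        System.out.println(Arrays.toString(toolArgs));

        // With Hadoop and Mahout available, the array would then be passed on like this:
        // int exitCode = ToolRunner.run(new SequenceFilesFromDirectory(), toolArgs);
    }
}
```

Keeping the flag and its value as separate array elements matters: each element is handed to the tool the same way the shell would hand over one command-line token.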

Hope this was of help.

Julian Ortega
  • You might also want to look [**here**](http://stackoverflow.com/questions/11479600/runing-a-simple-mahout-program), where the code uses both the SequenceFile Writer and Reader. – Julian Ortega Jul 25 '12 at 08:32
  • What is the path `"appledata/apples"` in `Path path = new Path("appledata/apples");` [here](http://stackoverflow.com/questions/11479600/runing-a-simple-mahout-program)? Is it a directory path? – Arash Hosseinabady Jul 26 '12 at 07:34
  • It might be relative to the [Hadoop File System (HDFS)](http://hadoop.apache.org/hdfs) – Julian Ortega Jul 26 '12 at 07:52
  • So, how can I set this path? I don't have any more information about it. – Arash Hosseinabady Jul 26 '12 at 09:12
  • Then you don't need the HDFS, just specify the local path for where you want the output to be written. – Julian Ortega Jul 26 '12 at 11:05
  • Set the output? I want to give it the path of the input text file and save the `sequence file` to an output path. I want to set both the input and output paths. – Arash Hosseinabady Jul 26 '12 at 12:08
  • I have already stated how to do so. You would do something like this: `ToolRunner.run(new SequenceFilesFromDirectory(), new String[] {"-c", "UTF-8", "-i", "c:\\inputPath", "-o", "c:\\outputPath"});` – Julian Ortega Jul 26 '12 at 12:10
  • I used your code, but it does not append new input to the sequence file. Every run creates a new "sequence file". – Arash Hosseinabady Aug 07 '12 at 06:13
  • Of course it creates a new sequence file. The code I presented creates a new `SequenceFile.Writer` every time you run it, so it will indeed overwrite anything that is present (if the output path is the same). If what you want to do is append your new data to the existing sequence file, you need to write your own code. – Julian Ortega Aug 08 '12 at 09:15