I'm using saveAsTextFile(path) to save output as a text file so that I can later import the result into a DB. The output looks something like this:
(value1, value2)
How can I remove the parentheses?
You can try the following, which is very basic:
rdd.map(x => x._1 + "," + x._2).saveAsTextFile(path)
You just map your RDD[(A,B)] to an RDD[String] and save it.
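Before calling saveAsTextFile, use map(x => x.mkString(",")):
rdd.map(x => x.mkString(",")).saveAsTextFile(path)
The output will not have brackets. Note that mkString is available when each element is a collection such as a Seq or an Array (or a Row); a plain tuple does not have it, so for tuples use productIterator.mkString(",") instead.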
For the folks in the Java world, here is a solution that starts with a DataFrame, converts it to an RDD and then writes the results. The rows of the RDD are passed through the map function that converts the Row into a String.
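import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;

public void write(DataFrame output) {
    String path = "your_path_goes_here";
    output
        .toJavaRDD()                 // DataFrame -> JavaRDD<Row>
        .map(new BracketRemover())   // Row -> comma-separated String
        .saveAsTextFile(path);
}

// Static so that serializing it does not pull in the enclosing class.
protected static class BracketRemover implements Function<Row, String> {
    @Override
    public String call(Row r) {
        return r.mkString(",");
    }
}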
You can save the RDD by using rdd.map(rec => rec.productIterator.mkString(",")).saveAsTextFile(path). The resulting dataset will not have parentheses.
I know it is tagged Scala, but here is the Python side in case anyone is curious. Create the RDD and save it as is:
rdd_of_tuples = sc.parallelize([('one',1),('two',2)])
rdd_of_tuples.saveAsTextFile('/user/cloudera/rdd_of_tuples')
This saves the rows with parentheses, as you mention:
('one', 1)
But if you do the following, it should work:
rdd_of_tuples.map(lambda t: t[0] + ',' + str(t[1])).saveAsTextFile('/user/cloudera/rdd_of_text')
And you should get
one,1
Note that in this case you need to be aware of the types when concatenating (hence the str() call); otherwise you get an exception like the following (the exact wording depends on the Python version):
TypeError: cannot concatenate 'str' and 'int' objects
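If the tuples have more than two fields or mixed types, one way to avoid the TypeError is to convert every field to a string before joining. The helper below is only a sketch: the name to_csv_line and the sep parameter are made up for illustration, and it assumes no field contains the separator itself (otherwise the values would need quoting before the DB import). It is plain Python, so it can be tried without Spark.

```python
def to_csv_line(record, sep=","):
    """Join the fields of a tuple into one delimited string."""
    # str() is applied to every field, so mixed types (str, int, float)
    # can be joined without raising a TypeError.
    return sep.join(str(field) for field in record)


# Local check without Spark:
rows = [("one", 1), ("two", 2)]
lines = [to_csv_line(r) for r in rows]
print(lines)

# With Spark, the same function can be passed to map (not run here):
# rdd_of_tuples.map(to_csv_line).saveAsTextFile('/user/cloudera/rdd_of_text')
```

Because the function takes the whole tuple as a single argument, it does not rely on tuple unpacking in the lambda signature.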