10

I'm using saveAsTextFile(path) to save the output as a text file so that I can later import the result into a database. The output looks something like this:

(value1, value2)

How can I remove the parentheses?

Jacek Laskowski
Userrrrrrrr

6 Answers

15

You can try the following which is very basic:

rdd.map(x => x._1 + "," + x._2).saveAsTextFile(path)

You just map your RDD[(A,B)] to an RDD[String] and save it.
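
The parentheses appear because saveAsTextFile writes each element's toString, and a Scala tuple's toString includes them. Here is a minimal sketch of the difference, assuming a plain Seq stands in for the RDD so that it runs without Spark; the object and method names (TupleLines, asWritten, asCsv) are hypothetical:

```scala
object TupleLines {
  // What ends up in the file when tuples are saved directly:
  // each element is rendered with its default toString.
  def asWritten(pairs: Seq[(String, Int)]): Seq[String] =
    pairs.map(_.toString)

  // The approach from this answer: build the line yourself first.
  def asCsv(pairs: Seq[(String, Int)]): Seq[String] =
    pairs.map { case (a, b) => s"$a,$b" }

  def main(args: Array[String]): Unit = {
    val pairs = Seq(("value1", 1), ("value2", 2))
    asWritten(pairs).foreach(println) // (value1,1) then (value2,2)
    asCsv(pairs).foreach(println)     // value1,1 then value2,2
  }
}
```

On a real RDD the same function passed to map would run before saveAsTextFile, as in the one-liner above.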

eliasah
  • @Ashish if you have a comment, please use the comment box and don't edit the answer! Also, the code you suggested in the edit isn't related to the question here. Your code works with an RDD[Row], which is not the case here. – eliasah Mar 30 '17 at 14:59
7

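Before calling saveAsTextFile, use map(x => x.mkString(",")):

rdd.map(x => x.mkString(",")).saveAsTextFile(path)

The output will not have brackets. Note that mkString is available when each element is a collection or a Row, not a plain tuple.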

vindev
3

For the folks in the Java world, here is a solution that starts with a DataFrame, converts it to an RDD and then writes the results. The rows of the RDD are passed through the map function that converts the Row into a String.

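import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;

// Convert the DataFrame to an RDD of rows, format each row, then write it out.
public void write(DataFrame output) {
    String path = "your_path_goes_here";
    output
        .toJavaRDD()
        .map(new BracketRemover())
        .saveAsTextFile(path);
}

// Static so that serializing the function does not pull in the enclosing instance.
protected static class BracketRemover implements Function<Row, String> {
    @Override
    public String call(Row r) {
        // Join the row's fields with commas, without surrounding brackets.
        return r.mkString(",");
    }
}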
bruce szalwinski
1

Try explicitly building the line with mkString (for a tuple, via productIterator.mkString) rather than relying on the tuple's default string representation.

lmm
0

You can save the RDD like this:

rdd.map(rec => rec.productIterator.mkString(",")).saveAsTextFile(path)

The resulting dataset will not have parentheses.
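
Since productIterator is defined on every Product (tuples of any arity and case classes), the same idea can be wrapped in a small reusable function. This is a sketch only; the names ProductLines and toLine and the sep parameter are hypothetical, not part of Spark:

```scala
object ProductLines {
  // Hypothetical helper: joins the fields of any tuple or case class
  // with the given separator, using each field's toString.
  def toLine(p: Product, sep: String = ","): String =
    p.productIterator.mkString(sep)

  def main(args: Array[String]): Unit = {
    println(toLine(("one", 1)))          // one,1
    println(toLine(("a", 2, 3.5), ";"))  // a;2;3.5
  }
}
```

With an RDD of tuples it could be used as rdd.map(ProductLines.toLine(_)).saveAsTextFile(path). Fields are not quoted or escaped, so values that contain the separator would need extra handling.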

Nikkhiel24
0

I know this is tagged Scala, but here is the Python side in case anyone is curious. Create the RDD and save it as is:

rdd_of_tuples = sc.parallelize([('one',1),('two',2)])
rdd_of_tuples.saveAsTextFile('/user/cloudera/rdd_of_tuples')

This will save the rows like this, as you mention:

('one', 1)

But if you do the following it should work

# Tuple-unpacking lambdas (lambda (x, y): ...) only work in Python 2, so index into the tuple instead.
rdd_of_tuples.map(lambda t: t[0] + ',' + str(t[1])).saveAsTextFile('/user/cloudera/rdd_of_text')

And you should get

one,1

Note that in this case you need to be aware of the types when concatenating (hence the str(...) call around the integer); otherwise you would get an exception like the following:

TypeError: cannot concatenate 'str' and 'int' objects
xmorera