Questions tagged [scalding]

Scalding is a Scala DSL for Cascading, running on Hadoop.

See https://github.com/twitter/scalding

181 questions
98
votes
4 answers

Difference between reduce and foldLeft/fold in functional programming (particularly Scala and Scala APIs)?

Why do Scala and frameworks like Spark and Scalding have both reduce and foldLeft? So then what's the difference between reduce and fold?
samthebest
  • 30,803
  • 25
  • 102
  • 142
42
votes
5 answers

Cascading examples failed to compile?

In a shell, I ran gradle cleanJar in the Impatient/part1 directory. The output is below. The error is "class file for org.apache.hadoop.mapred.JobConf not found". Why did it fail to compile? :clean UP-TO-DATE :compileJava Download…
Treper
  • 3,539
  • 2
  • 26
  • 48
10
votes
1 answer

Uncompress and read a gzip file in Scala

In Scala, how does one uncompress the text contained in file.gz so that it can be processed? I would be happy with either having the contents of the file stored in a variable, or saving it as a local file so that it can be read in by the program…
EthanP
  • 1,663
  • 3
  • 22
  • 27
9
votes
3 answers

Unresolved dependency: com.hadoop.gplcompression#hadoop-lzo;0.4.16 when "sbt update" in scalding

After cloning the code with git clone https://github.com/twitter/scalding.git and running ./sbt update, I get: :::::::::::::::::::::::::::::::::::::::::::::: [warn] :: UNRESOLVED DEPENDENCIES :: [warn] …
Anton Ashanin
  • 1,817
  • 5
  • 30
  • 43
7
votes
2 answers

Why does a for comprehension expand to a `withFilter`?

I'm working on a DSL for relational (SQL-like) operators. I have a Rep[Table] type with an .apply: ((Symbol, ...)) => Obj method that returns an object Obj which defines .flatMap: T1 => T2 and .map: T1 => T3 functions. As the type Rep[Table] does…
Pyetras
  • 1,492
  • 16
  • 21
7
votes
2 answers

Can I output a collection instead of a tuple in Scalding map method?

If you want to create a pipe with more than 22 fields from a smaller one in Scalding you are limited by Scala tuples, which cannot have more than 22 items. Is there a way to use collections instead of tuples? I imagine something like in the…
Calin-Andrei Burloiu
  • 1,481
  • 2
  • 13
  • 25
6
votes
3 answers

Write to multiple outputs by key Scalding Hadoop, one MapReduce Job

How can you write to multiple outputs depending on the key using Scalding (or Cascading) in a single MapReduce job? I could of course use .filter for all the possible keys, but that is a horrible hack that will fire up many jobs.
samthebest
  • 30,803
  • 25
  • 102
  • 142
6
votes
1 answer

Scala filename too long

I'm using Scala 2.10 and Gradle 1.11. My problem is that the compiled jar throws an error when I try to run it on the Hadoop cluster. I want to run on Hadoop because I am using Scalding. The exception is: Exception in thread "main"…
6
votes
1 answer

Scalding: How to retain the other field, after a groupBy('field){.size}?

So my input data has two fields/columns: id1 & id2, and my code is the following: TextLine(args("input")) .read .mapTo('line->('id1,'id2)) {line: String => val fields = line.split("\t") …
jeremy.ting
  • 155
  • 1
  • 1
  • 7
5
votes
3 answers

Recommended way to access HBase using Scala

Now that SpyGlass is no longer being maintained, what is the recommended way to access HBase using Scala/Scalding? A similar question was asked in 2013, but most of the suggested links are either dead or to defunct projects. The only link that seems…
Ellen Spertus
  • 6,576
  • 9
  • 50
  • 101
5
votes
4 answers

(Scalding) groupBy foldLeft using the group by value in the fold

I have data like: pid recom-pid 1 1 1 2 1 3 2 1 2 2 2 4 2 5. I need to turn it into: pid, recommendations 1 2,3 2 1,4,5. That is, ignore self-references in the 2nd column and join the rest into a comma-separated string. It's tab separated…
tgkprog
  • 4,493
  • 4
  • 41
  • 70
4
votes
0 answers

How can I sort elements of a TypedPipe in Scalding?

I have not been able to find a way to sort elements of a TypedPipe in Scalding (when not performing a group operation). Here are the relevant parts of my program (replacing irrelevant parts with ellipses): case class ReduceOutput(val slug :…
Ellen Spertus
  • 6,576
  • 9
  • 50
  • 101
4
votes
2 answers

How to visualize the steps of a Scalding job

My Scalding job is translated into 9 MapReduce jobs (m/r jobs). It's not easy for me to understand which part of the code each m/r job represents. Is there anything that could help me understand my job better? //this has been copy&pasted from our…
Oleksii
  • 1,101
  • 7
  • 12
4
votes
0 answers

Scalding NPE only when assigning pipe to val

I'm new to Scala and Scalding, and in working on my first Job I'm encountering a NullPointerException when assigning a pipe to a val. The exact same job that just chains to a .write() without the intermediate variable completes as expected. What…
jpk
  • 281
  • 2
  • 11
4
votes
1 answer

How to perform an operation only once, at the end of a Scalding job?

I read in the Scalding groupAll docs: /** * Group all tuples down to one reducer. * (due to cascading limitation). * This is probably only useful just before setting a tail such as Database * tail, so that only one reducer talks to…
Jas
  • 14,493
  • 27
  • 97
  • 148