I am trying to set up Spark with the new Microsoft.Spark library. The method DataFrame.PrintSchema works fine, but DataFrame.Take() throws a System.NotImplementedException. A lot of other methods throw this exception as well.

I took a look at the source: the Take method calls the collect method, which fails on the call to collectToPython.

SparkSession spark = SparkSession
    .Builder()
    .AppName(".NET Spark")
    .GetOrCreate();

DataFrame dataFrame = spark.Read().Json("people.json");
IEnumerable<Row> rows = dataFrame.Take(1);

Is this just a Microsoft library that isn't finished yet? Or am I doing something wrong?

AeroX
Jan-Wiebe

1 Answer

Did you try the latest released version? I used v0.2.0, and the following works as expected:

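using System;                      // Console
using System.Collections.Generic;  // IEnumerable<T>
using Microsoft.Spark.Sql;         // SparkSession, DataFrame, Row

var spark = SparkSession.Builder().GetOrCreate();
var df = spark.Read().Json("people.json");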

IEnumerable<Row> rows = df.Take(1);
foreach (var row in rows)
{
    Console.WriteLine(row.Get("name"));
}
spark.Stop();
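
If the project pulls the library in through NuGet, moving to that release could look like the fragment below. This is only a sketch: it assumes an SDK-style .csproj, and the version shown is simply the one I tested with. The Microsoft.Spark.Worker you run against should, as far as I know, be the same version as the package.

```xml
<!-- In the .csproj: pin Microsoft.Spark to the version mentioned above -->
<ItemGroup>
  <PackageReference Include="Microsoft.Spark" Version="0.2.0" />
</ItemGroup>
```

After changing the version, restore and rebuild before submitting the application again.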
imback82