
Given:

Dataset:

+--------------------+
|               count|
+--------------------+
|                 1.0|
|                 2.0|
|                 3.0|
+--------------------+

Code:

String field = "count";    

Dataset<Row> histogram = dataset
    .groupBy(field)
    .count()
    .persist(StorageLevel.MEMORY_ONLY_SER());

Column cnt = histogram.col("count"); // trying to get .count() result

Histogram schema:

root
 |-- count: double (nullable = true) // input field `count`
 |-- count: long (nullable = false)  // .count() result

Exception:

org.apache.spark.sql.AnalysisException: Reference 'count' is ambiguous, could be: count#101, count#108L.;

Question:

While I understand why this happens, I don't know how to solve it. The dataset is created from a database table and may contain any number of columns with any names, including count, avg and other "reserved" words.

Any help appreciated.

Alex Romanov

1 Answer

dataset.createOrReplaceTempView("V1");
dataset = spark.sql("select count as count_O from v1");
Dataset<Row> histogram = dataset.groupBy("count_O").count().persist(StorageLevel.MEMORY_ONLY_SER());
Column cnt = histogram.col("count");
Srinu Babu
  • Thanks for your answer. The problem is that our system works with thousands of different tables, and it's somewhat difficult to modify our select queries that way. – Alex Romanov Jan 19 '18 at 11:55
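
Since the input queries are hard to change, another option might be to leave the input column alone and give the aggregate a name that cannot collide. The sketch below is only an illustration under that assumption: ColumnNames and unique are hypothetical names, not part of Spark. It picks a name that is not already used by any existing column, comparing case-insensitively because Spark SQL resolves column names case-insensitively by default.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not a Spark API): picks a column name that does not
// clash with any name already present in the dataset.
public class ColumnNames {

    /**
     * Returns base if no existing column uses it, otherwise base_1, base_2, ...
     * Names are compared case-insensitively.
     */
    public static String unique(String base, String[] existing) {
        Set<String> taken = new HashSet<>();
        for (String name : existing) {
            taken.add(name.toLowerCase(Locale.ROOT));
        }
        String candidate = base;
        int suffix = 1;
        while (taken.contains(candidate.toLowerCase(Locale.ROOT))) {
            candidate = base + "_" + suffix++;
        }
        return candidate;
    }

    public static void main(String[] args) {
        // Stand-in for dataset.columns() on a table that already has "count".
        String[] columns = {"count", "avg"};
        System.out.println(unique("count", columns)); // prints "count_1"
    }
}
```

With Spark, the result could presumably be used as String cntName = ColumnNames.unique("count", dataset.columns()); followed by dataset.groupBy(field).agg(functions.count(functions.lit(1)).alias(cntName)) in place of .count(), so that histogram.col(cntName) refers to exactly one column regardless of what the source table calls its fields.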