Given:
Dataset:
+-----+
|count|
+-----+
|  1.0|
|  2.0|
|  3.0|
+-----+
Code:
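import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.storage.StorageLevel;

String field = "count";
Dataset<Row> histogram = dataset
    .groupBy(field)
    .count()
    .persist(StorageLevel.MEMORY_ONLY_SER());
Column cnt = histogram.col("count"); // trying to get the .count() result; this line throws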
Histogram schema:
root
|-- count: double (nullable = true) // input field `count`
|-- count: long (nullable = false) // .count() result
Exception:
org.apache.spark.sql.AnalysisException: Reference 'count' is ambiguous, could be: count#101, count#108L.;
Question:
While I understand why this happens, I have no idea how to solve it. The dataset is created from a database table and may contain any number of columns with any names, including count, avg and other "reserved" words.
Any help appreciated.