AnalysisException: Reference 'count' is ambiguous

Question

Given:

Dataset:

+--------------------+
|               count|
+--------------------+
|                 1.0|
|                 2.0|
|                 3.0|
+--------------------+

Code:

String field = "count";

Dataset<Row> histogram = dataset
    .groupBy(field)
    .count()
    .persist(StorageLevel.MEMORY_ONLY_SER());

Column cnt = histogram.col("count"); // trying to get .count() result

Histogram schema:

root
 |-- count: double (nullable = true) // input field `count`
 |-- count: long (nullable = false)  // .count() result

Exception:

org.apache.spark.sql.AnalysisException: Reference 'count' is ambiguous, could be: count#101, count#108L.;

Question:

I understand why this happens, but I have no idea how to solve it. The dataset is created from a database table and may contain any number of columns with any names, including count, avg and other "reserved" words.

Any help is appreciated.

By changing the column name from "count" to something else? :-P — pri, Jan 19 '18 at 11:28
@AlexanderRomanov, how about adding a prefix to the original columns? For example: `df = df.toDF(*['some_prefix_' + c for c in df.columns])` (please note that this is PySpark code) — fuggy_yama, Sep 11 '20 at 14:45

score 2 · Answer 1 · answered Jan 19 '18 at 11:50


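// Register the dataset as a temp view and rename the input column in SQL
dataset.createOrReplaceTempView("v1");
dataset = spark.sql("select count as count_O from v1");
// After the rename, "count" refers only to the .count() result
Dataset<Row> histogram = dataset.groupBy("count_O").count().persist(StorageLevel.MEMORY_ONLY_SER());
Column cnt = histogram.col("count");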

Srinu Babu


Thanks for your answer. The problem is that our system works with thousands of different tables, and it is somewhat difficult to modify our select queries that way. – Alex Romanov Jan 19 '18 at 11:55
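
One possible way around editing every query, sketched here under assumptions, is to leave the input columns alone and instead give the aggregate a name that no input column already uses. The helper below is hypothetical (the class `UniqueColumnNames` and its method `pick` are not part of Spark). It compares names case-insensitively because Spark resolves column names case-insensitively by default.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not a Spark API): picks an output column name that no input column uses. */
final class UniqueColumnNames {

    private UniqueColumnNames() {}

    /**
     * Returns base if no existing column has that name, otherwise base_1, base_2, ...
     * Names are compared case-insensitively.
     */
    static String pick(String base, String[] existingColumns) {
        Set<String> taken = new HashSet<>();
        for (String c : existingColumns) {
            taken.add(c.toLowerCase(Locale.ROOT));
        }
        String candidate = base;
        int suffix = 1;
        // Keep appending a numeric suffix until the name is free
        while (taken.contains(candidate.toLowerCase(Locale.ROOT))) {
            candidate = base + "_" + suffix++;
        }
        return candidate;
    }

    public static void main(String[] args) {
        // In Spark, this array would come from dataset.columns()
        String[] columns = {"count", "avg", "count_1"};
        System.out.println(pick("count", columns)); // prints count_2
        System.out.println(pick("total", columns)); // prints total
    }
}
```

With Spark on the classpath, this could be used as `String cntName = UniqueColumnNames.pick("count", dataset.columns());`, followed by `dataset.groupBy(field).agg(functions.count(functions.lit(1)).alias(cntName))` and `histogram.col(cntName)`. Aliasing inside `agg` is preferable to renaming afterwards, since `withColumnRenamed("count", ...)` would match both `count` columns.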
