I have a column called priority among other columns in a file and contains numbers For ex: 1, 2, 3, 4, 5, 6 etc. The file data is as follows
Department Strength Priority
--------------------------------
CS Good 10
CS Low 2
EC Good 10
EC Low 2
EC Nil 3
I want to select the priority 2 records using a query using SQL hive context as below
select * from
(
select testfile.*,row_number() over(partition by Department order by Priority asc) rn
from testfile
)ranked
where rn=1;
All the columns are defined as String in the spark code. I expect the code to select Priority 2 records because I have given the asc
in order by
clause. But is there a chance that the query selects priority other than 2 just because i didn't declare the priority column as int. Right now the query fetches the following data
CS Good 10
EC Good 10
However if I declare the Priority column as int
, the records will be shown correctly as below
CS Low 2
EC Low 2
I would like to understand this. There are several instances where I declared the number column as string however got the ordering correctly.