I am using a DataFrame from pyspark.sql. Why is the output different on Ubuntu vs. Mac?
I am using only 10 documents, so N = 10. The formula I used is tf-idf = (1 + log(tf)) * log(N/df).
As you can see, Mac gives the correct output, but Ubuntu (inside a VM) gives the wrong output.
My tf-idf column is a FloatType(). I calculated it using a UDF.
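For reference, here is a minimal sketch of the formula above as a plain Python function. It is an illustration only, not the exact UDF code: the names `tf_idf` and `n_docs` are hypothetical, the natural logarithm is assumed, and the guard for non-positive counts is an added assumption.

```python
import math


def tf_idf(tf, df, n_docs=10):
    """Compute (1 + log(tf)) * log(N / df) using the natural log (assumed)."""
    # Assumption: a term with no occurrences or no documents gets a score of 0.
    if tf <= 0 or df <= 0:
        return 0.0
    # Convert to float explicitly so N / df is never an integer division,
    # regardless of the Python version running on the machine.
    return (1.0 + math.log(tf)) * math.log(float(n_docs) / float(df))


# A term that appears once in every one of the 10 documents scores 0.
print(tf_idf(1, 10))
# A term that appears twice in a document and in 5 of the 10 documents.
print(tf_idf(2, 5))
```

A function like this could be wrapped with `pyspark.sql.functions.udf(tf_idf, FloatType())`; in that case it should return a plain Python `float`, which this sketch does.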
Ubuntu output:
Mac output: