I think it is to do with lazy evaluation. I remember an old thread on the Spark developer community saying that, because of the optimisations applied to the DataFrame and Dataset APIs, a count may not necessarily trigger evaluation of the entire DataFrame/Dataset, so the count may not be accurate.
However, if you do a df.rdd.count or ds.rdd.count, or do a cache or persist on the DataFrame/Dataset first and then count, it will evaluate the entire DataFrame or Dataset and the count will be accurate.
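A minimal sketch of both approaches, assuming you already have a SparkSession and a DataFrame called df (names are just placeholders):

```scala
// Option 1: count on the underlying RDD, bypassing the DataFrame-level
// count optimisation and forcing evaluation of every partition.
val rddCount = df.rdd.count()

// Option 2: cache (or persist) first, then count; the count action
// materialises the whole DataFrame into the cache.
df.cache()
val cachedCount = df.count()
```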
Looking at another thread, How to force DataFrame evaluation in Spark, please see the reply by Vince.Bdn, which is in line with my chain of thought.
If you want to validate this further, create a large DataFrame, do one count before a persist and another after the persist, and compare the DAGs of the two jobs; that should confirm it. In my case I went with a DataFrame of 1 million records with 6 columns.
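Something along these lines should reproduce the experiment; the column expressions are arbitrary, any six columns will do:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .appName("count-before-and-after-persist")
  .master("local[*]")
  .getOrCreate()

// 1 million rows, 6 columns of derived values.
val df = spark.range(1000000L).toDF("id")
  .withColumn("c1", col("id") * 2)
  .withColumn("c2", col("id") % 7)
  .withColumn("c3", rand())
  .withColumn("c4", concat(lit("row_"), col("id")))
  .withColumn("c5", col("id") + 42)

df.count()      // count before persisting

df.persist()
df.count()      // count after persisting; materialises the whole DataFrame

// Now compare the DAGs of the two count jobs in the Spark UI
// (http://localhost:4040 by default for a local run).
```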