I have used the solution suggested by Tomas Petricek in this question: Count unique in a Deedle Series
I have done a quick test comparing Python with the solution above. I slightly modified the function suggested by Tomas to sort the counts in descending order, to match the output of pandas' Series.value_counts() in Python.
open Deedle

let unique s =
    s
    |> Series.groupInto (fun _ v -> v) (fun _ g -> Stats.count g)
    |> Series.sortBy (fun v -> -v)
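For reference, the pandas ordering that the sort above is meant to reproduce can be seen on a tiny example. This is only a sketch: the sample values are hypothetical and chosen so that every count is different.

```python
import pandas as pd

# Hypothetical sample data, chosen so that every count is different.
s = pd.Series([3, 1, 3, 2, 3, 2])

# value_counts() returns the unique values as the index and their
# counts as the values, sorted by count in descending order by default.
counts = s.value_counts()
print(counts)
```

The most frequent value comes first, which is why the F# version negates the count before sorting.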
When I execute the following code in F# Interactive
open System

let rand = Random()
let l =
    [|1..1_000_000|]
    |> Array.map (fun _ -> rand.Next(1000))
    |> Series.ofValues
    |> unique
With "#time" enabled, I get an execution time of around 1500 ms on average (about 150 ms of which is just creating the random Series).
I have also tested similar code in Python (in the Python console of PyCharm, using Python 3.7):
import time
import numpy as np
import pandas
start = time.time()
df = pandas.DataFrame(np.random.randint(0, 1000, size=(1000000, 1)), columns=list('A'))
a = df['A'].value_counts()
print(a)
print("ms: %.1f" % (1000 * (time.time() - start)))
For creating the DataFrame plus value_counts(), I get around 40 ms in total (roughly half for each step).
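A minimal sketch of how the two steps can be timed separately. It assumes time.perf_counter instead of time.time (my choice for the sketch, not what the snippet above uses), and the variable names are only illustrative.

```python
import time

import numpy as np
import pandas as pd

n = 1_000_000

# Step 1: build the DataFrame of random integers in [0, 1000).
t0 = time.perf_counter()
df = pd.DataFrame(np.random.randint(0, 1000, size=(n, 1)), columns=["A"])
t1 = time.perf_counter()

# Step 2: count the occurrences of each unique value.
counts = df["A"].value_counts()
t2 = time.perf_counter()

create_ms = 1000 * (t1 - t0)
count_ms = 1000 * (t2 - t1)
print(f"create: {create_ms:.1f} ms, value_counts: {count_ms:.1f} ms")
```

The actual numbers will vary with the machine and the pandas version.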
Any tips on how to at least speed up the creation of the Series in F#? My code may not be the most efficient, and I would like to know what I could improve. I am trying to convince my team to move some research from Python to F#, and I do not want to hear from them that F# is way too slow. Thank you all!