Apache Spark MLlib has a `HashingTF()` transformer that takes tokenized words as input and maps each document's set of terms into a fixed-length feature vector.
As mentioned in the Spark MLlib documentation, it is advisable to use a power of two as the feature dimension.
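To make the mechanism concrete, here is a minimal plain-Python sketch of the hashing trick that `HashingTF` implements (illustrative only; Spark actually uses MurmurHash3 rather than Python's built-in `hash()`, and the dimension `num_features` shown here is a value I chose for the example):

```python
# Sketch of the hashing trick: the vector length is a FIXED parameter
# chosen up front; it does not grow with the vocabulary size.
num_features = 2 ** 10  # e.g. 1024 buckets, picked in advance

def hashing_tf(tokens, num_features):
    """Map a token list to a fixed-length term-frequency vector."""
    vec = [0] * num_features
    for tok in tokens:
        idx = hash(tok) % num_features  # each term hashes to one bucket
        vec[idx] += 1                   # count collisions together
    return vec

doc = "spark mllib hashing tf example spark".split()
vec = hashing_tf(doc, num_features)
```

Note that two different terms can hash to the same bucket (a collision); a larger power-of-two dimension makes collisions rarer but does not eliminate them.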
My question is whether the exponent is the number of terms in the input.
If so, suppose I take more than 1000 text documents as input containing more than 5000 distinct terms in total; would the feature dimension then be 2^5000?
Is my assumption correct, or is there some other way to determine the exponent?