Spark don't really require numeric id, it just needs to bee some unique value, but for implementation they picked Int.
You can do simple back and forth transformation for userId:
case class MyRating(userId: String, product: Int, rating: Double)
val data: RDD[MyRating] = ???
// Assign unique Long id for each userId
val userIdToInt: RDD[(String, Long)] =
data.map(_.userId).distinct().zipWithUniqueId()
// Reverse mapping from generated id to original
val reverseMapping: RDD[(Long, String)]
userIdToInt map { case (l, r) => (r, l) }
// Depends on data size, maybe too big to keep
// on single machine
val map: Map[String, Int] =
userIdToInt.collect().toMap.mapValues(_.toInt)
// Transform to MLLib rating
val rating: RDD[Rating] = data.map { r =>
Rating(userIdToInt.lookup(r.userId).head.toInt, r.product, r.rating)
// -- or
Rating(map(r.userId), r.product, r.rating)
}
// ... train model
// ... get back to MyRating userId from Int
val someUserId: String = reverseMapping.lookup(123).head
You can also try 'data.zipWithUniqueId()' but I'm not sure that in this case .toInt will be safe transformation even if dataset size is small.