In using the Watson Personality Insights API, I've already noticed some odd trends, including many profiles scored at what looks like a mean value across dimensions (e.g. agreeableness, with many around .27), which makes me think it's imputing to some default.
Upon review I've noticed a language mismatch issue (i.e. if it thinks the text is English, you can get weird results when it's actually, say, Spanish), which has led me to ask, but not find an answer to:
How does Watson handle:
1) URLs in the message (e.g. many Twitter posts contain URLs)
2) repeat posts (many channels post the same thing many times)
3) special characters (many posts contain a ton of random special characters)
My goal is to determine how much pre-processing I need to do to make Watson most effective.
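To make the question concrete, here is a minimal sketch of the kind of pre-processing in question, assuming Python and only the standard library. It says nothing about what Watson does internally; the function names (`clean_post`, `preprocess`) and the regular expressions are illustrative assumptions, not anything the API documents or requires.

```python
import re

# Assumed patterns, chosen for illustration only.
URL_RE = re.compile(r"https?://\S+|www\.\S+")    # http(s) links and bare www links
NOISE_RE = re.compile(r"[^\w\s.,!?'\"@#-]")      # anything but word chars, spaces, basic punctuation
SPACE_RE = re.compile(r"\s+")

def clean_post(text):
    """Strip URLs and stray special characters, then collapse whitespace."""
    text = URL_RE.sub(" ", text)
    text = NOISE_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()

def preprocess(posts):
    """Clean each post and drop repeats (case-insensitive), keeping first-seen order."""
    seen = set()
    cleaned = []
    for post in posts:
        text = clean_post(post)
        key = text.casefold()
        if text and key not in seen:
            seen.add(key)
            cleaned.append(text)
    return cleaned

if __name__ == "__main__":
    sample = [
        "Check this out!!! http://t.co/abc ***",
        "check this out!!! http://t.co/xyz",
        "¿Qué tal? ~~~",
    ]
    print(preprocess(sample))
```

Because `\w` matches Unicode letters in Python 3, accented characters in non-English posts survive the cleaning; whether stripping this much helps or hurts the scores is exactly what I'm trying to find out.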