How can I apply the Scala uaparser to a column of a DataFrame? Each row in the column is a string of the form -

Mozilla/5.0 (iPhone; CPU iPhone OS 5_1_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B206 Safari/7534.48.3

I am trying to do something of the form -

def getTrnUiEvent(hiveDf:org.apache.spark.sql.DataFrame): Unit = {
  val trnUiEventDf = hiveDf
    .withColumn("application_browser_user_agent", getUAFamily(hiveDf("application_browser_user_agent")))
}

val getUAFamily = udf((ua_string:org.apache.spark.sql.DataFrame) => {
  Parser.get.parse(ua_string.toString()).userAgent.family
})

I am receiving an error for the above. I have also tried other ways of doing this but end up with the same result. What I can't get my head around is how each row of the DataFrame column can be processed by the uaparser. Each row of hiveDf("application_browser_user_agent") looks like the example string pasted above.

The links I have looked at - Applying function to Spark Dataframe Column

Do I convert this to an RDD first and then process each row of the RDD using the uaparser?

Link for uaparser - https://github.com/ua-parser/uap-scala

error message -

/Users/pojha/github/Bacon/scala/bacon/src/main/scala/baconParallel.scala:62: No TypeTag available for String
[error]   val getUAFamily = udf((ua_string:org.apache.spark.sql.DataFrame) => {
preitam ojha
  • Adding the error message to the question would be helpful. – Rockie Yang Jul 07 '16 at 07:53
  • Added the error message as requested. – preitam ojha Jul 07 '16 at 07:59
  • I guess you are trying to get the UserAgent from the column application_browser_user_agent. If so, the udf should be String => String, whereas yours is declared as taking a DataFrame. – Rockie Yang Jul 07 '16 at 09:08
  • That was my initial definition for the udf. ` val getUAFamily = udf[org.uaparser.scala.Client,String]((ua_string:String) => { Parser.get.parse(ua_string) })` `scala:62: No TypeTag available for String [error] val getUAFamily = udf[org.uaparser.scala.Client,String]((ua_string:String) => {` – preitam ojha Jul 07 '16 at 14:48
  • Your initial definition seems to work for me. – Jonas Aug 22 '16 at 15:59
  • Have you solved this? I am trying to do the same thing. I tried this - `val getUAFamily = udf[org.uaparser.scala.Client,String]((ua_string) => { Parser.default.parse(ua_string) })`, `val newdf = df.withColumn("new_user_agent", getUAFamily('user_agent))` . But I am getting something like this - [[Mobile Safari,8...]]. I am trying to get the device, browser and OS actually. – ds_user May 24 '18 at 00:34
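
Following the suggestion in the comments that the udf should map String => String, a minimal sketch of one way to do this is below. It is not a confirmed fix for the "No TypeTag available for String" error, which may have a separate cause in the build setup. The names UaParserHolder, UserAgentColumns, withUserAgentColumns and the output column names ua_family, os_family and device_family are hypothetical, chosen only for illustration. Parser.default is taken from the last comment above; the original question uses Parser.get, so the right call depends on the uap-scala version in use.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}
import org.uaparser.scala.Parser

// Hypothetical holder: the parser lives in a lazy val on an object, so it is
// built at most once per JVM and is never serialized with the UDF closure.
object UaParserHolder {
  // Assumption: Parser.default exists in the uap-scala version on the classpath
  // (older versions, as in the question, expose Parser.get instead).
  lazy val parser: Parser = Parser.default
}

object UserAgentColumns {
  // Each UDF is String => String: Spark calls it once per row with the cell value.
  // Null cells are passed through as null instead of being parsed.
  val getUAFamily = udf { (ua: String) =>
    Option(ua).map(s => UaParserHolder.parser.parse(s).userAgent.family).orNull
  }

  val getOSFamily = udf { (ua: String) =>
    Option(ua).map(s => UaParserHolder.parser.parse(s).os.family).orNull
  }

  val getDeviceFamily = udf { (ua: String) =>
    Option(ua).map(s => UaParserHolder.parser.parse(s).device.family).orNull
  }

  // Adds three string columns derived from the user-agent column named uaCol.
  def withUserAgentColumns(df: DataFrame, uaCol: String): DataFrame =
    df.withColumn("ua_family", getUAFamily(col(uaCol)))
      .withColumn("os_family", getOSFamily(col(uaCol)))
      .withColumn("device_family", getDeviceFamily(col(uaCol)))
}
```

With the DataFrame from the question, this would be called as UserAgentColumns.withUserAgentColumns(hiveDf, "application_browser_user_agent"). Returning one plain string per UDF avoids the nested struct output ([[Mobile Safari,8...]]) described in the last comment, and no conversion to an RDD is needed because the UDF already runs once per row.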
