I have a "sessions" Dataset in Spark:

Dataset<Row> sessions

This is the schema:

 |-- session_id: string (nullable = true)
 |-- screens: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- load_time: long (nullable = true)
 |    |    |-- name: string (nullable = true)
 |-- session_start: boolean (nullable = true)

I can filter records by "session_start", which works as expected:

Dataset<Row> startedSessions = sessions.filter(col("session_start").equalTo("true"));

I want to filter sessions in a similar way, but by the nested field "screens.name", and not against a single value: I need to check whether its value is in a predefined ArrayList. In other words, given an ArrayList "desiredValues", I need all records where "screens.name" is in "desiredValues".

I need this in Java please. Thank you in advance!

UPDATE: Thank you for your suggestions. I tried the solution from "How to use Column.isin in Java?" suggested in the comments, and my statement now looks like this:

List<String> desiredValues = new ArrayList<String>(Arrays.asList("login", "logout"));
Dataset<Row> matchingSessions = sessions.filter(col("screens.name").isin(desiredValues.stream().toArray(String[]::new)));

However, I now get this error:

org.apache.spark.sql.AnalysisException: cannot resolve '(`screens`.`name` IN ('login', 'logout'))' due to data type mismatch: Arguments must be same type;;
'Filter screens#149.name IN (login,logout)

even though both "screens.name" and the elements of "desiredValues" are Strings.

UPDATE: After further research I discovered that Spark probably doesn't support filtering an "array" field (in my case "screens.name") against an array of desired values (in my case "desiredValues"). In other words, we can only filter an "array" field by a single value:

Dataset<Row> matchingSessions = sessions.filter(array_contains(col("screens.name"), "login"));

or filter a simple (non-nested) field by an array of values:

List<String> desiredValues = new ArrayList<String>(Arrays.asList("123", "456"));
Dataset<Row> matchingSessions = sessions.filter(col("session_id").isin(desiredValues.stream().toArray(String[]::new)));
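
Since array_contains does work for a single value, one possible workaround is to combine one array_contains check per desired value with OR. Below is a minimal sketch of that idea, not a confirmed solution: the class ScreenNameFilter and the method anyScreenNameIn are hypothetical names introduced only for illustration. The helper just builds a SQL condition string in plain Java; it assumes that the string overload Dataset.filter(String conditionExpr) resolves "screens.name" the same way col("screens.name") does, and it rejects values containing quotes or backslashes instead of trying to escape them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Spark): builds a SQL condition string
// that is true when the screens.name array contains at least one of the
// desired values.
class ScreenNameFilter {

    static String anyScreenNameIn(List<String> desiredValues) {
        if (desiredValues.isEmpty()) {
            // No desired values: nothing can match.
            return "false";
        }
        // One array_contains(...) clause per value, joined with OR.
        return desiredValues.stream()
                .map(ScreenNameFilter::containsClause)
                .collect(Collectors.joining(" OR "));
    }

    private static String containsClause(String value) {
        // Assumption: values are simple names. Quotes and backslashes are
        // rejected rather than escaped, to keep the sketch safe and small.
        if (value.contains("'") || value.contains("\\")) {
            throw new IllegalArgumentException("Unsupported character in value: " + value);
        }
        return "array_contains(screens.name, '" + value + "')";
    }

    public static void main(String[] args) {
        List<String> desiredValues = Arrays.asList("login", "logout");
        String condition = anyScreenNameIn(desiredValues);
        System.out.println(condition);
        // With Spark on the classpath, the condition would be used as:
        // Dataset<Row> matchingSessions = sessions.filter(condition);
    }
}
```

The same OR chain could presumably be built from Column objects by folding array_contains(col("screens.name"), value) with Column.or, which avoids string building altogether; I have not verified either variant against this schema.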
Michal
  • explode the screens column and use solution from https://stackoverflow.com/questions/40468776/how-to-use-column-isin-in-java and you should be fine – Ramesh Maharjan Jun 03 '18 at 02:14
  • Thank you for your comments. The solution at https://stackoverflow.com/questions/40468776/how-to-use-column-isin-in-java is exactly what I'm trying to achieve, but after applying it I'm getting a "data type mismatch" error (please see the update above). – Michal Jun 03 '18 at 06:08
