I have a "sessions" Dataset in Spark:
Dataset<Row> sessions
This is the schema:
|-- session_id: string (nullable = true)
|-- screens: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- load_time: long (nullable = true)
| | |-- name: string (nullable = true)
|-- session_start: boolean (nullable = true)
I can filter records by "session_start" which works as expected:
Dataset<Row> startedSessions = sessions.filter(col("session_start").equalTo("true"));
I want to filter sessions in a similar way, but by the nested field "screens.name", and instead of matching a single value I want to check whether its value is in a predefined ArrayList.
In other words, let's say we have an ArrayList "desiredValues", and I need all records where "screens.name" is in "desiredValues".
I need this in Java please. Thank you in advance!
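To make the intended semantics concrete, here is a minimal plain-Java sketch of the predicate, with no Spark involved. The class ScreenNameFilterSketch, the Session holder and both method names are hypothetical and exist only for illustration. It also assumes that a session should match when at least one of its screen names is in "desiredValues", and it does not handle null arrays even though the schema allows them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ScreenNameFilterSketch {

    // Hypothetical stand-in for one row of the "sessions" Dataset: the session id
    // plus the values that "screens.name" would hold for that row.
    static final class Session {
        final String sessionId;
        final List<String> screenNames;

        Session(String sessionId, List<String> screenNames) {
            this.sessionId = sessionId;
            this.screenNames = screenNames;
        }
    }

    // True when at least one screen name of the session is in desiredValues
    // (assumed "any element matches" semantics; nulls are not handled here).
    static boolean hasDesiredScreen(Session session, List<String> desiredValues) {
        // Collections.disjoint returns true when the two collections share no element.
        return !Collections.disjoint(session.screenNames, desiredValues);
    }

    // Keeps only the sessions that pass the predicate above, preserving order.
    static List<Session> filterByScreenNames(List<Session> sessions, List<String> desiredValues) {
        List<Session> matching = new ArrayList<>();
        for (Session s : sessions) {
            if (hasDesiredScreen(s, desiredValues)) {
                matching.add(s);
            }
        }
        return matching;
    }

    public static void main(String[] args) {
        List<String> desiredValues = Arrays.asList("login", "logout");
        List<Session> sessions = Arrays.asList(
                new Session("123", Arrays.asList("home", "login")),
                new Session("456", Arrays.asList("home", "settings")),
                new Session("789", Arrays.asList("logout")));

        // Prints the ids of the sessions that have a desired screen name.
        for (Session s : filterByScreenNames(sessions, desiredValues)) {
            System.out.println(s.sessionId);
        }
    }
}
```

What I am looking for is the equivalent of filterByScreenNames expressed as a filter on the Spark Dataset.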
UPDATE: Thank you for your suggestions. I tried the solution from "How to use Column.isin in Java?" suggested in the comments, and my statement now looks like this:
List<String> desiredValues = new ArrayList<String>(Arrays.asList("login", "logout"));
Dataset<Row> matchingSessions = sessions.filter(col("screens.name").isin(desiredValues.stream().toArray(String[]::new)));
However, now I'm getting this error:
org.apache.spark.sql.AnalysisException: cannot resolve '(`screens`.`name` IN ('login', 'logout'))' due to data type mismatch: Arguments must be same type;;
'Filter screens#149.name IN (login,logout)
even though both "screens.name" and the elements of "desiredValues" are Strings.
UPDATE: After further research I discovered that Spark probably doesn't support filtering an array field (in my case "screens.name", which is an array of strings because "screens" is an array of structs) by an array of desired values (in my case "desiredValues"). In other words, we can only filter an array field by a single value:
Dataset<Row> matchingSessions = sessions.filter(array_contains(col("screens.name"), "login"));
or filter a simple (non-nested) field by a list of values:
List<String> desiredValues = new ArrayList<String>(Arrays.asList("123", "456"));
Dataset<Row> matchingSessions = sessions.filter(col("session_id").isin(desiredValues.stream().toArray(String[]::new)));