
I have started learning Scala on Spark and am trying to do some ETL. I want to filter a DataFrame to the rows whose string column splits into 4 parts on whitespace.

This is what I have tried

df.where(split(df("item"), "\\s+").length == 4).show()

And it shows the error value length is not a member of org.apache.spark.sql.Column

and I have looked the documentation and find that it returns the Column class so it definitely doesn't have the length attribute.

I am stuck on this and don't know how to solve it; I have googled it but only found this.

What I want is to filter based on the length of the split result. I also tried to look up what split returns, but I don't know which split function is being used.

So could you please tell me:

1. Which split function is used here?

2. How can I filter rows where the split result has length 4?

Thanks

Jason

1 Answer


The split used here is org.apache.spark.sql.functions.split (not Scala's String.split); it splits a string column on a regex and returns an array column. Because the result is an array column, you can filter on its length with the size function:

import org.apache.spark.sql.functions._
import spark.implicits._ // needed for the $"colName" syntax

df.where(size(split($"item", "\\s+")) === 4).show()
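To see why `"\\s+"` with a length check of 4 does what you want, here is a plain-Scala sketch of the same logic (no Spark required, using Scala's own `String.split` as a stand-in; `splitsIntoFour` and the sample rows are hypothetical names made up for illustration):

```scala
object SplitLengthDemo {
  // Split on one-or-more whitespace characters (spaces, tabs, etc.)
  // and check whether exactly 4 tokens result — the same predicate
  // that size(split($"item", "\\s+")) === 4 expresses in Spark SQL.
  def splitsIntoFour(s: String): Boolean =
    s.trim.split("\\s+").length == 4

  def main(args: Array[String]): Unit = {
    val rows = Seq("a b c d", "one two three", "w  x\ty z")
    // Keep only rows that split into exactly 4 tokens
    val kept = rows.filter(splitsIntoFour)
    println(kept)
  }
}
```

Note that `"\\s+"` collapses runs of mixed whitespace (multiple spaces, tabs) into a single split point, so `"w  x\ty z"` still yields 4 tokens.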
koiralo