
I have this code:

rdd.map(_.split("-")).filter(row => { ... })

When I call row.length on these two inputs:

  1. This-is-a-test----on-split--

  2. This-is-a-test-------

the output is 9 and 4, respectively. split doesn't count trailing tokens when they are empty. What is the workaround if I want both outputs to be 10?

sophie

1 Answer

You can get what you want by passing -1 as the limit parameter to split, like this:

rdd.map(_.split("-", -1)).filter(row => { ... })

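Btw, the expected result is 11, not 10. With the default limit of 0, split discards trailing empty strings; with a negative limit it keeps them. If you keep empty tokens and the string ends with the delimiter, there is one more empty token after that final delimiter. Both of your strings contain 10 delimiters, hence 11 tokens. See the Java documentation for String.split(String regex, int limit) for more information.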
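To check the counts without a Spark cluster, here is a minimal sketch in plain Scala that runs split on the two strings from the question, with and without the negative limit. The object name SplitLimitDemo and its members are made up for illustration; the only library call is String.split, which on a String argument is Java's String.split.

```scala
// Plain Scala, no Spark needed. SplitLimitDemo is a hypothetical name.
object SplitLimitDemo {
  // The two example strings from the question.
  val inputs: Seq[String] = Seq(
    "This-is-a-test----on-split--",
    "This-is-a-test-------"
  )

  // Default limit (0): trailing empty strings are dropped.
  def defaultLengths: Seq[Int] = inputs.map(_.split("-").length)

  // Negative limit: every token is kept, including trailing empty ones.
  def keepAllLengths: Seq[Int] = inputs.map(_.split("-", -1).length)

  def main(args: Array[String]): Unit = {
    println(defaultLengths) // List(9, 4)
    println(keepAllLengths) // List(11, 11)
  }
}
```

The same function passed to rdd.map should behave identically, since each element is split independently.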

Community
ale64bit