2

How can I convert a string to an array of characters, for example

"abcd" -> ["a","b","c","d"]

I know the split methd:

SELECT split("abcd","");

#["a","b","c","d",""]

is a bug for the last whitespace? or any other ideas?

fcce
  • 1,034
  • 1
  • 12
  • 24

2 Answers2

3

I don't know if it is a bug or that's how it works. As an alternative, you could use explode and collect_list to exclude blanks from a where clause

SELECT collect_list(l) 
FROM ( SELECT EXPLODE(split('abcd','') ) as l ) t 
WHERE t.l <> '';
Kaushik Nayak
  • 30,772
  • 5
  • 32
  • 45
  • Thanks for your answer, I know `collect_list` method, but the split's result is confusing. – fcce Mar 02 '18 at 08:03
3

This is not actually a bug. Hive split function simply calls the underlying Java String#split(String regexp, int limit) method with limit parameter set to -1, which causes trailing whitespace(s) to be returned.

I'm not going to dig into implementation details on why it's happening since there is already a brilliant answer that describes the issue. Note that str.split("", -1) will return different results depending on the version of Java you use.

A few alternatives:

  1. Use "(?!\A|\z)" as a separator regexp, e.g. split("abcd", "(?!\\A|\\z)"). This will make the regexp matcher skip zero-width matches at the start and at the end positions of the string.
  2. Create a custom UDF that uses either String#toCharArray(), or accepts limit as an argument of the UDF so you can use it as: SPLIT("", 0)
Sergey Khudyakov
  • 1,122
  • 1
  • 8
  • 15