Logging the split String shows where the issues are:
يَا
أَيُّهَا
الَّذِينَ
آمَنُوا
لَا
تَقْرَبُوا
الصَّلَاةَ
وَأَنْتُمْ
سُكَارَىٰ
حَتَّىٰ
تَعْلَمُوا
مَا
تَقُولُونَ
وَلَا
جُنُبًا
إِلَّا
عَابِرِي
سَبِيلٍ
حَتَّىٰ
تَغْتَسِلُوا
ۚ >>>>>>>>>>>>>>>>>>>>> Problem here
وَإِنْ
كُنْتُمْ
مَرْضَىٰ
أَوْ
عَلَىٰ
سَفَرٍ
أَوْ
جَاءَ
أَحَدٌ
مِنْكُمْ
مِنَ
الْغَائِطِ
أَوْ
لَامَسْتُمُ
النِّسَاءَ
فَلَمْ
تَجِدُوا
مَاءً
فَتَيَمَّمُوا
صَعِيدًا
طَيِّبًا
فَامْسَحُوا
بِوُجُوهِكُمْ
وَأَيْدِيكُمْ
ۗ >>>>>>>>>>>>>>>>>>>>> Problem here
إِنَّ
اللَّهَ
كَانَ
عَفُوًّا
غَفُورًا
So, apparently the problem lies in the upper diacritics (or, more accurately, markers) like ۚ or ۗ: they are not real words, yet they come out as separate items when the string is split on spaces.
I believe that the Kotlin version is more accurate than the Swift one, because what you asked for is:
Separate this String using SPACE as a delimiter (FULL STOP)
Swift apparently doesn't recognize the upper diacritics/markers, i.e. it treats them as nothing and doesn't count them when the string is split. There is probably another Swift function that can detect them; I'm not sure, as this is not part of your question.
And as you have a couple of those markers, the Kotlin version counts two more than the Swift one (i.e. 51 instead of 49).
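To make the extra count concrete, here is a minimal sketch of how a standalone marker becomes a token of its own when splitting on spaces. The names tokenCount and sample are hypothetical, and Latin placeholder words are used only so the tokens are easy to read:

```kotlin
// Counts the non-blank items produced by a plain split on spaces.
fun tokenCount(s: String): Int = s.split(" ").count { it.isNotBlank() }

// Two placeholder words with a standalone ARABIC SMALL HIGH JEEM (U+06DA)
// between them, surrounded by spaces as in the logged output.
val sample = "word1 \u06DA word2"
```

With this input, tokenCount(sample) returns 3 rather than 2, because the marker is not whitespace and therefore survives the isNotBlank() check.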
So, the question would be: how do you remove the upper diacritics/markers from a string before splitting it?
Thanks to this answer, which lists those types of markers, you can use Kotlin's String replace() method to replace them with nothing.
Here is a snippet to fix your example:
var str = getString(R.string.valueHere)
str = str
    .replace("\u0615", "") // ARABIC SMALL HIGH TAH
    .replace("\u0616", "") // ARABIC SMALL HIGH LIGATURE ALEF WITH LAM WITH YEH
    .replace("\u0617", "") // ARABIC SMALL HIGH ZAIN
    .replace("\u0618", "") // ARABIC SMALL FATHA
    .replace("\u0619", "") // ARABIC SMALL DAMMA
    .replace("\u061A", "") // ARABIC SMALL KASRA
    .replace("\u06D6", "") // ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA
    .replace("\u06D7", "") // ARABIC SMALL HIGH LIGATURE QAF WITH LAM WITH ALEF MAKSURA
    .replace("\u06D8", "") // ARABIC SMALL HIGH MEEM INITIAL FORM
    .replace("\u06D9", "") // ARABIC SMALL HIGH LAM ALEF
    .replace("\u06DA", "") // ARABIC SMALL HIGH JEEM
    .replace("\u06DB", "") // ARABIC SMALL HIGH THREE DOTS
    .replace("\u06DC", "") // ARABIC SMALL HIGH SEEN
    .replace("\u06DD", "") // ARABIC END OF AYAH
    .replace("\u06DE", "") // ARABIC START OF RUB EL HIZB
    .replace("\u06DF", "") // ARABIC SMALL HIGH ROUNDED ZERO
    .replace("\u06E0", "") // ARABIC SMALL HIGH UPRIGHT RECTANGULAR ZERO
    .replace("\u06E1", "") // ARABIC SMALL HIGH DOTLESS HEAD OF KHAH
    .replace("\u06E2", "") // ARABIC SMALL HIGH MEEM ISOLATED FORM
    .replace("\u06E3", "") // ARABIC SMALL LOW SEEN
    .replace("\u06E4", "") // ARABIC SMALL HIGH MADDA
    .replace("\u06E5", "") // ARABIC SMALL WAW
    .replace("\u06E6", "") // ARABIC SMALL YEH
    .replace("\u06E7", "") // ARABIC SMALL HIGH YEH
    .replace("\u06E8", "") // ARABIC SMALL HIGH NOON
    .replace("\u06E9", "") // ARABIC PLACE OF SAJDAH
    .replace("\u06EA", "") // ARABIC EMPTY CENTRE LOW STOP
    .replace("\u06EB", "") // ARABIC EMPTY CENTRE HIGH STOP
    .replace("\u06EC", "") // ARABIC ROUNDED HIGH STOP WITH FILLED CENTRE
    .replace("\u06ED", "") // ARABIC SMALL LOW MEEM
// Split once, then count only the non-blank items
// (removing a standalone marker leaves an empty item between two spaces)
val split = str.split(" ")
val count = split.count { it.isNotBlank() }
Log.d("count is ", "$count")
This is the test verification result on a Kotlin compiler
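Since the code points above form two contiguous runs (U+0615 to U+061A and U+06D6 to U+06ED), the same removal can probably be written more compactly with a single character class. This is only a sketch of an equivalent alternative; QURANIC_MARKERS, stripMarkers and countWords are hypothetical names, not part of any library:

```kotlin
// Same code points as the replace() chain: U+0615..U+061A and U+06D6..U+06ED.
val QURANIC_MARKERS = Regex("[\u0615-\u061A\u06D6-\u06ED]")

// Removes every marker in one pass.
fun stripMarkers(s: String): String = s.replace(QURANIC_MARKERS, "")

// Counts the non-blank items left after the markers are removed.
fun countWords(s: String): Int =
    stripMarkers(s).split(" ").count { it.isNotBlank() }
```

The trade-off is readability: the chain documents each character by name, while the character class is shorter but relies on the reader knowing what the two ranges cover.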
UPDATE:
I have a long string, and I need to color a range inside it with a different color in a TextView. So I split it on spaces, get the needed words by lower and upper word index, then join them into one string to color their range inside the long string. The above answer did give 49, but it removed the important characters passed to replace. Could you tweak your code to take this into account?
So, if you follow the approach above, you just need to drop the blank items from the split String; for this you can use filter {} after replacing all the markers with empty strings:
// Assumed imports: Log and Range are taken to be the Android classes
import android.util.Log
import android.util.Range

fun getColorRange(input: String, wordFrom: Int, wordTo: Int): Range<Int> {
    val text = input
        .replace("\u0615", "") // ARABIC SMALL HIGH TAH
        .replace("\u0616", "") // ARABIC SMALL HIGH LIGATURE ALEF WITH LAM WITH YEH
        .replace("\u0617", "") // ARABIC SMALL HIGH ZAIN
        .replace("\u0618", "") // ARABIC SMALL FATHA
        .replace("\u0619", "") // ARABIC SMALL DAMMA
        .replace("\u061A", "") // ARABIC SMALL KASRA
        .replace("\u06D6", "") // ARABIC SMALL HIGH LIGATURE SAD WITH LAM WITH ALEF MAKSURA
        .replace("\u06D7", "") // ARABIC SMALL HIGH LIGATURE QAF WITH LAM WITH ALEF MAKSURA
        .replace("\u06D8", "") // ARABIC SMALL HIGH MEEM INITIAL FORM
        .replace("\u06D9", "") // ARABIC SMALL HIGH LAM ALEF
        .replace("\u06DA", "") // ARABIC SMALL HIGH JEEM
        .replace("\u06DB", "") // ARABIC SMALL HIGH THREE DOTS
        .replace("\u06DC", "") // ARABIC SMALL HIGH SEEN
        .replace("\u06DD", "") // ARABIC END OF AYAH
        .replace("\u06DE", "") // ARABIC START OF RUB EL HIZB
        .replace("\u06DF", "") // ARABIC SMALL HIGH ROUNDED ZERO
        .replace("\u06E0", "") // ARABIC SMALL HIGH UPRIGHT RECTANGULAR ZERO
        .replace("\u06E1", "") // ARABIC SMALL HIGH DOTLESS HEAD OF KHAH
        .replace("\u06E2", "") // ARABIC SMALL HIGH MEEM ISOLATED FORM
        .replace("\u06E3", "") // ARABIC SMALL LOW SEEN
        .replace("\u06E4", "") // ARABIC SMALL HIGH MADDA
        .replace("\u06E5", "") // ARABIC SMALL WAW
        .replace("\u06E6", "") // ARABIC SMALL YEH
        .replace("\u06E7", "") // ARABIC SMALL HIGH YEH
        .replace("\u06E8", "") // ARABIC SMALL HIGH NOON
        .replace("\u06E9", "") // ARABIC PLACE OF SAJDAH
        .replace("\u06EA", "") // ARABIC EMPTY CENTRE LOW STOP
        .replace("\u06EB", "") // ARABIC EMPTY CENTRE HIGH STOP
        .replace("\u06EC", "") // ARABIC ROUNDED HIGH STOP WITH FILLED CENTRE
        .replace("\u06ED", "") // ARABIC SMALL LOW MEEM
    // Drop the blank items left behind where the markers were removed
    val all = text.split(" ").filter { it.isNotBlank() }
    val sub = all.subList(wordFrom, wordTo + 1).joinToString(" ")
    Log.d("LOG_TAG", "getColorRange: $sub")
    // Find where each word starts, searching from the end of the previous word
    var cursor = 0
    val starts = all.map { word ->
        val index = text.indexOf(word, cursor)
        cursor = index + word.length
        index
    }
    val start = starts[wordFrom]
    val end = starts[wordTo] + all[wordTo].length // offset just past the last word
    return Range<Int>(start, end)
}
Sample usage:
getColorRange(str, 18, 22)
// Output:
// حَتَّىٰ تَغْتَسِلُوا وَإِنْ كُنْتُمْ مَرْضَىٰ
getColorRange(str, 0, 48) // Should return the entire string as this is the total number of words
// Output:
// يَا أَيُّهَا الَّذِينَ آمَنُوا لَا تَقْرَبُوا الصَّلَاةَ وَأَنْتُمْ سُكَارَىٰ حَتَّىٰ تَعْلَمُوا مَا تَقُولُونَ وَلَا جُنُبًا إِلَّا عَابِرِي سَبِيلٍ حَتَّىٰ تَغْتَسِلُوا وَإِنْ كُنْتُمْ مَرْضَىٰ أَوْ عَلَىٰ سَفَرٍ أَوْ جَاءَ أَحَدٌ مِنْكُمْ مِنَ الْغَائِطِ أَوْ لَامَسْتُمُ النِّسَاءَ فَلَمْ تَجِدُوا مَاءً فَتَيَمَّمُوا صَعِيدًا طَيِّبًا فَامْسَحُوا بِوُجُوهِكُمْ وَأَيْدِيكُمْ إِنَّ اللَّهَ كَانَ عَفُوًّا غَفُورًا
Also notice that there is an issue with the range value: sub is joined with single spaces, while text still has two consecutive spaces wherever a marker was removed, so the below can return -1:
val range = text.indexOf(sub)
Instead, you need to get the index of each word on its own, searching from the end of the previous word, not from the beginning of the string:
val index = text.indexOf(word, cursor)
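If the markers have to stay in the displayed text, as the UPDATE asks, one possible variation is to leave the string untouched and only skip marker-only tokens while counting words. This is a sketch under assumptions, not the code from the answer above: MARKER_ONLY and wordRangeKeepingMarkers are hypothetical names, and it returns a Kotlin IntRange so that it runs outside Android:

```kotlin
// A token made up only of the marks listed above (U+0615..U+061A, U+06D6..U+06ED).
val MARKER_ONLY = Regex("[\u0615-\u061A\u06D6-\u06ED]+")

// Returns the inclusive character range, in the untouched input, that spans
// words wordFrom..wordTo (zero-based). Marker-only tokens are not counted as words.
fun wordRangeKeepingMarkers(input: String, wordFrom: Int, wordTo: Int): IntRange {
    val words = Regex("\\S+").findAll(input)      // every run of non-whitespace
        .filter { !MARKER_ONLY.matches(it.value) } // skip standalone markers
        .toList()
    return words[wordFrom].range.first..words[wordTo].range.last
}
```

Because the offsets refer to the original string, a marker sitting between the first and last word stays inside the range. The returned range is inclusive, so an API that expects an exclusive end would need range.last + 1.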