
EDIT: Thank you for so many responses, and apologies for the ambiguous question and the poorly constructed example. I didn't mean to split the string at uppercase characters. I literally mean to split the string after a specific character like 'X', 'r', or 'o'.

I'm trying to split a long string of repeating words like "FooXXBarBazXBar ..." into a list of words ['Fo', 'o', 'X', 'X', 'Bar', 'Baz', 'X', 'Bar' ...].

In C I would create a new array of characters and copy the characters one by one, inserting a delimiter after each specific character, so that I can later split on the delimiter, like this:

{
    size_t n = strlen(string);
    //worst case: a ';' after every character, plus the terminating '\0'
    char *newString = malloc(2 * n + 1);
    size_t offset = 0;

    //copy characters from old string to the new one, inserting ';' after a specific character
    for (size_t i = 0; i < n; i++)
    {
        newString[i+offset] = string[i];
        if ((string[i] == 'o') || (string[i] == 'r') || (string[i] == 'z') || (string[i] == 'X'))
        {
            offset += 1;
            newString[i+offset] = ';';
        }
    }
    newString[n+offset] = '\0'; //terminate the new string

    //split the string into an array of strings on the delimiter ';'
    split_string(newString);
}

I'm still a beginner, so it's probably also not that great, but it works.

I'm trying to do the same thing in Python, but cannot come up with a good approach.

How would you go about it? Thanks :)

Nishijama
  • I think you need to specify more precisely how the split should work. Why, for instance, is `Foo` split out and not `FooX` or `FooXX`? I think you want to split out repeating patterns of letters (starting with the longest patterns), so for `Foo` to be split out there should be another `Foo` pattern somewhere in the string, etc. – Bruno Vermeulen Jul 08 '21 at 07:18
  • Also, why do you use the keyword `Python` while your code snippet is in `C`? – Bruno Vermeulen Jul 08 '21 at 07:19
  • @BrunoVermeulen maybe a result word must end with `[aeiou]`? – Lei Yang Jul 08 '21 at 07:22
  • Hello, thank you for the answer. My snippet is in C because that is where I know how to implement it. I used the Python keyword because I'm looking for a solution to this problem in Python. I made a mistake in my example by showing input "Foo" and output ["Foo"]. What I'm looking for would give an output of ["Fo", "o"]. I want to define a specific character in my code and split the string after that character. – Nishijama Jul 08 '21 at 08:28

4 Answers

0

If I understand correctly, what you want to do is split the string at each uppercase letter ("HelloWorldPython" -> ["Hello", "World", "Python"]).

I would use regex:

import re
text = "FooXXBarBazXBar"
re.findall(r'[A-Z][^A-Z]*', text) # An uppercase letter followed by 0 or more non-uppercase characters
['Foo', 'X', 'X', 'Bar', 'Baz', 'X', 'Bar']
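
One caveat with the pattern above: any characters before the first uppercase letter are dropped, because every match has to start with `[A-Z]`. Here is a minimal sketch of a variant that keeps such a prefix, assuming that is the behaviour wanted (the helper name `split_on_uppercase` is made up for illustration):

```python
import re

def split_on_uppercase(text):
    # First alternative: an uppercase letter plus the non-uppercase run after it.
    # Second alternative: a leading run that contains no uppercase letter at all.
    return re.findall(r'[A-Z][^A-Z]*|[^A-Z]+', text)

print(split_on_uppercase("FooXXBarBazXBar"))  # ['Foo', 'X', 'X', 'Bar', 'Baz', 'X', 'Bar']
print(split_on_uppercase("fooBarBaz"))        # ['foo', 'Bar', 'Baz']
```

With the original pattern, the second call would return only ['Bar', 'Baz'].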
Ron Serruya
0

Your question is kind of ambiguous. It's not clear at which characters you want to split the string, but if these are ['o','r','X','z'], then the following code will do:

string = input()
word = ''
words_list = []
for char in string: # iterate through the string
    word += char # add the character to the word
    if char in ['o','r','X','z']: # check if the current character is one of the splitting characters
        words_list.append(word) # add the word formed so far to the list
        word = '' # reset the word variable to an empty string so as to start the next word
if word: # keep any trailing characters that come after the last splitting character
    words_list.append(word)
print(words_list)

For an input such as FooXXBarBazXBarxxX, this would output:

['Fo', 'o', 'X', 'X', 'Bar', 'Baz', 'X', 'Bar', 'xxX']
Warning: As you have specified 'o' as a splitting character, it will never give you 'Foo'. In order to get 'Foo' instead of 'Fo' in the list, you will have to add more code within the loop.
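
The same split can also be written as a single regular expression. This is only a sketch of an alternative to the loop above, not a replacement for it; the function name `split_after` and its parameters are hypothetical, chosen for illustration:

```python
import re

def split_after(text, chars):
    # Hypothetical helper: split `text` after every character found in `chars`.
    if not chars:
        # No splitting characters given: return the text as a single piece.
        return [text] if text else []
    # Escape each character so that ']', '^', '-' or '\' are safe inside [...].
    cls = "".join(re.escape(c) for c in chars)
    # Either a (possibly empty) run of other characters ending in a splitting
    # character, or a trailing run that contains no splitting character.
    pattern = rf"[^{cls}]*[{cls}]|[^{cls}]+"
    return re.findall(pattern, text)

print(split_after("FooXXBarBazXBar", "orXz"))
# ['Fo', 'o', 'X', 'X', 'Bar', 'Baz', 'X', 'Bar']
```

The second alternative in the pattern keeps any characters left over after the last splitting character, so nothing from the input is lost.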
Ajay Singh Rana
  • Awesome, thank you! This is exactly what I was looking for! I made a mistake in my example by choosing the word "Foo" and not noticing that it has two 'o's in it. – Nishijama Jul 08 '21 at 08:39
  • Welcome, @Nishijama! There's nothing wrong with mistakes; in fact, they are beautiful. – Ajay Singh Rana Jul 08 '21 at 17:02
0

This splits a string into parts that each start with an uppercase letter,
for example "ThisIsALineAsAnExample" into ['This','Is','A','Line','As','An','Example'].

If this is what you want to do, then this may help:

s = "ThisIsALineAsAnExample"
final = []
i = 0
while i < len(s):
    # start a new part at s[i] and extend it up to the next uppercase letter
    j = i + 1
    while j < len(s) and not s[j].isupper():
        j += 1
    final.append(s[i:j])
    i = j # continue from the start of the next part

print(final)
0

The example input and output suggest that you want to split at every word that starts with an uppercase letter.

For Python, a solution with this approach can be found here.

If you want to apply it to your solution in C too, you can check whether a char is uppercase with:

if(string[i] >= 'A' && string[i] <= 'Z') {
    offset += 1;
    newString[i+offset] = ';';
}
Darkhorrow