-1

I have an input string in which each element consists of a number and a character. I want to access the number and the character of each element separately, as in the following example:

1s-2r,3d*3 # this is the line in the input file; the star means the element is repeated three times

So I want to build an array that contains only the numbers:

number_only=[1,2,3,3,3] # numpy array
s=[s,r,d,d,d] # another array with the characters only

But I get the following error: "TypeError: can't multiply sequence by non-int of type 'str'". I know the multiplier should be an integer, but I do not know how to do that. My trial code is below:

import numpy as np

with open('dataa.dat', 'r') as f:
    input_data = f.readlines()
    # keep only the text before the '#' comment on each line
    input_data = [(d + ' ')[:d.find('#')].rstrip() for d in input_data]

x = input_data[0].split('-')
y = []

for elt in x:
    if "*" in elt:
        n, mult = elt.split("*")
        y = y + [n] * mult  # the TypeError is raised here: mult is a str
    else:
        y += [ii for ii in elt.split(',')]
number_only = np.array(y)
#s
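The error message can be reproduced in isolation: str.split returns strings, so the repeat count has to go through int() before it can multiply a list. A minimal sketch, where the variable names mirror the trial code and the literal "3d*3" is just an illustrative element:

```python
elt = "3d*3"
n, mult = elt.split("*")  # n == "3d", mult == "3" (both are str)

try:
    repeated = [n] * mult  # list * str is not defined
except TypeError as exc:
    error_message = str(exc)

repeated = [n] * int(mult)  # ['3d', '3d', '3d']
```

This only removes the TypeError; splitting each element into its number and its letter is a separate step.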
msci

3 Answers

1

This returns numbers from a string:

only_digits = ''.join(i for i in string if i.isdigit())
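The same idea extends to the letters with str.isalpha. A small sketch, assuming single-digit numbers and no `*` repeats; the sample string is illustrative and leaves out the `*3` part, because its digit would otherwise be counted as a number:

```python
string = "1s-2r,3d"

# one int per digit character, one entry per alphabetic character
numbers = [int(ch) for ch in string if ch.isdigit()]  # [1, 2, 3]
letters = [ch for ch in string if ch.isalpha()]       # ['s', 'r', 'd']
```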
  • Thank you for your help. However, how could I include the repeat, i.e. turn 1s-2r,3d*3 into 1s-2r,3d,3d,3d and then get [1,2,3,3,3] and [s,r,d,d,d]? Thank you again! – msci Dec 13 '21 at 17:10
  • No problem, let me know if it's working! – Sam Dec 13 '21 at 18:27
0

A simple way to do this is with regex:

import re

string = "1s-2r,3d*3"
numbers = re.findall(r"[0-9]", string)
letters = re.findall(r"[a-zA-Z]", string)

Then to convert numbers from str to int:

numbers = [int(i) for i in numbers]

Edit:

This should do it

def parse_string_to_numbers_letters(string):
    # split the line into elements on "," or "-"
    string_parts = re.split(r",|-", string)
    aggregated_string = ""
    for part in string_parts:
        if "*" in part:
            # "3d*3" -> "3d" repeated 3 times
            to_be_multiplied, multiplier = part.split("*")
            part = to_be_multiplied * int(multiplier)
        aggregated_string += part
    # single digits and single letters, each returned as a list of str
    numbers = re.findall(r"[0-9]", aggregated_string)
    letters = re.findall(r"[a-zA-Z]", aggregated_string)
    return numbers, letters
Sam
  • Thank you for your comment. However, if I do that, I will not be able to use the star to repeat values for both the numbers and the letters. – msci Dec 13 '21 at 17:06
  • Ah, I misunderstood your question. Does 1s-2r,3d*3 evaluate to "1s-2r,3d1s-2r,3d1s-2r,3d"? – Sam Dec 13 '21 at 17:08
  • Thank you so much again. It should be "1s-2r,3d,3d,3d", and the final answer should be an array of numbers only =[1 2 3 3 3] and string=[s r d d d]. – msci Dec 13 '21 at 17:20
  • No problem - is the format the same for every line? Try posting the first 10 lines so we can see what's going on and check that the format is relatively stable. – Sam Dec 13 '21 at 17:22
  • I am not sure I understand what you mean, but the letters or numbers might change, while the star always means repeat. – msci Dec 13 '21 at 17:23
  • Is there always a comma to separate the values that are to be repeated from those that aren't? e.g. 1s-2r,3d*3 1s-3s,5e*3 2e-4a,7f*3 etc. – Sam Dec 13 '21 at 17:24
  • I wish I could do something generic, but the format will stay the same; only the letters and numbers will change. – msci Dec 13 '21 at 17:24
  • Yes, the comma and dash always exist. – msci Dec 13 '21 at 17:24
  • Hi again, an example of the input would be 1s,2e*3-2q*2,1u, which should give [1 2 2 2 2 2 1] and the string array [s e e e q q u]. – msci Dec 13 '21 at 17:29
  • Ah sorry, I didn't realise there were multiple "*". – Sam Dec 13 '21 at 17:39
  • No worries. The * repeats a value based on the number after it, for example: 3r*2 means=3r 3r 3r so will be [3 3 3], [r r r]. Sorry if I did not explain what I want clearly, I hope we can find a way. Thanks, – msci Dec 13 '21 at 17:42
  • I think it's solved now (edited). – Sam Dec 13 '21 at 17:58
  • It seems to work perfectly, even if there is more than one number element. I will check more cases and accept your answer, thank you so much! – msci Dec 13 '21 at 18:04
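For the intermediate form asked about in the comments (1s-2r,3d*3 written out as 1s-2r,3d,3d,3d), a regex substitution is one option. This is only a sketch and not part of the answer above: expand_repeats is a hypothetical helper name, and it assumes every repeated element is followed by `*` and a non-negative integer.

```python
import re

def expand_repeats(line):
    # "<element>*<n>" becomes the element written n times, joined by commas
    def repl(match):
        element, count = match.group(1), int(match.group(2))
        return ",".join([element] * count)

    # an element is any run of characters that are not ",", "-" or "*"
    return re.sub(r"([^,\-*]+)\*(\d+)", repl, line)

expanded = expand_repeats("1s-2r,3d*3")  # '1s-2r,3d,3d,3d'
```

The separators that were already in the line are left untouched, so the expanded string can still be split on "," and "-" afterwards.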
0

You might try this:

import re

def split_string(pat):
    numbers = []
    letters = []

    for s in re.split(r"[,-]", pat):
        count = 1
        if len(s) > 2:
            assert s[2] == '*'
            count = int(s[3:])
        numbers += [int(s[0])] * (count)
        letters += [s[1]] * (count)

    return numbers, letters

def main():
    # The two examples from the question and the comments
    numbers, letters = split_string("1s-2r,3d*3")
    assert numbers == [1,2,3,3,3]
    assert letters == ['s','r','d','d','d']

    numbers, letters = split_string("1s,2e*3-2q*2,1u")
    assert numbers == [1,2,2,2,2,2,1]
    assert letters == ['s', 'e', 'e', 'e', 'q', 'q', 'u']


if __name__ == '__main__':
    main()
Andreas Florath
  • Could you please explain the meaning of the if __name__ == '__main__' part? – msci Dec 13 '21 at 17:51
  • It seems to work perfectly. However, if the number is 11 or 20, i.e. has two digits, it fails with the following error: "in split_string assert s[2] == '*' AssertionError". Is there a way to handle that? – msci Dec 13 '21 at 17:58
  • For the __main__: please check: https://stackoverflow.com/questions/419163/what-does-if-name-main-do – Andreas Florath Dec 13 '21 at 18:33
  • The question only mentioned one-digit numbers. Do you expect any (integer) number as the first element? – Andreas Florath Dec 13 '21 at 18:35
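If the numbers can have more than one digit (11, 20, ...), the fixed positions s[0], s[1] and s[2] no longer line up. One possible adaptation is sketched below, assuming each element is digits, then letters, then an optional `*count`. The names split_tokens and TOKEN are hypothetical and do not come from the answer above.

```python
import re

# digits, then letters, then an optional "*<count>"
TOKEN = re.compile(r"(\d+)([A-Za-z]+)(?:\*(\d+))?")

def split_tokens(pattern):
    numbers, letters = [], []
    for token in re.split(r"[,-]", pattern):
        match = TOKEN.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"unrecognised element: {token!r}")
        number, letter, count = match.groups()
        repeat = int(count) if count else 1  # no "*": element appears once
        numbers += [int(number)] * repeat
        letters += [letter] * repeat
    return numbers, letters

numbers, letters = split_tokens("11s-20r,3d*3")
# numbers == [11, 20, 3, 3, 3], letters == ['s', 'r', 'd', 'd', 'd']
```

Raising ValueError on an element that does not fit the assumed shape is a design choice here; it surfaces malformed input instead of silently skipping it.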