
I have the following string:

'0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,73-100,100-51,51,51,51-100,100-52,52,52,52,52,52,52,52,52-100,100-71,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0'

I want to feed a DataFrame with a column of strings like the one above to a 1D CNN for binary classification, so I need to convert them to NumPy arrays before training a model.

How can I convert these strings to NumPy arrays while preserving their features, given the "-" character between some numbers?

ali bakhtiari
  • Does this answer your question? [How do I parse a string to a float or int?](https://stackoverflow.com/questions/379906/how-do-i-parse-a-string-to-a-float-or-int) – Imanpal Singh Jun 15 '20 at 17:22
  • How should the `-` be interpreted? For instance, what does `100-71` mean? – Balaji Ambresh Jun 15 '20 at 17:23
  • Related: [Python numpy: Convert string in to numpy array](https://stackoverflow.com/questions/11747125/python-numpy-convert-string-in-to-numpy-array) – wwii Jun 15 '20 at 17:25
  • @Balaji Ambresh Each comma-separated number is a 15-minute interval in a person's day, from 00:00 to 23:59; the first number covers the interval 00:00 to 00:15, and so on. '0' means we don't know what this person is doing in that interval. A single number, for example 71, means the person is at a place coded by that number (71 is church). Two numbers such as 100-71 mean the person is at two different places during that interval. – ali bakhtiari Jun 15 '20 at 17:30
  • @skrrrt Unfortunately, no. – ali bakhtiari Jun 15 '20 at 17:32
  • @alibakhtiari What's the output if input is `0,0,71-100,0,0,21` ? – Balaji Ambresh Jun 15 '20 at 17:39
  • @Balaji Ambresh There is no such input: all inputs have exactly 96 comma-separated values. The output of my model will be a binary gender classification, 0 for female and 1 for male. I intend to train a model on these strings, each of which is labeled with a gender. – ali bakhtiari Jun 15 '20 at 17:44
  • @alibakhtiari I just picked a short input to understand the output. How should the entity `100-71` be represented in numpy array? – Balaji Ambresh Jun 15 '20 at 17:45
  • @Balaji Ambresh I suppose a unique code. – ali bakhtiari Jun 15 '20 at 17:46
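A minimal sketch of the encoding described in the comments above, assuming exactly what the asker states: 96 slots of 15 minutes starting at 00:00, 0 for unknown, one number for one place, and `a-b` for two places in the same slot. The helper names `parse_slots` and `slot_start` are hypothetical, and the example row is rebuilt compactly from the question's string using the 52 leading and 29 trailing zeros visible in the second answer's 96-value output.

```python
from __future__ import annotations

# Hypothetical helpers illustrating the encoding described in the comments:
# 96 comma-separated slots, one per 15-minute interval starting at 00:00;
# 0 = unknown, a single number = one place code, "a-b" = two places in that slot.
SLOT_MINUTES = 15
N_SLOTS = 24 * 60 // SLOT_MINUTES  # 96 slots per day


def parse_slots(row: str) -> list[tuple[int, ...]]:
    """Split a row into one tuple of place codes per 15-minute slot."""
    slots = [tuple(int(p) for p in token.split("-")) for token in row.split(",")]
    if len(slots) != N_SLOTS:
        raise ValueError(f"expected {N_SLOTS} slots, got {len(slots)}")
    return slots


def slot_start(index: int) -> str:
    """Return the HH:MM start time of a 0-based slot index."""
    minutes = index * SLOT_MINUTES
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


# The example row from the question, rebuilt compactly (52 leading, 29 trailing zeros).
row = "0," * 52 + "73-100,100-51,51,51,51-100,100-52," + "52," * 7 + "52-100,100-71" + ",0" * 29

for i, places in enumerate(parse_slots(row)):
    if places != (0,):  # skip "unknown" slots
        print(slot_start(i), places)
```

With this reading, the first two-place entry, 73-100, sits in slot 52, which starts at 13:00.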

2 Answers

import numpy as np

inp = "0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,73-100,100-51,51,51,51-100,100-52,52,52,52,52,52,52,52,52-100,100-71,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"
arr = np.array(inp.split(","))  # array of strings, e.g. '0', '73-100'

If you want them as numbers, pass dtype=np.uint8, but you first have to pre-process the entries that contain - the way you want (using replace(), etc.); otherwise the conversion fails on entries such as 73-100.

qedk
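One way to do the `-` pre-processing the first answer leaves open, sketched under my own assumption rather than the answerer's: instead of collapsing `a-b` into one number, give every slot two channels (first place, second place) and repeat the code when there is only one place. This keeps both place codes and yields a (96, 2) array, i.e. 96 time steps with 2 channels for a 1D CNN. The helper `to_two_channels` is hypothetical, and the row is rebuilt compactly from the question's example.

```python
import numpy as np


def to_two_channels(row: str) -> np.ndarray:
    """Hypothetical helper: map each comma-separated slot to (first, second) place codes.

    "52"     -> (52, 52)   single place repeated in both channels (an assumption)
    "100-71" -> (100, 71)  order kept as written in the string
    """
    pairs = []
    for token in row.split(","):
        parts = token.split("-")
        first = int(parts[0])
        second = int(parts[1]) if len(parts) > 1 else first
        pairs.append((first, second))
    # uint8 is enough as long as every place code stays within 0..255
    return np.array(pairs, dtype=np.uint8)


# The example row from the question, rebuilt compactly (52 leading, 29 trailing zeros).
inp = "0," * 52 + "73-100,100-51,51,51,51-100,100-52," + "52," * 7 + "52-100,100-71" + ",0" * 29
arr = to_two_channels(inp)
print(arr.shape)  # (96, 2)
print(arr[52:58])
```

Unlike sorting each pair, this keeps the written order, so 100-51 and 51-100 stay distinguishable; whether that order carries meaning in your data is something only you can decide.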

Is this acceptable?

I'm assigning negative codes to the two-place entries so they can't collide with any of your (non-negative) location codes. You get the idea:

locations = '0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,73-100,100-51,51,51,51-100,100-52,52,52,52,52,52,52,52,52-100,100-71,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0'

import numpy as np

code = 0              # last negative code handed out
mappings = {}         # "small-large" pair -> negative code
mapped_locations = []
for location in locations.split(','):
    if '-' in location:
        # two places in one slot: order them so 100-51 and 51-100 share a code
        parts = [int(part) for part in location.split('-')]
        small, large = min(parts), max(parts)
        key = f'{small}-{large}'
        if key not in mappings:
            code -= 1
            mappings[key] = code
        mapped_locations.append(mappings[key])
    else:
        # a single place code (or 0 for unknown) is kept as-is
        mapped_locations.append(int(location))
print(np.array(mapped_locations))
print()
print(mappings)

Output:

[ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  0  0  0  0 -1 -2 51 51 -2 -3 52 52 52 52 52 52 52 -3 -4  0  0  0  0  0
  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]

{'73-100': -1, '51-100': -2, '52-100': -3, '71-100': -4}
Balaji Ambresh
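If the second answer's loop is run on each row separately, `code` and `mappings` restart for every row, so the same pair can get different codes in different rows (e.g. 71-100 would become -1 in a row where it is the first pair). A sketch that shares one dict across all rows, assuming you first collect the column into a list of strings (for instance `df["col"].tolist()`, where `col` is a hypothetical column name). It stacks the result into a (samples, 96, 1) array, the (batch, steps, channels) layout that Keras' `Conv1D` uses by default. `encode_rows` is a hypothetical helper name.

```python
import numpy as np


def encode_rows(rows, mappings=None):
    """Encode many rows with one shared pair -> code dict (hypothetical helper).

    Same scheme as the answer above: plain codes stay as-is and each unordered
    pair "a-b" gets a negative code, but the dict is shared across rows so a
    given pair maps to the same code everywhere.
    """
    if mappings is None:
        mappings = {}
    encoded = []
    for row in rows:
        mapped = []
        for location in row.split(","):
            if "-" in location:
                small, large = sorted(int(part) for part in location.split("-"))
                key = f"{small}-{large}"
                if key not in mappings:
                    # next unused negative code: -1, -2, -3, ...
                    mappings[key] = -(len(mappings) + 1)
                mapped.append(mappings[key])
            else:
                mapped.append(int(location))
        encoded.append(mapped)
    # (n_samples, 96) -> (n_samples, 96, 1): 96 time steps, 1 channel
    return np.array(encoded, dtype=np.int16)[..., np.newaxis], mappings


rows = [
    # the question's example row, rebuilt compactly (52 leading, 29 trailing zeros)
    "0," * 52 + "73-100,100-51,51,51,51-100,100-52," + "52," * 7 + "52-100,100-71" + ",0" * 29,
    # a made-up second row that starts with the same pair written the other way round
    "71-100," + "0," * 94 + "51",
]
X, mappings = encode_rows(rows)
print(X.shape)   # (2, 96, 1)
print(mappings)  # {'73-100': -1, '51-100': -2, '52-100': -3, '71-100': -4}
```

Passing the returned `mappings` back in when encoding later data keeps the codes consistent between training and inference; pairs never seen before simply receive the next negative code.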