0

I'm trying to read data from a text file into Python. The file consists of lines like this:

SAMPLE_0001 2000    57  1   103 51  0   NA

For ease of data management, I'd like to save that line as a list:

[SAMPLE_0001,2000,57,1,103,51,0,NA]

I wrote the following function to do that:

def line_breaker(line):
    words=[]
    if line[0]==' ':
        in_word=False
    else:
        in_word=True
    word=[]    
    for i in range(len(line)):
        if in_word==True and line[i]!=' ':
            word.append(line[i])
        elif in_word==True and line[i]==' ':
            in_word=False
            words.append(word)
            word=[]
        elif in_word==False and line[i]!=' ':
            in_word=True
            word.append(line[i])
        if i==len(line)-1 and line[i]!=' ':
            word.append(line[i])
            words.append(word)
    return words

Unfortunately, this doesn't work as intended. When I apply it to the example above, I get the whole line as one long string. On closer inspection, this was because the condition line[i]==' ' failed to trigger on the blank spaces. I guess I should replace ' ' with something else.

When I ask Python to print the 11th position in the example, it displays nothing. That's totally unhelpful. I then asked it to print the type of the 11th position in the example; I got <class 'str'>.

So what should I use to detect spaces?

Jongware
J.D.
  • Why not just use `"SAMPLE_0001 2000 57 1 103 51 0 NA".split()` ? – Milan Velebit Feb 13 '20 at 08:49
  • Does this answer your question? [Split string on whitespace in Python](https://stackoverflow.com/questions/8113782/split-string-on-whitespace-in-python) – Amit Amola Feb 13 '20 at 08:53
  • Ask Python to print the character code (`ord`) instead - it might not be a space. "Whitespace" covers a lot of codes, so many that it needs a [function](https://docs.python.org/3/library/stdtypes.html?highlight=isspace#str.isspace) to check. – Jongware Feb 13 '20 at 08:56
  • Hi, thanks for the replies. Unfortunately, a simple use of .split(' ') isn't working. Looks like Python is failing to recognize the whitespaces as whitespaces at all. Going to look into the links. – J.D. Feb 13 '20 at 10:21
  • @usr2564301 How do I ask Python to print the character code (ord)? – J.D. Feb 13 '20 at 10:50
  • It's just a regular function. See [`ord` in the Official Documentation](https://docs.python.org/3/library/functions.html#ord) – Jongware Feb 13 '20 at 11:32
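
To make the `ord` suggestion concrete, here is a minimal sketch. The sample line is an assumption: it supposes the fields are separated by tab characters, which the question does not confirm. Run the same checks on a line from the real file to see what the separator actually is.

```python
# Assumption: fields are separated by tabs; the real file may use something else.
line = "SAMPLE_0001\t2000\t57"

ch = line[11]        # the character right after "SAMPLE_0001"
print(repr(ch))      # repr() makes invisible characters visible -> '\t'
print(ord(ch))       # numeric code point of the character -> 9
print(ch == ' ')     # is it a plain space? -> False
print(ch.isspace())  # is it whitespace of any kind? -> True
```

A plain space would give `' '` and `32` instead; any other result explains why the comparison with `' '` never triggers.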

6 Answers

1

You can use split, as usual – you'll just have to remember to not explicitly split on spaces alone, as in:

myNaiveSplit = text.split(' ')

because that will absolutely fail if, as in your case, there may be some other whitespace character between the words.

Instead, don't provide any argument at all. After all, the official documentation on split tells us so:

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator ...
(my emphasis)

and the 'whitespace' mentioned is everything which is considered "whitespace" by the function isspace (which is fully Unicode-compliant).

So all you need is

mySmartSplit = text.split()
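
To see the difference between the two calls side by side, here is a small sketch. The sample string is made up for illustration: it mixes tabs and a double space, which is only an assumption about what the real file might contain.

```python
# Assumption: a line that mixes tabs and repeated spaces between fields.
text = "SAMPLE_0001\t2000  57\t1\n"

naive = text.split(' ')   # splits on single space characters only
smart = text.split()      # splits on any run of whitespace

print(naive)  # ['SAMPLE_0001\t2000', '', '57\t1\n']
print(smart)  # ['SAMPLE_0001', '2000', '57', '1']
```

The argument-free call also discards the trailing newline, so no separate `strip()` is needed.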
Jongware
0

If you want to turn a string separated by whitespace into an array, the best way (as some have mentioned above) is the built-in split() function. But if you don't want to use that, you could use isspace() and do it manually in a custom function like this:

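def line_breaker():
    my_array = []
    string = input("Write your string:\n")
    last_whitespace = 0  # index just past the most recent whitespace character

    for index, element in enumerate(string):
        if element.isspace():
            # Skip the empty pieces produced by consecutive whitespace.
            if index > last_whitespace:
                my_array.append(string[last_whitespace:index])
            last_whitespace = index + 1

    # Append the final word, which has no whitespace after it.
    if last_whitespace < len(string):
        my_array.append(string[last_whitespace:])

    print(my_array)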
  • Couldn't get .split() to work; but replacing `==' '` with `.isspace()==True` worked! Thanks – J.D. Feb 13 '20 at 10:57
  • Great to hear it's working! But you don't need `isspace()==True`: `isspace()` already returns a boolean, so the comparison only asks whether `True == True` or `False == True`. TL;DR: use `.isspace()` instead of `.isspace() == True`. – Olav Ausland Feb 13 '20 at 21:07
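
Following the comment about dropping `== True`, here is one way the character-by-character approach could look when it takes the line as an argument, as the function in the question does. The name `break_line` and the sample string are hypothetical, and this is a sketch rather than a replacement for `split()`.

```python
def break_line(line):
    """Split a line into words on any whitespace, one character at a time."""
    words = []
    current = []              # characters of the word being built
    for ch in line:
        if ch.isspace():      # isspace() already returns a bool
            if current:       # ignore runs of consecutive whitespace
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:               # flush the last word
        words.append("".join(current))
    return words

# Assumption: tab-separated sample with a double space and a trailing newline.
print(break_line("SAMPLE_0001\t2000  57\tNA\n"))
# ['SAMPLE_0001', '2000', '57', 'NA']
```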
-1

Why don't you use split?

line = "SAMPLE_0001 2000 57 1 103 51 0 NA"
print(line.split(' '))
['SAMPLE_0001', '2000', '57', '1', '103', '51', '0', 'NA']
Joan Lara
-1

Solution:

line = "SAMPLE_0001 2000 57 1 103 51 0 NA"
line = line.split(" ")

You'll get what you want.

YamiOmar88
-1

Do it like this

delimiter = ' '
with open(file) as f:  # file: path to your text file
    for line in f:
        split_line = line.split(delimiter)
        # do your thing with the list of words
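
Since the question reports that comparing against `' '` never matched, a single-space delimiter may leave each line unsplit. The sketch below is a self-contained variant that calls `split()` with no argument instead. The two sample rows and the temporary file are invented for illustration, on the assumption that the real file is tab-separated.

```python
import os
import tempfile

# Hypothetical sample data; tab separation is an assumption about the real file.
sample = (
    "SAMPLE_0001\t2000\t57\t1\t103\t51\t0\tNA\n"
    "SAMPLE_0002\t1999\t60\t0\t98\t47\t1\tNA\n"
)

# Write the sample to a temporary file so the example can run on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

rows = []
with open(path) as f:
    for line in f:
        fields = line.split()   # no argument: any run of whitespace separates fields
        if fields:              # skip blank lines
            rows.append(fields)

os.remove(path)                 # clean up the temporary file
print(rows[0])
# ['SAMPLE_0001', '2000', '57', '1', '103', '51', '0', 'NA']
```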
ganesh
-1

split() works perfectly.

strr= "SAMPLE_0001 2000 57 1 103 51 0 NA"
print(strr.split(' '))

split() turns a string into a Python list, splitting on whatever separator you pass. For example, strr.split(',') will split on commas.