Extracting data from a text file, whitespace, splitting letter and number and colon

Question

I'm trying to create a program that reads a txt file that has over 1000 lines of the format below and stores the data in two separate two dimensional arrays:

b14 b15 b12 y4:y11 r7 y1 b2
r15 y13 y12 b14:g9 r2 b8 b7

The file stores the results of a game where there are two players and they both choose four tokens out of a bag. An example token as can be seen above is 'b15' which means it is the colour blue and it has the number 15 on it. The colon signifies that the tokens thereafter are for the 2nd player.

Each line is a game. I need to store the colour and number of each token in two-dimensional arrays with 4 rows and 2 columns, one array for each player, e.g.

player1[0][0] = 'b'
player1[0][1] = 14
player1[1][0] = 'b'
player1[1][1] = 15

This stores the first two tokens for player 1. Once I've stored the rest of this player's tokens, and the 2nd player's tokens in a separate two-dimensional array, for a single game (a single line in the text file), I'll process the data and then overwrite the arrays for the next line (game) in the text file.

My main question is how do I do the following:

Split the letter and number so I can store them in the separate array positions
Recognise a white space meaning a new token
Recognise that the colon means that player's tokens have all been chosen and it's player 2 next.

Thanks for reading and I'm happy to answer any questions to clarify further.

[`str.split([sep[, maxsplit]])`](https://docs.python.org/2/library/stdtypes.html#str.split) — Phu Ngo, Oct 22 '16 at 13:36
Start learning and practising [regular expressions](https://docs.python.org/3/library/re.html#module-re) - it might take a while but for pattern matching in strings it is useful. You might be tempted to use it all the time [but it is not for everything](http://stackoverflow.com/a/1732454/2823755). If you search for online regex testers , there are a few good ones that use the same flavor as Python. — wwii, Oct 22 '16 at 15:34
Then again, ```str.split()``` will split on whitespace by default and on a colon if specified - and if the color of a token is **always** one character, you can extract with a slice ```color, number = token[:1], token[1:]``` — wwii, Oct 22 '16 at 15:42

score 1 · Accepted Answer · edited May 23 '17 at 10:27


Once you have read your moves in from the text file, you can use the split function and slicing (see "Explain Python's slice notation") to process them.

>>> mystring = 'b14 b15 b12 y4:y11 r7 y1 b2'

Split at the colon to get the player 1 / player 2 moves:

>>> player1, player2 = mystring.split(':')

For each player, split at the spaces to get the moves:

>>> player1_moves = player1.split(' ')
>>> player1_moves
['b14', 'b15', 'b12', 'y4']

If you know that the first part of the move will always be exactly one letter, you can 'slice' off the first part of the string:

>>> player1_moves[0][:1]
'b'
>>> player1_moves[0][1:]
'14'
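
Putting the steps above together, a minimal sketch might look like the following. The function name `parse_game` is made up for illustration, and the code assumes the colour is always a single letter and that each line contains exactly one colon. Converting the number with `int()` is an addition, on the assumption that it will be used in calculations later.

```python
def parse_game(line):
    """Return (player1, player2), each a list of [colour, number] pairs."""
    players = []
    # strip() removes the trailing newline; split(':') separates the two players
    for half in line.strip().split(':'):
        # split() with no argument splits on any run of whitespace
        tokens = half.split()
        # token[:1] is the colour letter, token[1:] holds the digits
        players.append([[token[:1], int(token[1:])] for token in tokens])
    player1, player2 = players
    return player1, player2


player1, player2 = parse_game('b14 b15 b12 y4:y11 r7 y1 b2\n')
print(player1)  # [['b', 14], ['b', 15], ['b', 12], ['y', 4]]
print(player2)  # [['y', 11], ['r', 7], ['y', 1], ['b', 2]]
```

With this layout, `player1[0][0]` is `'b'` and `player1[0][1]` is `14`, matching the arrays described in the question.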

answered Oct 22 '16 at 14:01 by user3468054 · edited May 23 '17 at 10:27 by Community

For the 2nd player, it's adding the \n onto the end of the resulting array. Is there any way to avoid that \n being included, or do I just need to remove the last two characters in the last array position for the 2nd player each time? – bobbyjoe Oct 23 '16 at 21:52
Here's what it looks like: – bobbyjoe Oct 23 '16 at 21:52
for the 2nd player: ['b6', 'y4', 'g13', 'b10\n'] – bobbyjoe Oct 23 '16 at 21:53
@bobbyjoe Simplest thing is probably to change the line to: `player1, player2 = mystring.strip().split(':')`. strip (see https://docs.python.org/2/library/string.html#string.strip) will remove any whitespace (for example the '\n' newline character) from the beginning/end of the string. – user3468054 Oct 24 '16 at 08:56
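
As a rough illustration of that suggestion, the sketch below reads lines the way a file would supply them and strips each one before splitting. `io.StringIO` stands in for the real text file here (an assumption, so that the example runs on its own); with a real file you would iterate over the object returned by `open(...)` instead. The names `fake_file` and `games` are likewise only for illustration.

```python
import io

# Stand-in for the real text file; each line ends with '\n' as it would on disk
fake_file = io.StringIO('b14 b15 b12 y4:y11 r7 y1 b2\nr15 y13 y12 b14:g9 r2 b8 b7\n')

games = []
for raw_line in fake_file:
    line = raw_line.strip()   # drops the trailing '\n' and any surrounding spaces
    if not line:              # skip blank lines
        continue
    player1, player2 = line.split(':')
    games.append((player1.split(), player2.split()))

print(games[0][1])  # ['y11', 'r7', 'y1', 'b2']
```

Because the newline is removed before the split, the last token for the 2nd player no longer carries a trailing `\n`.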

score 1 · Answer 2 · answered Oct 22 '16 at 14:40

You can use a regular expression to split the letter and number, and str.split() to separate the two players' results.

import re

# yourfilename is a placeholder for the path to your text file
with open(yourfilename) as f:
    for line in f:
        line = line.strip()
        if line != '':  # skip blank lines
            # results[0] is the first player's tokens, results[1] the second player's
            results = line.split(':')
            result1 = results[0].split(' ')
            player1 = []
            for i in range(4):
                # findall returns [(letter, digits)] for a single token
                grade = re.findall(r'([a-z]+)([0-9]+)', result1[i])
                player1.append([grade[0][0], grade[0][1]])  # split the letter and number
            # player2 is handled the same way, using results[1]

Thanks for the reply. What format will it store in the 'results' array since I notice the 'if' statement above checks if the line is not a white space? How would I add the 2nd player too? There would be a white space in between each token for each player. I need all 4 tokens in the two dimensional array for me to be able to do further calculations on them. — bobbyjoe, Oct 22 '16 at 15:42
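
To make the shape of the data concrete, here is a small sketch that handles both players. `results` is a list of two strings, one per player, and `re.findall` with two groups returns one `(letter, digits)` tuple per token, so the spaces between tokens need no separate handling. The helper name `parse_tokens` is hypothetical, and the `int()` conversion is added on the assumption that the numbers are needed for calculations.

```python
import re

TOKEN = re.compile(r'([a-z]+)([0-9]+)')


def parse_tokens(half):
    """Turn 'b14 b15 b12 y4' into [['b', 14], ['b', 15], ['b', 12], ['y', 4]]."""
    return [[letter, int(digits)] for letter, digits in TOKEN.findall(half)]


line = 'r15 y13 y12 b14:g9 r2 b8 b7\n'.strip()
results = line.split(':')            # ['r15 y13 y12 b14', 'g9 r2 b8 b7']
player1 = parse_tokens(results[0])
player2 = parse_tokens(results[1])

print(player1[0][0], player1[0][1])  # r 15
print(player2)                       # [['g', 9], ['r', 2], ['b', 8], ['b', 7]]
```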

score 1 · Answer 3 · answered Oct 22 '16 at 14:56

Use the following approach (the crucial functions are re.match and str.split):

import re

# line represents one line from the text file
# (named line rather than str, to avoid shadowing the built-in str type)
line = 'b14 b15 b12 y4:y11 r7 y1 b2'
player1, player2 = [[list(re.match(r'^([a-z])(\d+)$', i).groups()) for i in player.split(' ')]
                    for player in line.split(':')
                    ]

print(player1, player2, sep='\n')

The output:

[['b', '14'], ['b', '15'], ['b', '12'], ['y', '4']]
[['y', '11'], ['r', '7'], ['y', '1'], ['b', '2']]
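
If the file might contain malformed lines, a stricter variant can be sketched with `re.fullmatch`, which only succeeds when the whole token matches the pattern. The function name `parse_line_strict`, the error messages, and the check for exactly four tokens per player are assumptions based on the format described in the question.

```python
import re

TOKEN = re.compile(r'([a-z])(\d+)')


def parse_line_strict(line):
    """Parse one game line, raising ValueError on anything unexpected."""
    halves = line.strip().split(':')
    if len(halves) != 2:
        raise ValueError('expected exactly one colon: %r' % line)
    players = []
    for half in halves:
        tokens = half.split()
        if len(tokens) != 4:
            raise ValueError('expected four tokens per player: %r' % half)
        parsed = []
        for token in tokens:
            match = TOKEN.fullmatch(token)
            if match is None:
                raise ValueError('bad token: %r' % token)
            colour, number = match.groups()
            parsed.append([colour, int(number)])
        players.append(parsed)
    return players[0], players[1]


player1, player2 = parse_line_strict('b14 b15 b12 y4:y11 r7 y1 b2')
print(player1)  # [['b', 14], ['b', 15], ['b', 12], ['y', 4]]
```

Failing loudly on a bad line is a design choice here; with over 1000 lines it may be preferable to catch the `ValueError` and log the offending line instead.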
