
I have a file full of numbers in the form:

010101228522 0 31010 3 3 7 7 43 0 2 4 4 2 2 3 3 20.00 89165.30

01010222852313 3 0 0 7 31027 63 5 2 0 0 3 2 4 12 40.10 94170.20

0101032285242337232323 7 710153 9 22 9 9 9 3 3 4 80.52 88164.20

0101042285252313302330302323197 9 5 15 9 15 15 9 9 110.63 98168.80

01010522852617 7 7 3 7 31330 87 6 3 3 2 3 2 5 15 50.21110170.50

...

...

I am trying to read this file but I am not sure how to go about it. I have tried the built-in open function and loadtxt from numpy, and I even tried reading it with pandas, but the file is always read as a single column, that is, its shape is (364 x 1). I want the numbers separated into columns and the blank spaces replaced by zeros. Any help would be appreciated. NOTE: in some places there are two spaces following each other.

  • Possible duplicate of [Read Space-separated Data with Pandas](https://stackoverflow.com/questions/22809061/read-space-separated-data-with-pandas) – jberrio Jan 15 '19 at 00:23

2 Answers


If the column's content is a string, have you tried using str.split()? It turns the string into a list, with the line split at each gap. You could then loop over the items in that list to build a table out of them. I'm not quite sure this answers the question, sorry if not.

str.split():
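For illustration, here is a rough sketch of what str.split() returns for two of the sample lines from the question. Because some values run together with no space between them, the number of pieces is not the same on every line, so splitting on whitespace alone is unlikely to give aligned columns for this file.

```python
# Two of the sample lines from the question, copied as they appear above.
lines = [
    "010101228522 0 31010 3 3 7 7 43 0 2 4 4 2 2 3 3 20.00 89165.30",
    "0101032285242337232323 7 710153 9 22 9 9 9 3 3 4 80.52 88164.20",
]

# str.split() with no argument splits on any run of whitespace.
token_counts = [len(line.split()) for line in lines]
print(token_counts)  # the counts differ from line to line
```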

Ollie Pugh
  • Thank you for your quick response. I tried that just now; my code goes a bit like this: arr = []; with open('Kp2001', 'r') as f: for line in f: Split = line.split(); arr.append(Split). I then convert arr to a DataFrame, but I now get 1 row and many columns. I think the lists from split are being lined up in a single line as they get appended, which is why I am getting 1 row. I am not sure how to correct this. – Moment Mahlangu Jan 15 '19 at 00:23

So I finally solved my problem. I had to strip each line and then read every "letter" from it; in my case the letters are the individual characters (digits and spaces) of the stripped line, which I append to a list. Here is the code for my solution:

import pandas as pd

arr = []
with open('Kp2001', 'r') as f:
    for line in f:
        cnt = line.strip()      # Strip the newline and any surrounding whitespace
        arr.append(list(cnt))   # One row per line, one character per column, so Python does not read the line as one string

df = pd.DataFrame(arr)      # Converting to a DataFrame gives proper columns and keeps the spaces in their respective columns
df2 = df.replace(' ', 0)    # Replace the spaces with whatever you like (here 0)
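If the goal is whole fields rather than single characters, the same idea can be extended by slicing each line at fixed positions. The following is only a sketch: the helper name split_fixed_width, the widths list, and the two sample lines are hypothetical and do not describe the real layout of the Kp2001 file, so the widths would have to be taken from the file's format description.

```python
def split_fixed_width(line, widths, fill="0"):
    """Cut `line` into consecutive fields of the given widths.

    Fields that contain only spaces are replaced by `fill`.
    """
    fields = []
    start = 0
    for width in widths:
        field = line[start:start + width].strip()
        fields.append(field if field else fill)
        start += width
    return fields


# Hypothetical layout: a 6-character date followed by three 2-character values.
widths = [6, 2, 2, 2]
sample_lines = [
    "010101 3 710",
    "010102    12",
]

rows = [split_fixed_width(line.rstrip("\n"), widths) for line in sample_lines]
print(rows)
```

The resulting list of rows can be passed to pd.DataFrame in the same way as arr above. pandas also has read_fwf, which accepts a widths argument for fixed-width files and may be worth trying once the real column widths are known.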