
I am new to Python and I want to read my data from a .txt file. Apart from the header, it contains only floats. It has 6 columns and a very large number of rows. To read it, I'm using genfromtxt. Reading the first two columns works, but when I try to read the 5th column I get the following error:

Line #1357451 (got 4 columns instead of 4)

Here's my code:

import numpy as np
data = np.genfromtxt(dateiname, skip_header=1, usecols=(0, 1, 2, 5))
print(data[0:2, 0:3])

I think some values are missing in the 5th column, which is why it doesn't work. Does anyone have an idea how to fix this and read the data in the 5th column?

Dark
  • what is on the line giving the error? are there more than 4 columns there? – Alex Garcia Jun 21 '18 at 13:31
  • There are 6 columns in total – Dark Jun 21 '18 at 13:39
  • on line 1357451 particularly? – Alex Garcia Jun 21 '18 at 13:39
  • I don't know how to check it. Nothing I tried worked. Any idea? – Dark Jun 21 '18 at 13:43
  • well read that line of the file and tell us what is there :) – Alex Garcia Jun 21 '18 at 13:45
  • With a whitespace delimiter `genfromtxt` can't identify that missing data. Pandas might be better at that. – hpaulj Jun 21 '18 at 13:49
  • I found out that one column is completely empty, but I can read this column when I tell it there is one column fewer. I think this completely empty column is not recognized and gets omitted. But when I want to check all the data, there's still the same error. I think some entries are still empty. – Dark Jun 21 '18 at 13:49
  • Okay, now I found the columns with Notepad++. There really is one more value missing. Can pandas solve my problem? – Dark Jun 21 '18 at 14:12
  • possible duplicate of https://stackoverflow.com/questions/3761103/using-genfromtxt-to-import-csv-data-with-missing-values-in-numpy – anishtain4 Jun 21 '18 at 14:28
  • And yes, pandas will solve your problem – anishtain4 Jun 21 '18 at 14:29
  • Thank you @anishtain4. The link was really helpful. I found another solution. With `filling_values=0` I could fill the empty values with zero. Now it is working! :) – Dark Jun 22 '18 at 06:43

2 Answers


From the genfromtxt docs:

Notes
-----
* When spaces are used as delimiters, or when no delimiter has been given
  as input, there should not be any missing data between two fields.

If all columns, including missing ones, line up properly you could use a fixed column width version of the delimiter.

 An integer or sequence of integers
    can also be provided as width(s) of each field.
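As a rough sketch of the fixed-width idea, assuming every field is exactly 5 characters wide (the sample rows and the width are made up for illustration, not taken from the asker's file), an integer `delimiter` lets `genfromtxt` keep a blank field in its own column:

```python
import io

import numpy as np

# Hypothetical sample: three fields per row, each right-aligned to 5 characters.
# The middle value of the second row is missing and is written as 5 blanks.
rows = [("1.0", "2.0", "3.0"), ("4.0", "", "6.0")]
text = "\n".join("".join(f"{value:>5}" for value in row) for row in rows)

# An integer delimiter is interpreted as a fixed field width,
# so the blank field still occupies its own column.
data = np.genfromtxt(io.StringIO(text), delimiter=5)

print(data)
```

The blank field should come back as `nan` rather than shifting the later values to the left. This only works if the file really is aligned to fixed widths.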

When a line looks like:

 one, 2, 3.4, , 5, ,

it can unambiguously identify 7 columns. If instead it is

 one 2 3.4  5    

it can only identify 4 columns (in general two blanks count as one, etc, and trailing blanks are ignored)
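To see the difference in column counts, here is a minimal sketch that uses plain `str.split` as a stand-in for how the two kinds of line get tokenized (an illustration only, not `genfromtxt`'s actual code path; the variable names are made up):

```python
# The two example lines from above.
comma_line = "one, 2, 3.4, , 5, ,"
space_line = "one 2 3.4  5    "

# Splitting on an explicit delimiter keeps empty fields,
# so every position is accounted for.
comma_fields = comma_line.split(",")

# Splitting on whitespace collapses runs of blanks and drops trailing
# ones, so the missing fields vanish and cannot be located anymore.
space_fields = space_line.split()

print(len(comma_fields), comma_fields)
print(len(space_fields), space_fields)
```

With the comma version the empty fields survive as empty (or blank) strings; with the whitespace version there is no way to tell which columns were missing.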

hpaulj

I found another solution. With filling_values=0 I could fill the empty values with zero. Now it is working! :)

import numpy as np
data = np.genfromtxt(dateiname, skip_header=1, usecols=(0, 1, 2, 5),
                     delimiter='\t', invalid_raise=False, filling_values=0)

Furthermore, I no longer left the delimiter at its default but set it to a tab, and with invalid_raise=False the lines with a wrong number of columns are skipped (with a warning) instead of raising an error.
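Here is a small self-contained sketch of the same call on made-up data, assuming a tab-separated file with one header line and six columns (the sample text is hypothetical and read from a `StringIO` instead of a real file):

```python
import io

import numpy as np

# Hypothetical tab-separated sample: one header line, six columns,
# and an empty field in the third column of the second data row.
text = (
    "a\tb\tc\td\te\tf\n"
    "1.0\t2.0\t3.0\t4.0\t5.0\t6.0\n"
    "7.0\t8.0\t\t10.0\t11.0\t12.0\n"
)

# With an explicit tab delimiter the empty field is still counted as a column,
# and filling_values=0 replaces it with 0 instead of the default nan.
data = np.genfromtxt(io.StringIO(text), skip_header=1, usecols=(0, 1, 2, 5),
                     delimiter="\t", invalid_raise=False, filling_values=0)

# Same call without filling_values, for comparison: the gap becomes nan.
data_nan = np.genfromtxt(io.StringIO(text), skip_header=1, usecols=(0, 1, 2, 5),
                         delimiter="\t")

print(data)
print(data_nan)
```

Whether zero is a sensible fill value depends on the data; leaving the gaps as `nan` keeps them distinguishable from real zeros.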

Dark