Numpy ValueError (got 1 columns instead of 2)

Question

Before anyone flags this as a duplicate: this is not the same question as this one.

In that question, the error was

ValueError: Some errors were detected !
Line #88 (got 1435 columns instead of 1434)

that is, one more column than expected (likely an extra delimiter somewhere).

I am processing a file with two columns separated by a tab ('\t') and am loading it with:

movies = np.genfromtxt('imdb/movie_keywords', delimiter = '\t', dtype = None)

I receive the following error:

ValueError: Some errors were detected !
Line #44209 (got 1 columns instead of 2)
Line #44210 (got 1 columns instead of 2)
Line #44211 (got 1 columns instead of 2)
Line #93460 (got 1 columns instead of 2)
...

Here are four lines (raw text) from the file.

The first two are line #1 and line #, which do not throw errors:

'$ (1971)\tbank-heist'
'Angela (1954)\tamerican-car-salesman'

These are from lines #44209 and #93463, which do throw errors:

'Animated (1989)\taustralian'
'Animated Motion #1 (1976)\tindependent-film'

Can some sleuth point out the difference that keeps numpy from picking up the tab in the error-throwing lines?

Also, I get no error when using pandas with this code:

keywords = pd.read_csv('imdb/movie_keywords', delimiter = '\t', dtype = None, names = ['movie', 'keyword'])

Pandas, however, is not sufficient for the operations I want to perform.

You might encounter this error if `Animated (1989)\taustralian` contains a literal backslash followed by a literal `t` instead of a tab character. — unutbu, Aug 03 '15 at 21:37
@unutbu the text from the file: "Animated (1989) australian" — PandaBearSoup, Aug 03 '15 at 21:39
`genfromtxt` reports line numbers with the count starting at 1. Python uses 0-based indexing. Depending on how you located the 44209th line, there might be an "off-by-one" error. It might not hurt to check the line preceding `'Animated (1989)\taustralian'` too. — unutbu, Aug 03 '15 at 22:22
@unutbu Good thinking, I had considered this. This is why I chose line #93463, as lines #93460-#93465 all return errors. — PandaBearSoup, Aug 03 '15 at 22:31
@unutbu repr is what was used to produce the raw strings in the original question. — PandaBearSoup, Aug 03 '15 at 22:52
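Following the suggestions in these comments (line numbers counted from 1, checking the neighbouring lines, and telling a real tab apart from a literal backslash followed by `t`), a small sketch for inspecting the reported lines might look like this. The helper names `inspect_lines` and `report_file` are hypothetical, and the `latin-1` encoding is an assumption about the file (it decodes any byte, so reading never fails). It also flags `#`, because `np.genfromtxt` treats `#` as the start of a comment by default.

```python
def inspect_lines(lines, wanted):
    """Describe selected lines, numbered from 1 like genfromtxt's error messages.

    Returns (line_number, repr_text, has_tab, has_literal_backslash_t, has_hash)
    for every line whose number is in `wanted`.
    """
    report = []
    for number, line in enumerate(lines, start=1):
        if number in wanted:
            text = line.rstrip("\r\n")
            report.append(
                (
                    number,
                    repr(text),      # same view as the raw strings in the question
                    "\t" in text,    # a real tab character
                    "\\t" in text,   # a literal backslash followed by "t"
                    "#" in text,     # genfromtxt's default comment marker
                )
            )
    return report


def report_file(path, wanted, encoding="latin-1"):
    # Assumed encoding: latin-1 maps every byte to a character, so this never raises.
    with open(path, encoding=encoding) as handle:
        for row in inspect_lines(handle, wanted):
            print(row)
```

For example, `report_file("imdb/movie_keywords", {44208, 44209, 44210, 93459, 93460, 93463})` would print one tuple per requested line, including the line just before each reported one.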

score 0 · Answer 1 · answered Aug 03 '15 at 21:50


The aim of this question was to find the issue with Numpy; as stated in the question, using Pandas results in no error. If someone is, however, looking for a workaround, this seems to work:

keywords = pd.read_csv('imdb/movie_keywords', delimiter = '\t', dtype = None, names = ['movie', 'keyword'])

keywords_array = keywords.to_numpy()  # as_matrix() was deprecated and then removed in pandas 1.0
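A NumPy-only alternative, offered as a hedged sketch: `np.genfromtxt` treats `#` as the start of a comment by default (`comments='#'`), and `'Animated Motion #1 (1976)\tindependent-film'` contains one, so everything from `#` onward, tab included, would be discarded and only one column left. The snippet reproduces this on two sample lines from the question and then passes `comments=None` to turn comment handling off. Whether every failing line in the real file contains a `#` is an assumption worth checking (the `'Animated (1989)\taustralian'` sample does not, so the off-by-one point raised in the comments may apply there). `SAMPLE` and `load` are hypothetical names.

```python
import io

import numpy as np

# Two lines taken from the question (the "\t" here is a real tab character).
SAMPLE = (
    "$ (1971)\tbank-heist\n"
    "Animated Motion #1 (1976)\tindependent-film\n"
)


def load(text, **kwargs):
    """Parse tab-separated text with np.genfromtxt, passing extra options through."""
    return np.genfromtxt(
        io.StringIO(text),
        delimiter="\t",
        dtype=None,
        encoding=None,  # return str fields rather than bytes on NumPy 1.x
        **kwargs,
    )


# With the default comments="#", line 2 is cut off at "#" before the tab,
# so genfromtxt reports "got 1 columns instead of 2" for it.
try:
    load(SAMPLE)
except ValueError as err:
    print(err)

# comments=None disables comment stripping, so both columns survive.
movies = load(SAMPLE, comments=None)
print(movies[1][0])
```

Applied to the real file, the equivalent change would be adding `comments=None` to the original `np.genfromtxt` call.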


PandaBearSoup


Did you try `genfromtxt` with the `names` parameter? – hpaulj Aug 03 '15 at 23:19
