Before someone flags this as a duplicate: this is not the same question as this one.

In that question, the error was

ValueError: Some errors were detected !
Line #88 (got 1435 columns instead of 1434)

that is, one more column than expected (likely an extra delimiter somewhere). My error is the opposite: one column fewer than expected.

I am processing a file with two columns separated by a tab ('\t') and am using the following call:

movies = np.genfromtxt('imdb/movie_keywords', delimiter = '\t', dtype = None)

I receive the following error

ValueError: Some errors were detected !
Line #44209 (got 1 columns instead of 2)
Line #44210 (got 1 columns instead of 2)
Line #44211 (got 1 columns instead of 2)
Line #93460 (got 1 columns instead of 2)
...

Here are four lines (raw text) from the file,

The first two are line #1 and line #, which do not throw an error

'$ (1971)\tbank-heist'
'Angela (1954)\tamerican-car-salesman'

These are from lines #44209 and #93463, which throw errors

'Animated (1989)\taustralian'
'Animated Motion #1 (1976)\tindependent-film'

Might some sleuth point out the difference here that causes numpy not to pick up the tab in the error-throwing lines?

To add, I receive no error if using pandas and this code:

keywords = pd.read_csv('imdb/movie_keywords', delimiter = '\t', dtype = None, names = ['movie', 'keyword'])

Pandas, however, is not sufficient for the operations I wish to conduct.

PandaBearSoup
  • You might encounter this error if `Animated (1989)\taustralian` contains a literal backslash followed by a literal `t` instead of a tab character. – unutbu Aug 03 '15 at 21:37
  • @unutbu the text from the file: "Animated (1989) australian" – PandaBearSoup Aug 03 '15 at 21:39
  • `genfromtxt` reports line numbers with the count starting at 1. Python uses 0-based indexing. Depending on how you located the 44209th line, there might be an "off-by-one" error. It might not hurt to check the line preceding `'Animated (1989)\taustralian'` too. – unutbu Aug 03 '15 at 22:22
  • @unutbu Good thinking, I had considered this. This is why I chose line #93463, as lines #93460-#93465 all return errors. – PandaBearSoup Aug 03 '15 at 22:31
  • Could you post the `repr` of these lines? – unutbu Aug 03 '15 at 22:32
  • @unutbu repr is what was used to produce the raw strings in the original question. – PandaBearSoup Aug 03 '15 at 22:52
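
The checks suggested in the comments (look at the `repr`, rule out a literal backslash-`t`, and rule out an off-by-one line number) can be sketched roughly as follows. The helper name `inspect_lines` and the sample strings are hypothetical and not from the original post; the helper only reports what it sees and does not diagnose the cause.

```python
def inspect_lines(lines, line_numbers, delimiter="\t"):
    """Describe the requested 1-based lines and the line just before each one."""
    # Also include the preceding line, to guard against an off-by-one mix-up
    # between 1-based line numbers and 0-based indexing.
    wanted = {n for number in line_numbers for n in (number - 1, number) if n >= 1}
    report = []
    for number, line in enumerate(lines, start=1):
        if number in wanted:
            report.append({
                "line": number,
                "repr": repr(line),                    # shows hidden characters
                "delimiters": line.count(delimiter),   # real tab characters
                "literal_backslash_t": "\\t" in line,  # backslash followed by 't'
            })
    return report


# Hypothetical sample: the second line holds a literal backslash + 't', not a tab.
sample = ["$ (1971)\tbank-heist\n", "Animated (1989)\\taustralian\n"]
for entry in inspect_lines(sample, [2]):
    print(entry)
```

For the real file, an open file handle can be passed instead of `sample`, for example `open('imdb/movie_keywords', newline='')` with the 1-based numbers from the error message; any encoding argument needed there is an assumption about the file.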

1 Answer

The aim of this question was to find the issue with NumPy; as stated in the question, using pandas results in no error. If someone is looking for a workaround, however, this seems to work:

keywords = pd.read_csv('imdb/movie_keywords', delimiter = '\t', dtype = None, names = ['movie', 'keyword'])

keywords_array = keywords.values  # as_matrix() was removed in pandas 1.0; .values works across versions
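
One possible explanation, which this thread does not confirm: `np.genfromtxt` treats `#` as a comment character by default (`comments='#'`), and at least one of the failing lines quoted in the question (`'Animated Motion #1 (1976)\tindependent-film'`) contains a `#`. Everything after the `#` would be dropped before the line is split, leaving a single column. The sketch below assumes that is the cause and disables comment handling with `comments=None`. The helper name `load_pairs`, the two sample lines, and `encoding='utf-8'` are assumptions, not taken from the original post.

```python
import numpy as np


def load_pairs(source):
    """Read tab-separated rows without treating '#' as a comment marker."""
    return np.genfromtxt(
        source,
        delimiter="\t",
        dtype=None,
        comments=None,     # the default is '#', which truncates lines containing it
        encoding="utf-8",  # assumption about the file's encoding
    )


# Two lines taken from the question; genfromtxt also accepts a list of strings.
sample = [
    "$ (1971)\tbank-heist",
    "Animated Motion #1 (1976)\tindependent-film",
]

try:
    # Default comment handling: the second line is cut at '#'.
    np.genfromtxt(sample, delimiter="\t", dtype=None, encoding="utf-8")
except ValueError as error:
    print("default settings failed:", error)

print(load_pairs(sample))
```

If this is the cause, passing the file path `'imdb/movie_keywords'` to the same call should read every row. If the errors persist, the lines need a closer look for other differences.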
PandaBearSoup