
I have a CSV file called train.csv with 15000 rows. I try to read it using pd.read_csv('train.csv'); however, every time I run it, it only reads in the top 10000 rows. When I print the shape of the data frame, it shows (10000, 29). Can anyone help me? Thank you

PS: I am using Google Colab

Michael
    This is definitely not a fault in the `read_csv` function, as I have gotten dataframes with 42k+ rows with it. First load the dataframe and print it out, check the last row of it, go to this row in your csv file and check what's different in the format. The separator might be different for it, or there might be some error in the file itself. If you don't mind, please share the file itself. – Zero Apr 27 '22 at 06:45
  • Hey, I think you should check this: https://stackoverflow.com/questions/25962114/how-do-i-read-a-large-csv-file-with-pandas – DataSciRookie Apr 27 '22 at 06:47
    This is definitely not a problem with the .read_csv method or Google Colab. In GC, pd.read_csv() opens 1,000,000-row csv files for me like a breeze. Check what's wrong with the data. Maybe something makes read_csv "think" that your file ends after the 10k-th row. – Konstantin Z Apr 27 '22 at 07:37
  • Hi guys, I don't mind sharing the file, but how can I share it? – Michael Apr 27 '22 at 07:38

0 Answers