
I have a file that is continuously growing and looks like this:

https|webmail.mahindracomviva.com|application/vnd.ms-sync.wbxml|158|POST|203.101.110.171
https|webmail.mahindracomviva.com||0|POST|203.101.110.171
https|webmail.mahindracomviva.com||0|POST|203.101.110.171
https|www.googleapis.com|application/x-protobuf|246|POST|74.125.200.95
https|webmail.mahindracomviva.com|application/vnd.ms-sync.wbxml|140|POST|203.101.110.171
https|webmail.mahindracomviva.com|application/x-protobuf|52|POST|203.101.110.171
https|www.googleapis.com|application/x-protobuf|502|POST|74.125.200.95
https|www.googleapis.com|application/x-protobuf|40|POST|74.125.200.95

But I would like to read only the last 50 lines using Pandas.

Abdulrahman Bres
itsaruns
  • Does anything in [this other question/answer](http://stackoverflow.com/questions/17108250/efficiently-read-last-n-rows-of-csv-into-dataframe) help? – summea Jan 06 '14 at 17:38
  • What OS are you using? On *nix you can first create a file with `tail -n 50 long_file.csv > short_file.csv` and then read that smaller file with pandas. – lev Jan 06 '14 at 17:56
  • Please improve the question. How does one read the "last 50 lines" of a file that's continuously growing? The last line has not arrived yet. – krethika Jul 17 '19 at 06:51
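lev's suggestion can also be done without writing an intermediate file: run `tail` from Python, capture its output, and hand the text to pandas through `io.StringIO`. This is a minimal sketch, assuming a *nix system with `tail` on the PATH and the pipe-separated, headerless layout shown in the question; the function name `read_tail_via_shell` is hypothetical.

```python
import io
import subprocess

import pandas as pd


def read_tail_via_shell(path, n=50):
    """Return the last n lines of a pipe-separated, headerless file as a DataFrame.

    Assumes a *nix system where the `tail` command is available.
    """
    # Let `tail` extract the last n lines; check=True raises if tail fails.
    result = subprocess.run(
        ["tail", "-n", str(n), path],
        capture_output=True,
        text=True,
        check=True,
    )
    # Parse the captured text as if it were a small, separate file.
    return pd.read_csv(io.StringIO(result.stdout), sep="|", header=None)
```

Because it shells out, this will not work on Windows unless a `tail` port is installed.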

2 Answers


Follow these steps:

  1. First, find the number of rows in the CSV file without loading the whole file into RAM, by passing `chunksize` to `read_csv()`.

    import pandas as pd

    datalength = 0
    # Read the file 1000 rows at a time so only one chunk is in memory at once
    for data in pd.read_csv('YourFile.csv', encoding='ISO-8859-1', chunksize=1000):
        datalength += len(data)             # add the rows in this chunk
    # datalength is now the number of data rows (the header line is not counted)
    
  2. Second, subtract the number of rows you want to read from the total, skip that many data rows (keeping the header), and read the rest.

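    n = 300                                 # number of rows to read from the end
    rowsdiff = max(datalength - n, 0)       # number of data rows to skip
    # range(1, ...) keeps line 0 (the header) and skips data lines 1..rowsdiff
    df = pd.read_csv('YourFile.csv', encoding='ISO-8859-1', skiprows=range(1, rowsdiff + 1), nrows=n)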
    

With this method only the last few rows end up in the DataFrame, and the whole CSV file is never loaded into RAM at once (although the file is still read from disk twice).
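The two steps above can be combined into one function. This is a minimal sketch, assuming the headerless, pipe-separated layout from the question (so no line is reserved for a header and `skiprows` can be a plain count) and a file without blank lines, since an integer `skiprows` counts raw lines while the chunked count does not include blank ones. The name `read_last_rows_chunked` is hypothetical.

```python
import pandas as pd


def read_last_rows_chunked(path, n=50, chunksize=1000, **read_kwargs):
    """Read the last n rows of a headerless delimited file in two passes.

    Pass 1 counts rows chunk by chunk, so at most `chunksize` rows are held
    in memory at once. Pass 2 skips every line except the final n.
    """
    # Pass 1: count data rows without holding the whole file in memory.
    total = sum(
        len(chunk)
        for chunk in pd.read_csv(path, header=None, chunksize=chunksize, **read_kwargs)
    )
    # Pass 2: with header=None every line is data, so skip the first (total - n).
    to_skip = max(total - n, 0)
    # nrows=n caps the result even if lines were appended between the two passes.
    return pd.read_csv(path, header=None, skiprows=to_skip, nrows=n, **read_kwargs)
```

For the file in the question this would be called as `read_last_rows_chunked("your_file", n=50, sep="|")`.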

Eric Aya
Adil

Try using the pandas `tail()` method, like so:

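    import pandas as pd

    filename = "your_file"
    last_rows = 3                           # number of rows to keep from the end
    # Parses the whole pipe-separated, headerless file, then keeps the last rows
    data = pd.read_csv(filename, header=None, sep="|")
    print(data.tail(last_rows))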
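`tail()` only trims the DataFrame after the entire file has been parsed into memory. A single-pass alternative streams the file through a `collections.deque` with `maxlen`, so at most `n` raw lines are kept, and only those are parsed. This is a minimal sketch, assuming the pipe-separated, headerless format from the question; the function name `read_last_rows_deque` and the ISO-8859-1 encoding (borrowed from the other answer) are assumptions.

```python
import io
from collections import deque

import pandas as pd


def read_last_rows_deque(path, n=50):
    """Stream the file once, keep only its last n lines, then parse them."""
    with open(path, encoding="ISO-8859-1") as f:  # assumed encoding
        # A deque with maxlen drops the oldest line each time a new one is added,
        # so memory use is bounded by n lines regardless of file size.
        last_lines = deque(f, maxlen=n)
    return pd.read_csv(io.StringIO("".join(last_lines)), sep="|", header=None)
```

The whole file is still read from disk, but unlike `read_csv(...).tail(n)` only the final `n` lines are ever converted into a DataFrame.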
0x4ndy