
I have some CSV files; one line is added to each file every hour.

I want to read the last 20 lines of each file and load them into a dataframe.

My approach is:

log_total = [pd.read_csv(f, skiprows=) for f in glob('./coins/*.csv')]

How do I calculate the total number of rows in each file?
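One way to fill in `skiprows` is to count the lines first. A minimal sketch, assuming each file has a single header row (the `./coins/*.csv` paths are taken from the question):

```python
from glob import glob

import pandas as pd

def count_data_rows(path):
    # Count lines in the file; subtract 1 for the header row.
    with open(path) as f:
        return sum(1 for _ in f) - 1

# skiprows keeps row 0 (the header) and skips every data row
# except the last 20; short files are kept whole.
log_total = [
    pd.read_csv(f, skiprows=range(1, max(count_data_rows(f) - 20, 0) + 1))
    for f in glob('./coins/*.csv')
]
```

Note this reads each file twice (once to count, once to parse), which is fine for small hourly logs.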

coding404
  • Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Apr 01 '18 at 07:46
  • First read the csv file: data = pd.read_csv(...), then use the slice operator, something like last_20 = data.iloc[-20:] – Ranjeet Apr 01 '18 at 09:00
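The approach from the comment above (read the whole file, then slice off the last rows) can be sketched as follows, using an in-memory stand-in for one of the hourly CSV files:

```python
from io import StringIO

import pandas as pd

# A small in-memory stand-in for one hourly CSV file (30 data rows).
csv_data = 'ts,price\n' + '\n'.join(f'{i},{i * 2}' for i in range(30))

# Parse the entire file, then keep only the last 20 rows.
data = pd.read_csv(StringIO(csv_data))
last_20 = data.iloc[-20:]   # equivalent to data.tail(20)

print(len(last_20))         # 20
```

This is the simplest option, but it parses every row of every file, which the answer below avoids.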

1 Answer


AFAIK there is no built-in, memory-efficient way in Pandas to parse only the last N lines of a file if you don't know exactly how many rows should be skipped.

You may try the following approach:

from collections import deque
from glob import glob
from io import StringIO

import pandas as pd

def read_last_lines(fn, n=20, encoding='utf-8', **kwargs):
    with open(fn, encoding=encoding) as f:
        # Read the header first, so that for long files the last
        # data line isn't mistakenly consumed as column names.
        header = f.readline()
        # deque with maxlen=n buffers only the last n lines of the file.
        return pd.read_csv(StringIO(header + ''.join(deque(f, n))), **kwargs)


log_total = pd.concat([read_last_lines(f, 20) for f in glob('./coins/*.csv')],
                      ignore_index=True)
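As a quick check of the trick this relies on: passing a file-like object to deque with maxlen=n iterates over all lines but keeps only the last n in memory (here shown with an in-memory buffer instead of a file on disk):

```python
from collections import deque
from io import StringIO

# 100 lines; a deque with maxlen=20 discards all but the last 20.
buf = StringIO('\n'.join(f'line {i}' for i in range(100)))
tail = deque(buf, 20)

print(len(tail))            # 20
print(tail[0].rstrip())     # line 80
```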
MaxU - stand with Ukraine