
I have some CSV files; one line is added to each file every hour.

I want to read the last 20 lines of each file and load them into a dataframe.

My approach is:

log_total = [pd.read_csv(f, skiprows=) for f in glob('./coins/*.csv')]

How do I calculate the total number of rows in each file?
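One way to fill in `skiprows` is to count the lines first. A minimal sketch, assuming each file has a single header row (the `./coins/*.csv` paths are taken from the question):

```python
from glob import glob

import pandas as pd

def count_data_rows(path):
    # Count lines in the file; subtract 1 for the header row.
    with open(path) as f:
        return sum(1 for _ in f) - 1

# skiprows keeps row 0 (the header) and skips every data row
# except the last 20; short files are kept whole.
log_total = [
    pd.read_csv(f, skiprows=range(1, max(count_data_rows(f) - 20, 0) + 1))
    for f in glob('./coins/*.csv')
]
```

Note this reads each file twice (once to count, once to parse), which is fine for small hourly logs.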

coding404
  • Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Apr 01 '18 at 07:46
  • First read the csv file: data = pd.read_csv(...), then use the slice operator, something like last_20 = data.iloc[-20:] – Ranjeet Apr 01 '18 at 09:00
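The approach from the comment above (read the whole file, then slice off the last rows) can be sketched as follows, using an in-memory stand-in for one of the hourly CSV files:

```python
from io import StringIO

import pandas as pd

# A small in-memory stand-in for one hourly CSV file (30 data rows).
csv_data = 'ts,price\n' + '\n'.join(f'{i},{i * 2}' for i in range(30))

# Parse the entire file, then keep only the last 20 rows.
data = pd.read_csv(StringIO(csv_data))
last_20 = data.iloc[-20:]   # equivalent to data.tail(20)

print(len(last_20))         # 20
```

This is the simplest option, but it parses every row of every file, which the answer below avoids.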

1 Answer


AFAIK there is no built-in, memory-efficient way in Pandas to parse only the last N lines of a file if you don't know exactly how many rows should be skipped.

You may try the following approach:

from collections import deque
from glob import glob
from io import StringIO

import pandas as pd

def read_last_lines(fn, n=20, encoding='utf-8', **kwargs):
    with open(fn, encoding=encoding) as f:
        # Read the header first, so that for long files the last
        # data line isn't mistakenly consumed as column names.
        header = f.readline()
        # deque with maxlen=n buffers only the last n lines of the file.
        return pd.read_csv(StringIO(header + ''.join(deque(f, n))), **kwargs)


log_total = pd.concat([read_last_lines(f, 20) for f in glob('./coins/*.csv')],
                      ignore_index=True)
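As a quick check of the trick this relies on: passing a file-like object to deque with maxlen=n iterates over all lines but keeps only the last n in memory (here shown with an in-memory buffer instead of a file on disk):

```python
from collections import deque
from io import StringIO

# 100 lines; a deque with maxlen=20 discards all but the last 20.
buf = StringIO('\n'.join(f'line {i}' for i in range(100)))
tail = deque(buf, 20)

print(len(tail))            # 20
print(tail[0].rstrip())     # line 80
```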
MaxU - stand with Ukraine