I have .csv files (abc.csv, def.csv, etc.) in a directory. I would like to count the rows in every file and save a single file with a name column and a count column. My expected output is below:

df = name  count
     abc   ....
     def   ....
     ghi   ....

I am doing something like the code below to get the counts, but I am not able to convert the result into a DataFrame. Please suggest.

import os
path = '/some/path/to/file'
for filename in os.listdir(path):
    # os.listdir returns bare names, so join them with the directory
    with open(os.path.join(path, filename), 'r', encoding="latin-1") as fileObj:
        # -1 to exclude the header
        print("Rows Counted {} in the csv {}:".format(len(fileObj.readlines()) - 1, filename))
Mike

2 Answers

2

Put the name/line count of each file into lists, and then create your DataFrame once the loop is over:

import os

import pandas as pd

path = '/some/path/to/file'
names, counts = [], []

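for filename in os.listdir(path):
    # skip anything that is not a CSV file
    if not filename.endswith('.csv'):
        continue
    # os.listdir returns bare names, so join them with the directory
    with open(os.path.join(path, filename), 'r', encoding="latin-1") as fileObj:
        # strip the extension so the name column reads abc, def, ...
        names.append(os.path.splitext(filename)[0])
        # -1 to exclude the header
        counts.append(len(fileObj.readlines()) - 1)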

df = pd.DataFrame({'name': names, 'count': counts})
jfaccioni
  • Note: If the files are huge, consider using [this solution](https://stackoverflow.com/a/16108605/6699447) to count the number of rows, because the `readlines` method loads every line of the file into a list in memory. – Abdul Niyas P M Jul 02 '20 at 14:42
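
The concern in the comment above can be illustrated with a minimal sketch that streams each file line by line instead of calling `readlines`. This is not the code from the linked answer. The helper names `count_data_rows` and `collect_counts` are hypothetical, and the sketch assumes every file has exactly one header line and no quoted fields containing line breaks (physical lines are counted, not parsed CSV records).

```python
import os


def count_data_rows(csv_path, encoding="latin-1"):
    """Count lines in a file one at a time, minus one header line."""
    with open(csv_path, "r", encoding=encoding) as file_obj:
        # The generator yields one line at a time, so no list of lines is built.
        total_lines = sum(1 for _ in file_obj)
    # An empty file has no header to subtract, so clamp at zero.
    return max(total_lines - 1, 0)


def collect_counts(directory):
    """Return (name, count) pairs for every .csv file in `directory`."""
    rows = []
    for filename in sorted(os.listdir(directory)):
        if filename.endswith(".csv"):
            full_path = os.path.join(directory, filename)
            # Drop the extension so names read abc, def, ...
            name = os.path.splitext(filename)[0]
            rows.append((name, count_data_rows(full_path)))
    return rows
```

The pairs can then be turned into the requested table with `pd.DataFrame(collect_counts(path), columns=['name', 'count'])` and saved with `to_csv('counts.csv', index=False)`, where `counts.csv` is only a placeholder file name.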
0

Using pandas and pathlib:

from pathlib import Path
import pandas as pd

path_to_csv = 'your\\dir'

file_info = pd.DataFrame( {file.name : 
      pd.read_csv(file).shape for file in Path(path_to_csv).glob('*.csv')})\
      .T.rename(columns={0 : 'rows', 1 : 'columns'})

print(file_info)

              rows  columns
01_02_20.csv     3        3
01_28_20.csv     3        4
12_02_19.csv    77       10
12_09_19.csv    86        7
Umar.H
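
The pandas and pathlib answer returns a table indexed by file name with row and column counts, while the question asks for a `name` column and a `count` column. A rough sketch of how the same idea could be reshaped into that layout follows. The function name `name_count_table` is hypothetical, and the sketch assumes every file is a non-empty CSV that `pd.read_csv` can parse with its default settings.

```python
from pathlib import Path

import pandas as pd


def name_count_table(directory):
    """Build a name/count DataFrame from every .csv file in `directory`."""
    records = [
        # read_csv treats the first line as the header, so len() counts data rows only.
        {"name": csv_file.stem, "count": len(pd.read_csv(csv_file))}
        for csv_file in sorted(Path(directory).glob("*.csv"))
    ]
    # Passing columns explicitly keeps the layout stable even with no matching files.
    return pd.DataFrame(records, columns=["name", "count"])
```

The result can be written to a single file with, for example, `name_count_table(path_to_csv).to_csv('row_counts.csv', index=False)`, where `row_counts.csv` is a placeholder name. Because each file is parsed in full, this is likely to be slower on large inputs than counting lines directly.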