0

I have 500 text files, I need to find out how many entities are in each files. I did that in the following code:

import os
import pandas as pd

path = "newData"
files = [file for file in os.listdir(path) if file.endswith(".txt")]

c=0
for file in files:
    df = pd.read_csv(os.path.join(path, file), 
                     sep=' ',engine='python')
    
    df.columns = ['word','token','?']
    
    problem = df['token'].tolist().count('B-Problem')
    method = df['token'].tolist().count('B-Method\oTool')
    data = df['token'].tolist().count('B-Dataset')

I need to create an excel sheet to show the information excel output expect:

Filename  #ofProblem  #ofMethod  #ofData
admin.txt    {problem}   {method}  {data}

how can I store them into one big excel sheet for all 500files?

Chang
  • 77
  • 6

1 Answers1

1

Pandas can write excel sheets, just store your values in a dataframe you'll use to write to a file

import os
import pandas as pd

path = "newData"
files = [file for file in os.listdir(path) if file.endswith(".txt")]

out_data = []

c=0
for file in files:
    df = pd.read_csv(os.path.join(path, file), 
                     sep=' ',engine='python')
    
    df.columns = ['word','token','?']
    
    problem = df['token'].tolist().count('B-Problem')
    method = df['token'].tolist().count('B-Method\oTool')
    data = df['token'].tolist().count('B-Dataset')

    out_data.append(
        {
            "Filename": file,
            "#ofProblem": problem,
            "#ofMethod": method,
            "#ofData": data,
        }
    )

pd.DataFrame(out_data).to_excel("your_excel_name.xlsx", index=None)
ted
  • 13,596
  • 9
  • 65
  • 107
  • somehow the excel file only has the column names, but have no values – Chang Sep 21 '21 at 03:43
  • I edited to `out_df.to_excel("your_excel_name.xlsx", index=None)` does it work better? Otherwise I think it's just a matter of exploring parameters of this function here https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html – ted Sep 21 '21 at 03:46
  • 1
    If `files` is empty then the loop won’t iterate leaving you with the default initialisation of `out_df` – Henry Ecker Sep 21 '21 at 03:49
  • still empty. I tired to print the out_df, the index is empty . `Columns: [Filename, #ofProblem, #ofMethod, #ofData] Index: []` – Chang Sep 21 '21 at 03:50
  • when I print `problem` and the other 2, it does have values @HenryEcker – Chang Sep 21 '21 at 03:51
  • As Henry is saying, this might come from an empty list `files` are you sure your filenames are in there? – ted Sep 21 '21 at 03:52
  • 1
    Append is not an inplace operation @ted `out_df = out_df.append(...` but also you really shouldn't be growing a DataFrame like this... [NEVER grow a DataFrame!](https://stackoverflow.com/a/56746204/15497888) – Henry Ecker Sep 21 '21 at 03:52
  • I know, my bad, append is not inplace! Editing :) – ted Sep 21 '21 at 03:52
  • I missed your comments @HenryEcker thanks, I realized in parallel – ted Sep 21 '21 at 03:56
  • @Lee I updated my answer to reflect Henry's comment, you should update your own code too – ted Sep 21 '21 at 03:59