I have managed to read the following log file into Python:

import os
import glob
import pandas as pd


folder = r'C:\Users\x\x\x\x\\'

for infile in glob.glob(os.path.join(folder, 'console*')):
    # Read the whole file as one string and close it afterwards.
    with open(infile, 'r') as f:
        file = f.read()
    print(file)

print(file) gives me:

John, 1,7,8, text
Matt, 3,7,10, text2
Natasha, 4,60,3,text3

I am hoping to convert this into a pandas DataFrame:

df = pd.DataFrame(file)

but I get ValueError: DataFrame constructor not properly called!

Does anyone know how to construct a DataFrame of 3 rows by 5 columns and then add my own column headers? Thanks very much!

SOK
  • Are you looping through more than one file? At the end of the loop, what is the value of the variable `file`? It seems that you just need to convert the output into an array or dictionary, and then pandas will create a DataFrame the way you want. – renatomt Jul 22 '20 at 12:01
  • This may be helpful: https://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe – Dishin H Goyani Jul 22 '20 at 12:07
  • Also, are you expecting files in different formats, or will they always be .txt or .csv files? Are you looking to append all logs of a similar format together into a single file? – vrana95 Jul 22 '20 at 12:11
  • Hi all. It's just one `console` file that keeps updating, so I will run this periodically. The text comes into Python fine; I'm just trying to convert it into a DataFrame. – SOK Jul 22 '20 at 12:14
  • So, if it will always be a .txt file, just try the following in your code: `data = pd.read_csv('sample.txt', sep=" ", header=None)` followed by `data.columns = ["a", "b", "c", "etc."]` – vrana95 Jul 22 '20 at 12:17
  • Thanks @vrana95. My issue is that the `console` file does not have `.txt` as an extension, so `pd.read_csv` didn't work, which is why I used the loop to import the text. – SOK Jul 22 '20 at 12:19
  • @SOK that's not an issue at all. `read_csv` doesn't care about extensions. What did you actually try? Which files did you try to load? Did `glob` miss some files perhaps? – Panagiotis Kanavos Jul 22 '20 at 12:34
  • Ah yes, thank you! It wasn't working, but once I added `.log` to the `console` filename I was searching for, it worked! – SOK Jul 23 '20 at 02:09
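
The resolution reached in the comments (let `pd.read_csv` open the globbed file directly, whatever its extension) could look roughly like the sketch below. The file name `console.log`, the column names in `COLUMNS`, and the helper `load_console_logs` are assumptions made for illustration; the demo writes the three sample rows to a temporary folder so that it runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical column names; replace them with your own headers.
COLUMNS = ["name", "a", "b", "c", "note"]


def load_console_logs(folder, pattern="console*"):
    """Read every file matching the pattern into a single DataFrame.

    Raises ValueError (from pd.concat) if no file matches the pattern.
    """
    frames = [
        # header=None: the log has no header row.
        # skipinitialspace=True: drop the space that follows some commas.
        pd.read_csv(path, header=None, names=COLUMNS, skipinitialspace=True)
        for path in sorted(glob.glob(os.path.join(folder, pattern)))
    ]
    return pd.concat(frames, ignore_index=True)


# Demo: a temporary folder holding the sample rows from the question.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "console.log"), "w") as fh:
        fh.write("John, 1,7,8, text\nMatt, 3,7,10, text2\nNatasha, 4,60,3,text3\n")
    df = load_console_logs(tmp)

print(df)
```

Passing `names=` together with `header=None` sets the headers at read time, so no separate `df.columns = [...]` step is needed.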

1 Answer

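import os
import glob
import pandas as pd

folder = 'C:\\'
# Collect every file in the folder whose name contains a dot.
filenames = glob.glob(os.path.join(folder, '*.*'))

# In the case of .csv files: read each file without a header row.
frames = [pd.read_csv(name, header=None) for name in filenames]

# DataFrame.append was removed in pandas 2.0, so combine with pd.concat.
# Fall back to an empty DataFrame when no file matched.
df_cc = pd.concat(frames) if frames else pd.DataFrame()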
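
If the file contents have already been read into a single string, as in the question, the `ValueError` arises because `pd.DataFrame` does not accept a plain string as data. One possible workaround, sketched here under the assumption that the text is comma-separated like the sample, is to wrap the string in `io.StringIO` so that `pd.read_csv` can parse it. The column names are hypothetical placeholders.

```python
import io

import pandas as pd

# The text as printed in the question (stands in for the `file` variable).
text = "John, 1,7,8, text\nMatt, 3,7,10, text2\nNatasha, 4,60,3,text3\n"

# Hypothetical headers; choose your own.
columns = ["name", "a", "b", "c", "note"]

# StringIO makes the string behave like an open file for read_csv.
df = pd.read_csv(
    io.StringIO(text),
    header=None,            # the data has no header row
    names=columns,          # assign custom column headers
    skipinitialspace=True,  # ignore the space after some commas
)

print(df)
```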

Paul Roub
user3685918
  • Your answer is currently flagged as low quality and could be deleted in the review process. Please make sure your answer contains an explanation aside from any code. – Tim Stack Jul 22 '20 at 14:56