I am struggling to import several files into a pandas DataFrame and hope somebody has done something similar before and can help me out.

In this example I am loading files from two different directories:

>>> files_list = glob.glob('**/*.all', recursive=True)

>>> files_list
['dir2/file1.all', 'dir2/file2.all', 'dir1/file1.all', 'dir1/file2.all']

The content of each file needs to be split into two columns, timestamp and rest, which is achieved with this function:

import re

def process(log_line):
    '''Regex-based processing function that splits a line on the first space.'''
    # Raw string so that \s is not treated as an invalid string escape.
    match = list(filter(None, re.split(pattern=r'([^\s]+)', string=log_line, maxsplit=1)))
    timestamp, rest = match
    return {'timestamp': timestamp, 'rest': rest}
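
To make the behaviour of the function concrete, here is a small sketch that runs it on one made-up log line. The sample line is hypothetical and not taken from the real *.all files. It also shows two details that may matter later: rest keeps its leading space and trailing newline, and a blank line cannot be unpacked into two values.

```python
import re

def process(log_line):
    '''Split a line into its first whitespace-free token and the remainder.'''
    match = list(filter(None, re.split(pattern=r'([^\s]+)', string=log_line, maxsplit=1)))
    timestamp, rest = match
    return {'timestamp': timestamp, 'rest': rest}

# Hypothetical log line, used only for illustration.
sample = "2021-10-25T14:06:00 service started\n"
parsed = process(sample)

# A blank line yields a single piece, so the two-value unpacking fails.
try:
    process("\n")
    blank_line_fails = False
except ValueError:
    blank_line_fails = True
```

If the real files contain blank lines, filtering them out before calling process (or stripping rest) may be needed; that depends on the data.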

Now, when I work through the files:

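import pandas as pd

dataframe_list = []  # collects one DataFrame per file
for file_name in files_list:
    with open(file_name, "r") as f:
        # Parse every line of the file into a {'timestamp', 'rest'} dict.
        parsed_data = [process(line) for line in f]
    df = pd.DataFrame(parsed_data)
    dataframe_list.append(df)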

dataframe_list seems to hold the content of all the files, but it is a list of DataFrames, and I am unable to turn it into a single pandas DataFrame. Passing it to pd.DataFrame produces this warning:

>>> type(dataframe_list)
<class 'list'>
>>> andre = pd.DataFrame(dataframe_list)
/usr/local/lib/python3.8/dist-packages/pandas/core/internals/construction.py:309: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  values = np.array([convert(v) for v in values])

Q1: How can I create a single DataFrame from dataframe_list?

or

Q2: How do I create a separate DataFrame for each file, named after the file (dir1/file1, dir1/file2, ...), while reading through them?
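
For reference, one possible direction for both questions, as a minimal sketch rather than a confirmed solution: a list of DataFrames is normally combined with pd.concat instead of pd.DataFrame, and a dict keyed by file name gives one DataFrame per file. The parsed_by_file data below is made up and only stands in for the output of process; the names frames, combined and combined_with_file are illustrative.

```python
import pandas as pd

# Hypothetical parsed rows standing in for the output of process() per file.
parsed_by_file = {
    'dir1/file1.all': [
        {'timestamp': 't1', 'rest': ' first line'},
        {'timestamp': 't2', 'rest': ' second line'},
    ],
    'dir2/file1.all': [
        {'timestamp': 't3', 'rest': ' other line'},
    ],
}

# Q2: one DataFrame per file, looked up by its file name.
frames = {name: pd.DataFrame(rows) for name, rows in parsed_by_file.items()}

# Q1: combine a list of DataFrames row-wise with pd.concat.
dataframe_list = list(frames.values())
combined = pd.concat(dataframe_list, ignore_index=True)

# Optional: pass the dict itself so the file name becomes a column.
combined_with_file = (
    pd.concat(frames, names=['file', 'row'])
    .reset_index(level='file')   # move the file name out of the index
    .reset_index(drop=True)      # discard the per-file row counter
)
```

With this layout, frames['dir1/file1.all'] returns the DataFrame for that file, and combined_with_file can be filtered on the file column.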

  • Does this answer your question? [Import multiple csv files into pandas and concatenate into one DataFrame](https://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe) – Mr. Hobo Oct 25 '21 at 14:06
  • Would you like to keep the names of the files? What type of files are they (.txt, .csv, etc.)? – Jerzy Oct 25 '21 at 14:06
  • Those are concentrated log files named *.all, which is why I have to use the process function to split the timestamp from the log text. Ideally each DataFrame would have the name of its file, for example dir1.file1, dir2.file2 – I'm not a robot Oct 25 '21 at 14:45
