I am struggling to import several files into a pandas DataFrame and hope somebody who has done something similar before can help me out.
In this example I am loading files from two different directories:
>>> files_list = glob.glob('**/*.all', recursive=True)
>>> files_list
['dir2/file1.all', 'dir2/file2.all', 'dir1/file1.all', 'dir1/file2.all']
The content of each file needs to be separated into two columns, timestamp and rest, which is achieved with this function:
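import re

def process(log_line):
    '''Define a regex-based processing function that splits on first space.'''
    # The capturing group keeps the first non-whitespace token in the result;
    # filter(None, ...) drops the empty string that precedes it.
    match = list(filter(None, re.split(pattern=r'([^\s]+)', string=log_line, maxsplit=1)))
    timestamp, rest = match
    return {'timestamp': timestamp, 'rest': rest}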
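To show what the function returns, here is a small self-contained check. The sample line is made up for illustration; the real .all files may be formatted differently.

```python
import re

def process(log_line):
    """Split a log line into its first token and everything after it."""
    match = list(filter(None, re.split(pattern=r'([^\s]+)', string=log_line, maxsplit=1)))
    timestamp, rest = match
    return {'timestamp': timestamp, 'rest': rest}

# Hypothetical sample line, not taken from the actual files.
sample = "2021-03-01T12:00:00 service started\n"
parsed = process(sample)
print(parsed)
# {'timestamp': '2021-03-01T12:00:00', 'rest': ' service started\n'}
```

Note that rest keeps the leading space and the trailing newline, and that a blank line would fail at the unpacking step with a ValueError, since the split then yields only one element.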
Now, when I work through the files:
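dataframe_list = []  # collects one DataFrame per file
for file_name in files_list:
    with open(file_name, "r") as f:
        lines = f.readlines()
    # Parse every line into a {'timestamp': ..., 'rest': ...} dict
    parsed_data = [process(line) for line in lines]
    df = pd.DataFrame(parsed_data)
    dataframe_list.append(df)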
dataframe_list seems to hold the content of all the files, but it is a list of DataFrames, and I am unable to turn it into a single DataFrame. Passing it to pd.DataFrame gives this warning:
>>> type(dataframe_list)
<class 'list'>
>>> andre = pd.DataFrame(dataframe_list)
/usr/local/lib/python3.8/dist-packages/pandas/core/internals/construction.py:309: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
values = np.array([convert(v) for v in values])
Q1: How can I create a single DataFrame from dataframe_list?
or
Q2: How do I create a separate DataFrame for each file, named after the file (dir1/file1, dir1/file2, ...), while reading through them?
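To make both questions concrete, here is a minimal sketch of the kind of result I am after. It assumes pd.concat is a reasonable tool for Q1 and a dict keyed by file name for Q2. The sample frames and their contents are made up and stand in for the per-file DataFrames built in the loop, and the names combined, frames_by_file and labelled are only illustrative.

```python
import pandas as pd

# Stand-ins for the per-file DataFrames (hypothetical contents, two files only).
files_list = ['dir1/file1.all', 'dir1/file2.all']
dataframe_list = [
    pd.DataFrame([{'timestamp': 't1', 'rest': ' a\n'},
                  {'timestamp': 't2', 'rest': ' b\n'}]),
    pd.DataFrame([{'timestamp': 't3', 'rest': ' c\n'}]),
]

# Q1: stack all per-file frames into one DataFrame with a fresh 0..n-1 index.
combined = pd.concat(dataframe_list, ignore_index=True)

# Q2: keep one DataFrame per file, looked up by the file name.
frames_by_file = dict(zip(files_list, dataframe_list))

# Variant: a single frame whose outer index level records the source file.
labelled = pd.concat(frames_by_file, names=['file', 'row'])

print(combined)
print(frames_by_file['dir1/file2.all'])
print(labelled.loc['dir1/file1.all'])
```

With the dict, each file's rows stay separate and are addressed as frames_by_file['dir1/file1.all']; with the labelled variant everything lives in one frame and labelled.loc[...] selects the rows of a given file.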