
I'm pretty new to Python and coding in general, so sorry in advance for any dumb questions. My program needs to split an existing log file into several *.csv files (run1.csv, run2.csv, ...) based on the keyword 'MYLOG'. Whenever the keyword appears, it should start copying the two desired columns into a new file until the keyword appears again. When finished, there should be as many csv files as there are keywords.


53.2436     EXP     MYLOG: START RUN specs/run03_block_order.csv
53.2589     EXP     TextStim: autoDraw = None
53.2589     EXP     TextStim: autoDraw = None
55.2257     DATA    Keypress: t
57.2412     DATA    Keypress: t
59.2406     DATA    Keypress: t
61.2400     DATA    Keypress: t
63.2393     DATA    Keypress: t
...
89.2314     EXP     MYLOG: START BLOCK scene [specs/run03_block01.csv]
89.2336     EXP     Imported specs/run03_block01.csv as conditions
89.2339     EXP     Created sequence: sequential, trialTypes=9
...

[EDIT]: The output per file (run*.csv) should look like this:

onset       type
53.2436     EXP     
53.2589     EXP     
53.2589     EXP     
55.2257     DATA    
57.2412     DATA    
59.2406     DATA    
61.2400     DATA    
...

The program creates as many run*.csv files as needed, but I can't store the desired columns in my new files. When finished, all I get are empty csv files. If I shift the counter variable to == 1, it creates just one big file with the desired columns.

Thanks again!

import csv

QUERY = 'MYLOG'

with open('localizer.log', 'rt') as log_input:
    i = 0

    for line in log_input:

        if QUERY in line:
            i = i + 1

            with open('run' + str(i) + '.csv', 'w') as output:
                reader = csv.reader(log_input, delimiter = ' ')
                writer = csv.writer(output)
                content_column_A = [0]
                content_column_B = [1]

                for row in reader:
                    content_A = list(row[j] for j in content_column_A)
                    content_B = list(row[k] for k in content_column_B)
                    writer.writerow(content_A)
                    writer.writerow(content_B)
Geekfish
STD
  • Please describe what each one of the new files should look like. – barak manos Nov 21 '16 at 12:56
  • 1
    It would be useful to provide: 1. The expected output, and 2. The actual output or what goes wrong. Also, `counter` variable doesn't seem relevant to this piece of code, maybe remove so that it's easier to get to the point. – Geekfish Nov 21 '16 at 13:02

2 Answers


Looking at the code, there are a few things that are possibly wrong:

  1. the csv reader should take a file handler, not a single line.
  2. the reader delimiter should not be a single space character as it looks like the actual delimiter in your logs is a variable number of multiple space characters.
  3. the looping logic seems to be a bit off, confusing files/lines/rows a bit.
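Point 2 is easy to demonstrate in isolation: with a single-space delimiter, every extra padding space becomes an empty field, whereas `str.split()` with no argument collapses any run of whitespace. A quick sketch using one of the log lines from the question:

```python
# One line from the log in the question; the columns are padded
# with a variable number of spaces.
line = '53.2436     EXP     MYLOG: START RUN specs/run03_block_order.csv'

# Splitting on a single space produces an empty string for
# every extra padding space between the columns.
print(line.split(' ')[:4])  # ['53.2436', '', '', '']

# split() with no argument treats any run of whitespace as one
# delimiter, which is what this log format needs.
print(line.split()[:2])     # ['53.2436', 'EXP']
```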

You may be looking at something like the code below (pending clarification in the question):

import csv
NEW_LOG_DELIMITER = 'MYLOG'

def write_buffer(_index, buffer):
    """
    This function takes an index and a buffer.
    The buffer is just an iterable of iterables (ex a list of lists)
    Each buffer item is a row of values.
    """
    filename = 'run{}.csv'.format(_index)
    with open(filename, 'w') as output:
        writer = csv.writer(output)
        writer.writerow(['onset', 'type'])  # adding the heading
        writer.writerows(buffer)

current_buffer = []
_index = 1

with open('localizer.log', 'rt') as log_input:
    for line in log_input:
        # will deal ok with multi-space as long as
        # you don't care about the last column
        fields = line.split()[:2]
        if NEW_LOG_DELIMITER not in line or not current_buffer:
            # If it's the first line (the current_buffer is empty)
            # or the line does NOT contain "MYLOG" then
            # collect it until it's time to write it to file.
            current_buffer.append(fields)
        else:
            write_buffer(_index, current_buffer)
            _index += 1
            current_buffer = [fields]  # EDIT: fixed bug, new buffer should not be empty
    if current_buffer:
        # We are now out of the loop,
        # if there's an unwritten buffer then write it to file.
        write_buffer(_index, current_buffer)
Geekfish
  • Thanks for your great work! Especially your comments proved to be very helpful. There's just one last thing: when opening one of the run files in Excel (or Open Office), everything is written in one column and the input is separated by a comma (for example A1 = onset,type, A2 = blank, A3 = 65.2421,EXP, A4 = blank ...). – STD Nov 21 '16 at 22:05
  • 1
    I was able to eliminate all the blank cells by adding the newline='' argument within the with open function. – STD Nov 22 '16 at 09:03
  • There was a bug in my answer btw, I have added a comment on the fix. – Geekfish Nov 22 '16 at 12:18
  • Thank you @Geekfish, you've been very helpful. One last question: do you have any idea how to separate the columns? In Excel both columns are merged into one (for example A1 = onset,type, A2 = 65.2421,EXP, but it should look like A1 = onset, B1 = type, A2 = 65.2421, B2 = EXP) – STD Nov 22 '16 at 12:27
  • @STD it sounds like the Excel default delimiter for csv might be expecting a character other than a comma. Might be worth taking a look here: http://superuser.com/questions/606272/how-to-get-excel-to-interpret-the-comma-as-a-default-delimiter-in-csv-files It shouldn't be a problem with generating the file, just how you open it in Excel. – Geekfish Nov 22 '16 at 12:36
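Folding the newline='' fix from the comments back into the answer's helper, a revised write_buffer would look like this (the rest of the loop is unchanged):

```python
import csv

def write_buffer(_index, buffer):
    """Write one run's rows (an iterable of [onset, type] pairs) to run<_index>.csv."""
    filename = 'run{}.csv'.format(_index)
    # newline='' stops csv.writer from producing the blank rows
    # STD saw when opening the files on Windows.
    with open(filename, 'w', newline='') as output:
        writer = csv.writer(output)
        writer.writerow(['onset', 'type'])  # heading row
        writer.writerows(buffer)
```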

You can use pandas to simplify this problem.

Import pandas and read in the log file.

import pandas as pd

df = pd.read_fwf('localizer2.log', header=None)
df.columns = ['onset', 'type', 'event']
df.set_index('onset', inplace=True)

Set a flag where the third column == 'MYLOG'

df['flag'] = 0
df.loc[df.event.str[:5] == 'MYLOG', 'flag'] = 1
df.flag = df['flag'].cumsum()

Save each run as a separate run*.csv file

for i in range(1, df.flag.max()+1):
    df.loc[df.flag == i, 'event'].to_csv('run{0}.csv'.format(i))
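To see what the flag column does, here is a minimal sketch on a hypothetical four-row frame (the values are made up for illustration): the cumsum turns the MYLOG markers into run numbers, and groupby is an alternative to the explicit range loop above.

```python
import pandas as pd

# Hypothetical miniature version of the parsed log.
df = pd.DataFrame({
    'onset': [53.2436, 55.2257, 89.2314, 90.0000],
    'type':  ['EXP', 'DATA', 'EXP', 'DATA'],
    'event': ['MYLOG: START RUN', 'Keypress: t',
              'MYLOG: START BLOCK', 'Keypress: t'],
})

# Each MYLOG marker contributes 1; the cumulative sum numbers the runs.
df['flag'] = (df.event.str[:5] == 'MYLOG').astype(int).cumsum()
print(df.flag.tolist())  # [1, 1, 2, 2]

# groupby splits the frame into one group per run.
for i, run in df.groupby('flag'):
    run.set_index('onset')[['type']].to_csv('run{0}.csv'.format(i))
```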

EDIT: It looks like your format is different from what I originally assumed, so I changed the code to use pd.read_fwf. My localizer.log file was a copy and paste of your original data; hope this works for you. I assumed from the original post that it did not have headers. If it does have headers, then remove header=None and df.columns = ['onset', 'type', 'event'].

Waylon Walker
  • Thank you for your work Waylon. When your code is executed, the following errors occur: File "C: [...]", line 3, in df = pd.read_csv('localizer.log').set_index('onset') and File "pandas\parser.pyx", line 805, in pandas.parser.TextReader.read (pandas\parser.c:8748) File "pandas\parser.pyx", line 827, in pandas.parser.TextReader._read_low_memory (pandas\parser.c:9003) – STD Nov 21 '16 at 22:22
  • Looks like an issue of file format. Try using `pd.read_fwf` as shown in the edit. – Waylon Walker Nov 23 '16 at 02:29