
I'm trying to extract data from a CNC machine.

Events happen every millisecond, and I need to filter out some variables that are separated by a pipe ("|") delimiter. The log file is generated by the PuTTY program.

I tried to read it with pandas, but the columns are not in the same position:

df = pd.read_table('data.log', sep='|')

A portion of the log file is shown below.

=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2019.05.24 19:47:51 =~=~=~=~=~=~=~=~=~=~=~=
2019-05-24T22:47:50.894Z|message||PLACA ABERTA-ESQ
2019-05-24T22:47:50.894Z|avail|AVAILABLE|part_count|0|SspeedOvr|50|Fovr|100|tool_id|100|program|51.51|program_comment|UNAVAILABLE|line|0|block|O0051(C1S-LADO2)|path_feedrate|0|path_position|13.9260000000 0.0000000000 5.0000000000|active_axes|X Z C|mode|AUTOMATIC
2019-05-24T22:47:50.894Z|servo|NORMAL||||
2019-05-24T22:47:50.894Z|comms|NORMAL||||
2019-05-24T22:47:50.894Z|logic|NORMAL||||
2019-05-24T22:47:50.894Z|motion|NORMAL||||
2019-05-24T22:47:50.894Z|system|NORMAL||||
2019-05-24T22:47:50.894Z|execution|STOPPED|f_command|0|estop|ARMED|Xact|-182.561|Xload|20
2019-05-24T22:47:50.894Z|Xtravel|NORMAL||||
2019-05-24T22:47:50.894Z|Xoverheat|NORMAL||||
2019-05-24T22:47:50.894Z|Xservo|NORMAL||||
2019-05-24T22:47:50.894Z|Zact|-297.913|Zload|8
2019-05-24T22:47:50.894Z|Ztravel|NORMAL||||
2019-05-24T22:47:50.894Z|Zoverheat|NORMAL||||
2019-05-24T22:47:50.894Z|Zservo|NORMAL||||
2019-05-24T22:47:50.894Z|Cact|0|Cload|0
2019-05-24T22:47:50.894Z|Ctravel|NORMAL||||
2019-05-24T22:47:50.894Z|Coverheat|NORMAL||||
2019-05-24T22:47:50.894Z|Cservo|NORMAL||||
2019-05-24T22:47:50.894Z|S1speed|0|S1load|0
2019-05-24T22:47:50.894Z|S1servo|NORMAL||||
2019-05-24T22:47:50.894Z|S2speed|0|S2load|0
2019-05-24T22:47:50.894Z|S2servo|NORMAL||||
2019-05-24T22:47:51.261Z|S2load|1
2019-05-24T22:47:51.712Z|Zload|9|S2load|0
2019-05-24T22:47:53.056Z|line|650|block|N630G21G40G90G95|path_feedrate|14142|path_position|37.9260000000 0.0000000000 17.0000000000|execution|ACTIVE|Xact|-158.561|Xload|88|Zact|-285.913|Zload|60
2019-05-24T22:47:53.497Z|block|N650G28U0W0|path_position|187.2590000000 0.0000000000 91.6670000000|Xact|-9.228|Xload|49|Zact|-211.246|Zload|20
2019-05-24T22:47:53.932Z|path_feedrate|10000|path_position|196.4870000000 0.0000000000 166.3330000000|Xact|0|Xload|43|Zact|-136.58|Zload|17
2019-05-24T22:47:54.428Z|path_position|196.4870000000 0.0000000000 246.3330000000|Xload|38|Zact|-56.58|Zload|14
2019-05-24T22:47:54.892Z|tool_id|101|path_feedrate|0|path_position|196.4870000000 0.0000000000 302.9130000000|Zact|0|Zload|40
2019-05-24T22:47:55.360Z|line|680|block|N680G92S2500M4|f_command|25|Xload|36|Zload|5|S1speed|402|S1load|110
2019-05-24T22:47:55.852Z|line|690|block|N690G0X68Z5.8M8|path_feedrate|10000|path_position|68.0000000000 0.0000000000 222.9130000000|Xact|-128.487|Xload|64|Zact|-80|Zload|17|S1speed|701|S1load|5
2019-05-24T22:47:56.348Z|path_position|68.0000000000 0.0000000000 142.9130000000|Xload|20|Zact|-160|Zload|16|S1load|2
2019-05-24T22:47:56.812Z|path_position|68.0000000000 0.0000000000 62.9130000000|Xload|21|Zact|-240|Zload|19|S1speed|700
2019-05-24T22:47:57.308Z|path_feedrate|0|path_position|68.0000000000 0.0000000000 5.8000000000|Zact|-297.113|Zload|21|S1speed|701
2019-05-24T22:47:57.772Z|line|700|block|N700G75X-2R1Z0.2P35000Q800F0.25|path_feedrate|180|path_position|65.3420000000 0.0000000000 5.8000000000|Xact|-131.145|Xload|12|Zload|10|S1speed|733|S1load|3
2019-05-24T22:47:58.268Z|path_feedrate|189|path_position|62.3680000000 0.0000000000 5.8000000000|Xact|-134.119|Xload|13|S1speed|768
2019-05-24T22:47:58.704Z|path_feedrate|199|path_position|59.4610000000 0.0000000000 5.8000000000|Xact|-137.026|Xload|15|Zload|9|S1speed|806|S1load|4
2019-05-24T22:47:59.199Z|path_feedrate|209|path_position|56.1810000000 0.0000000000 5.8000000000|Xact|-140.306|Xload|16|Zload|10|S1speed|854|S1load|5
2019-05-24T22:47:59.665Z|path_feedrate|223|path_position|52.6980000000 0.0000000000 5.8000000000|Xact|-143.789|Zload|9|S1speed|915
2019-05-24T22:48:00.188Z|path_feedrate|241|path_position|48.7150000000 0.0000000000 5.8000000000|Xact|-147.772|Xload|12|S1speed|985|S1load|6
2019-05-24T22:48:00.681Z|path_feedrate|263|path_position|44.6650000000 0.0000000000 5.8000000000|Xact|-151.822|Xload|14|Zload|10|S1speed|1077|S1load|7
2019-05-24T22:48:01.148Z|path_feedrate|288|path_position|40.2160000000 0.0000000000 5.8000000000|Xact|-156.271|Xload|16|S1speed|1208|S1load|10
2019-05-24T22:48:01.641Z|path_feedrate|312|path_position|35.3040000000 0.0000000000 5.8000000000|Xact|-161.183|Xload|14|S1speed|1246|S1load|2
2019-05-24T22:48:02.109Z|path_position|30.3130000000 0.0000000000 5.8000000000|Xact|-166.174|Xload|15|Zload|9|S1speed|1248|S1load|3
2019-05-24T22:48:02.573Z|path_position|25.3230000000 0.0000000000 5.8000000000|Xact|-171.164|Xload|11|Zload|10
2019-05-24T22:48:03.040Z|path_position|20.6660000000 0.0000000000 5.8000000000|Xact|-175.821|Zload|9|S1load|2
2019-05-24T22:48:03.481Z|path_position|16.0080000000 0.0000000000 5.8000000000|Xact|-180.479|Xload|15

I need to filter each row by date and time, and select certain variables and their values to build a new table in ".csv" format.

The variables I need are: date and time, Xload, Zload, S1load, and S1speed.

I do not know how to read this file and create a new table containing only the variables I need.

  • It was better to use skiprows=1; however, I still need to separate the variables (date and time, Xload, Zload, S1load, and S1speed) and create a new table. Maybe I should use a dictionary in Python, but I don't know how to do this. – Marcelo Santana Jun 30 '19 at 18:03
  • Your data doesn't have an equal number of pipes in each row, so putting it in a pandas DataFrame might not be a good idea. You may need to read the file line by line and split each row with '|' as the delimiter, then manually find the values of the variables that you need. – impopularGuy Jun 30 '19 at 18:06
  • You're right. pandas is great when there is a consistent pattern between the columns. – Marcelo Santana Jun 30 '19 at 18:16
  • I received some very interesting tips from a Python expert named Alan Hylands. – Marcelo Santana Jun 30 '19 at 18:18
  • Alan Hylands's tip was: "Create a dictionary with each of your data elements (e.g. EventDateTime, path_feedrate, path_position, Xload, etc.) and assign each row a distinct ID, then split the line by the pipe delimiter into an array. – Marcelo Santana Jun 30 '19 at 18:24
  • After that, if it is a DateTime string, you can use date and time functions to split it out into separate items in your dictionary (or save the lot and use those functions later when building your new dataframe/table). – Marcelo Santana Jun 30 '19 at 18:24
  • As your data lines are not in a standard format or order, you will have to loop through each element of the array and check whether it is in your keyword list (e.g. path_feedrate, path_position, Xload, etc.). If it is, look at the next element of the array to get the value for that row ID and that specific item in your dictionary, e.g. – Marcelo Santana Jun 30 '19 at 18:25
  • if array[5] == 'Xload', then you know array[6] is the value you want to put into your dictionary for the Xload item. When you have parsed each line into the dictionary, you can then use pandas to load the dictionary into a dataframe for further manipulation and formatting: #convert the dictionary data to a dataframe dfd = pd.DataFrame.from_dict(d, orient='index') – Marcelo Santana Jun 30 '19 at 18:26
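A minimal sketch of that dictionary approach, using two sample lines from the log (the variable names `rows`, `keywords`, and `dfd` here are illustrative, not from the original tip):

```python
from io import StringIO

import pandas as pd

# Two sample lines in the same pipe-delimited format as the log
sample = StringIO(
    "2019-05-24T22:47:50.894Z|execution|STOPPED|Xact|-182.561|Xload|20\n"
    "2019-05-24T22:47:51.712Z|Zload|9|S2load|0\n"
)

keywords = {"Xload", "Zload", "S1load", "S1speed"}
rows = {}
for row_id, line in enumerate(sample):  # each row gets a distinct ID
    parts = line.rstrip("\n").split("|")
    d = {"DateTime": parts[0]}
    # scan the array: if parts[i] is a keyword, parts[i + 1] is its value
    for i in range(1, len(parts) - 1):
        if parts[i] in keywords:
            d[parts[i]] = parts[i + 1]
    rows[row_id] = d

# convert the dictionary data to a dataframe
dfd = pd.DataFrame.from_dict(rows, orient="index")
```

Rows that never mention a keyword simply produce missing values (NaN) in the resulting frame.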

2 Answers


This should get you started. It reads the log, skips the header, and splits each line on the pipe to create a list of lists:

import csv

with open("data.log", "r") as file:
    csvreader = csv.reader(file, delimiter='|')
    next(csvreader)  # skip the PuTTY header line
    csvFile = list(csvreader)

You'll need to pluck out the column values you want from each row, if they exist. Finally, while the log appears to be in proper sequence already, you can sort csvFile by passing a key to the sorted() function; see "I need to sort a Python list of lists by date" for details. For example:

from datetime import datetime, timezone, timedelta

csvFile = sorted(csvFile, key=lambda x: datetime.strptime(x[0], "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone(timedelta(0))))
gregory

First we read the file row by row, split each row on the pipe, and store it, assuming the value of "Xload" and the other parameters immediately follows the parameter name.

import pandas as pd

data = []
with open('data.log', 'r') as file:
    for row in file:
        data.append(row.rstrip('\n').split('|'))

columns = ['DateTime', 'Xload', 'Zload', 'S1load', 'S1speed']

data_dic = []
for row in data:
    tmp = {}
    tmp['DateTime'] = row[0]
    for i in range(1, len(row) - 1):
        if row[i] in columns:
            tmp[row[i]] = row[i + 1]
    for c in columns:
        if c not in tmp:
            tmp[c] = ''  # for rows which do not have the property
    data_dic.append(tmp)

df = pd.DataFrame(data_dic)

Remove the first line (the PuTTY header) from data.log, or skip it programmatically.

For sorting by DateTime there is no need for any extra library: the timestamps are already in ISO format, so plain string comparison works directly.

sorted_dic = sorted(data_dic, key=lambda x:x['DateTime'])

That said, the input data will always be in chronological order, so sorting should not be necessary.
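From there, writing the frame out as CSV is one call; a small self-contained sketch with two sample dictionaries shaped like data_dic (the output filename is illustrative):

```python
import pandas as pd

# data_dic as built above: one dict per log row, empty string where absent
data_dic = [
    {"DateTime": "2019-05-24T22:47:50.894Z", "Xload": "20",
     "Zload": "", "S1load": "", "S1speed": ""},
    {"DateTime": "2019-05-24T22:47:51.712Z", "Xload": "",
     "Zload": "9", "S1load": "", "S1speed": ""},
]

df = pd.DataFrame(data_dic)
# index=False drops pandas' row index from the output;
# pass a path instead, e.g. df.to_csv('output.csv', index=False), to write a file
csv_text = df.to_csv(index=False)
print(csv_text)
```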

impopularGuy