I have a data foundation of 36 .log files that I need to preprocess in order to load them into a pandas DataFrame for data visualization with Python frameworks.
To give an example of a single line from one of the .log files:
[16:24:42]: Downloaded 0 Z_SYSTEM_FM traces from DEH, clients (282) from 00:00:00,000 to 00:00:00,000
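The pieces of such a line that I actually need can be captured with one regular expression; a quick Python sketch to illustrate the structure (the group labels are my own, not anything from the logs):

```python
import re

# One sample line from the .log files
line = ("[16:24:42]: Downloaded 0 Z_SYSTEM_FM traces from DEH, "
        "clients (282) from 00:00:00,000 to 00:00:00,000")

# Capture time, count, trace name, and source; everything from the
# first comma on is discarded, as in my -replace chain
pattern = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\]: Downloaded (\d+) (\S+) traces from ([^,]+),"
)

m = pattern.match(line)
if m:
    time, count, trace, source = m.groups()
    print(time, count, trace, source)  # 16:24:42 0 Z_SYSTEM_FM DEH
```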
From several sources and posts on here, I figured the following code to be the best-performing one:
foreach ($f in $files) {
    # The date is encoded in the file name at a fixed offset
    $date = $f.BaseName.Substring(22, 8)
    ((Get-Content $f) -match '\bDownloaded\b') `
        -replace '\[', '' -replace '\]:\s', ' ' `
        -replace 'Downloaded ' -replace 'traces from ' -replace ',.*' `
        -replace '$', " $date" |
        Add-Content CleanedLogs.txt
}
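Since the cleaned lines end up in pandas anyway, one alternative I considered is skipping the intermediate text file and parsing straight into a DataFrame. A rough Python sketch under the same assumptions about the line format (the glob pattern, column labels, and helper name are made up):

```python
import re
from pathlib import Path

import pandas as pd

# Same fields my -replace chain keeps: time, count, trace, source
PATTERN = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\]: Downloaded (\d+) (\S+) traces from ([^,]+),"
)

def parse_log(path, date):
    """Yield (time, count, trace, source, date) for every 'Downloaded' line."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            m = PATTERN.match(line)
            if m:
                yield (*m.groups(), date)

rows = []
for f in Path(".").glob("*.log"):
    date = f.stem[22:30]  # same slice as $f.BaseName.Substring(22, 8)
    rows.extend(parse_log(f, date))

df = pd.DataFrame(rows, columns=["time", "count", "trace", "source", "date"])
```

This avoids writing and re-reading CleanedLogs.txt entirely, though whether it beats the PowerShell loop would need measuring.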
The variable $date holds the date that the respective .log file covers, taken from the file name.
I am not able to change the input text data. I tried reading in the 1.55 GB using -Raw, but I couldn't manage to split the resulting single string back into lines after applying all the operations. I also tried additional regex expressions, but the total runtime did not drop. Maybe there is a way to use grep for these operations?
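On the -Raw attempt: splitting one big string back into lines is a single call; here is the equivalent of what I was trying, sketched in Python with invented sample data:

```python
# A -Raw style read yields the whole file as one string
raw = ("[16:24:42]: Downloaded 0 Z_SYSTEM_FM traces from DEH, "
       "clients (282) from 00:00:00,000 to 00:00:00,000\n"
       "[16:24:43]: Some other message without the keyword\n")

# Split into lines, keeping only the 'Downloaded' ones,
# as the -match in my loop does
lines = [l for l in raw.splitlines() if "Downloaded" in l]
print(len(lines))  # 1
```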
Maybe someone has a genius tweak to speed this operation up. At the moment it takes close to 20 minutes to run. Thank you very much!