Building a parser to use in script (bash)

Question

I need to discover patterns in a string using bash, and I would like to run it automatically with crontab.

I have a string that contains data like `%d/%m/%Y %H:%i aaa bbb ccc 123456 ddd 7890 eee` and similar. It's a report.

I thought of defining constants as string masks and comparing every substring against them. I think I will use a mix of length and character position.
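The mask idea can be sketched in bash as a character-by-character comparison that also checks the length. The mask alphabet (`9` = digit, `A` = uppercase letter, any other character must match literally) and the function name `match_mask` are hypothetical choices for this sketch; the question does not define them.

```bash
#!/usr/bin/env bash
# Hypothetical mask convention: 9 = any digit, A = any uppercase letter,
# every other mask character must appear literally.
# match_mask STRING MASK -> exit status 0 if STRING fits MASK exactly.
match_mask() {
    local s=$1 mask=$2 i c m
    # Length check first: a mask only fits strings of the same length.
    (( ${#s} == ${#mask} )) || return 1
    for (( i = 0; i < ${#mask}; i++ )); do
        c=${s:i:1}
        m=${mask:i:1}
        case $m in
            9) [[ $c == [[:digit:]] ]] || return 1 ;;
            A) [[ $c == [[:upper:]] ]] || return 1 ;;
            *) [[ $c == "$m" ]] || return 1 ;;   # quoted: literal compare
        esac
    done
    return 0
}

# Example: test the leading date/time part of a report line.
line='01/01/2015 06:20 EXAMPLE 2 (001) Foo bar X(12)'
mask='99/99/9999 99:99'
if match_mask "${line:0:${#mask}}" "$mask"; then
    echo "date/time prefix matches"
fi
```

Checking the length first rejects most non-matching substrings before any per-character work is done.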

I've been googling to get a better idea and look at other implementations, but I'm not finding useful results.

Any suggestion? Thanks.

Edit: some sample input:

01/01/2015 06:20 EXAMPLE 2 (001) Foo bar X(12) 
02/01/2015 12:03 EXAMPLE 1 (000) 01234567 Baz bax X(04) 
03/01/2015 10:37 EXAMPLE 1 (000) Bam bac (X:1-16). [ SOMEGUY ] 
04/01/2015 11:04 EXAMPLE 2 (001) 12345678 Baz bax X(12) SOMEACTION 
05/01/2015 12:34 EXAMPLE 2 (001) 45678901 Bim bum X(01) SOMEACTION NAME SURNAME
08/08/2015 19:00 SOMEGUY Bic bac. [ SOMEGUY ] 
01/01/2015 11:34 EXAMPLE 2 (001) 78901234 Gic gia gim X(01)

and as output I need these variables:

$date $time $example $codeline $action $message $name $surname

Edit 2: I forgot to say that I'm looping over those lines with this:

while IFS=' ' read -ra field; do
...
done <<< "$line"
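As an alternative to splitting on spaces with `read -ra`, here is a minimal sketch that extracts some of the requested variables with bash's `=~` operator and `BASH_REMATCH`. The field layout (date, time, `EXAMPLE n`, `(nnn)`, optional 8-digit code line, free-text message) is guessed from the sample lines; `parse_line`, `re_head`, `re_code` and `code` are hypothetical names. `$action`, `$name` and `$surname` are not handled, because the samples don't make their positions clear.

```bash
#!/usr/bin/env bash
# Assumed layout: "dd/mm/yyyy hh:mm EXAMPLE n (nnn) [8 digits] message..."
re_head='^([0-9]{2}/[0-9]{2}/[0-9]{4}) ([0-9]{2}:[0-9]{2}) (EXAMPLE [0-9]+) \(([0-9]{3})\) (.*)$'
re_code='^([0-9]{8}) (.*)$'

# parse_line LINE: sets date, time, example, code, codeline, message (globals)
# and prints them; returns 1 and reports on stderr if the line does not fit.
parse_line() {
    local line=$1
    if [[ $line =~ $re_head ]]; then
        date=${BASH_REMATCH[1]} time=${BASH_REMATCH[2]}
        example=${BASH_REMATCH[3]} code=${BASH_REMATCH[4]}
        message=${BASH_REMATCH[5]} codeline=""
        # The 8-digit code line is optional, so test for it separately.
        if [[ $message =~ $re_code ]]; then
            codeline=${BASH_REMATCH[1]} message=${BASH_REMATCH[2]}
        fi
        printf '%s|%s|%s|%s|%s|%s\n' "$date" "$time" "$example" "$code" "$codeline" "$message"
    else
        printf 'unmatched: %s\n' "$line" >&2
        return 1
    fi
}

while IFS= read -r line; do
    parse_line "$line"
done <<'EOF'
01/01/2015 06:20 EXAMPLE 2 (001) Foo bar X(12)
02/01/2015 12:03 EXAMPLE 1 (000) 01234567 Baz bax X(04)
08/08/2015 19:00 SOMEGUY Bic bac. [ SOMEGUY ]
EOF
```

The third sample line does not fit the assumed layout, so it is reported as unmatched instead of being forced into the wrong variables.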

As described above, this sounds like a task for `grep`. Otherwise you need to improve your question with 3 lines of sample input (including one line that should NOT be processed), AND your required output from the sample input. You should read enough about grep (many tutorials available) that you can improve your question with an attempt at a reg-exp to match the lines you want to capture. Otherwise you're likely to get downvotes and close votes. Good luck. — shellter, Oct 07 '15 at 14:42
One suggestion: build a *very concise* example, and show us what would be the output. — Rubens, Oct 07 '15 at 14:44
OK, I will try a few examples: `01/01/2015 06:20 EXAMPLE 2 (001) Foo bar X(12)` `02/01/2015 12:03 EXAMPLE 1 (000) 01234567 Baz bax X(04)` `03/01/2015 10:37 EXAMPLE 1 (000) Bam bac (X:1-16). [ SOMEGUY ]` `04/01/2015 11:04 EXAMPLE 2 (001) 12345678 Baz bax X(12) SOMEACTION` `05/01/2015 12:34 EXAMPLE 2 (001) 45678901 Bim bum X(01) SOMEACTION NAME SURNAME` `08/08/2015 19:00 SOMEGUY Bic bac. [ SOMEGUY ]` `01/01/2015 11:34 EXAMPLE 2 (001) 78901234 Gic gia gim X(01)` Well, these are examples from my real-world data. — rivaldid, Oct 07 '15 at 23:03
please edit your question to include your sample input and expected outputs. Use the `{}` tool at the top left of the edit box after highlighting your text with line breaks. Good luck. — shellter, Oct 09 '15 at 00:05
again, edit your question to include the expected output. `$date $time` is easy, what about `$name $surname $action`. It seems like your data is a jumble of incomplete information. You'll need to show that you've tried to solve at least some of this on your own. Have you worked thru an `awk` tutorial or two? It could be very helpful. see http://www.grymoire.com/Unix/Awk.html ? Good luck. — shellter, Oct 09 '15 at 21:25
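The grep and awk suggestions in these comments can be combined in a short pipeline: `grep -E` keeps only the lines that fit a pattern, and `awk` splits the survivors on blanks. The pattern below (timestamp followed by `EXAMPLE <n>`) is an assumption based on the sample input, not a requirement stated in the question.

```bash
#!/usr/bin/env bash
# Keep only lines shaped like "dd/mm/yyyy hh:mm EXAMPLE <n> ..." (assumed),
# then let awk pick out the first whitespace-separated fields.
pattern='^[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2} EXAMPLE [0-9]+ '

grep -E "$pattern" <<'EOF' | awk '{ print "date=" $1, "time=" $2, "example=" $3 " " $4 }'
01/01/2015 06:20 EXAMPLE 2 (001) Foo bar X(12)
08/08/2015 19:00 SOMEGUY Bic bac. [ SOMEGUY ]
01/01/2015 11:34 EXAMPLE 2 (001) 78901234 Gic gia gim X(01)
EOF
```

The `SOMEGUY` line is the kind of "line that should NOT be processed" the first comment asks for: grep drops it before awk ever sees it.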

score 1 · Answer 1 · edited May 23 '17 at 12:07

Use date to format your string (note that date uses %M for minutes; %m is the month):

$ date +"%d/%m/%Y %H:%M aaa bbb ccc 123456 ddd 7890 eee"
09/10/2015 14:10 aaa bbb ccc 123456 ddd 7890 eee

If that's what you meant.

Alternatively use printf, for example:

printf "%s/%s/%s %s:%s aa bb cc\n" 01 01 2015 00 00

or create an equivalent sprintf function:

sprintf() { local stdin; read -d '' -u 0 stdin; printf "$@" "$stdin"; }
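A usage sketch for the sprintf function above, repeated so the snippet runs on its own: whatever arrives on stdin is read into one variable (with surrounding whitespace trimmed by `read`) and passed as the last printf argument.

```bash
#!/usr/bin/env bash
# sprintf as defined above: the text read from stdin becomes the last argument.
sprintf() { local stdin; read -d '' -u 0 stdin; printf "$@" "$stdin"; }

# "2015" arrives on stdin and fills the third %s.
echo 2015 | sprintf '%s/%s/%s\n' 01 01
# prints: 01/01/2015
```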

If you want to go the other way round and split the string, use read, e.g.:

while IFS=':/ ' read -r d mo y h mi _; do echo "$d $mo $y $h $mi"; done < data.txt
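To see what the read approach produces, here is the same `IFS=':/ '` split applied to one of the question's sample lines through a here-string. The variable names are illustrative; everything after the minutes ends up in the last variable, `rest`.

```bash
#!/usr/bin/env bash
# Split on ':', '/' and ' ' at once; the last variable collects the remainder.
line='01/01/2015 06:20 EXAMPLE 2 (001) Foo bar X(12)'
IFS=':/ ' read -r day month year hour minute rest <<< "$line"
printf 'date=%s/%s/%s time=%s:%s rest=%s\n' "$day" "$month" "$year" "$hour" "$minute" "$rest"
# prints: date=01/01/2015 time=06:20 rest=EXAMPLE 2 (001) Foo bar X(12)
```

Because `IFS` is set only as a prefix of `read`, the normal field splitting in the rest of the script is unaffected.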

For more examples, see: How do I split a string on a delimiter in Bash?

score 1 · Answer 2 · answered Oct 09 '15 at 21:34


This could be a more complex approach than you need, but you are already heading in the same direction, so:

Have you ever heard about the machine-learning techniques used to recognize images? They use many different masks (in your case, string masks) that you initially choose at random and then correct stochastically based on analysis. XOR each mask with the string and sum the resulting character values into an int. You get one number per mask, and together these numbers form a hash that tells you how well the string matches your masks. Strings with similar hashes (int values close to each other) will be similar strings.

This is just a tip. You can go simpler or deeper, depending on your requirements.
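A toy bash sketch of the XOR-and-sum scoring described above, for a single fixed mask. The mask, the helper names `ord` and `mask_score`, and reading "lower score = closer match" are assumptions for illustration; the random choice and stochastic correction of masks are not shown.

```bash
#!/usr/bin/env bash
# ord CHAR -> numeric character code (printf treats a leading ' as "code of").
ord() { printf '%d' "'$1"; }

# mask_score STRING MASK -> sum of (char XOR mask char) over the overlap.
mask_score() {
    local s=$1 mask=$2 i sum=0 len=${#1}
    (( ${#mask} < len )) && len=${#mask}   # compare only the overlapping part
    for (( i = 0; i < len; i++ )); do
        (( sum += $(ord "${s:i:1}") ^ $(ord "${mask:i:1}") ))
    done
    echo "$sum"
}

mask='00/00/0000 00:00'
mask_score '01/01/2015 06:20' "$mask"
mask_score 'SOMEGUY Bic bac.' "$mask"
```

With this mask, the date/time string scores 18 (a digit XOR `'0'` is just the digit's value, so the score is the sum of its digits), while `SOMEGUY Bic bac.` scores far higher, so a threshold on the score separates the two kinds of line.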


Newbie


Yes, this solution could be the most robust, but also the heaviest. Actually, I'm writing a workaround with Perl and regexes: I define constants (date, time, foo, bar, baz) to build mask lines, using variables of variables to keep it short, inside a given/when construct. It seems quite easy with Perl; bash looks hard for this type of operation. – rivaldid Oct 10 '15 at 23:05

You didn't define the scenario exactly, so I gave you the most stable and versatile solution! – Newbie Oct 11 '15 at 10:09

score 0 · Answer 3 · answered Oct 19 '15 at 14:30

In the end I solved it with Perl and regexes: I defined my string masks $FOO, $BAR and $BAZ, and then compared my input string against each of them:

if ($myinputstring =~ $FOO) {
    statement;
} elsif ($myinputstring =~ $BAR) {
    otherstatement;
} elsif ($myinputstring =~ $BAZ) {
    someotherstatement;
} else {
    print_to_unmatched_log();
}

Thanks
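For comparison with the Perl version, here is the same if/elsif dispatch in bash using `[[ =~ ]]`. The patterns assigned to `FOO`, `BAR` and `BAZ` are placeholders modelled on the sample lines, `classify` is a hypothetical name, and printing `unmatched` stands in for `print_to_unmatched_log`.

```bash
#!/usr/bin/env bash
# Placeholder masks (assumed, based on the sample input).
FOO='^[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2} EXAMPLE [0-9]+ '
BAR='^[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2} [[:upper:]]+ '
BAZ='\[ [[:upper:]]+ \]$'

# First matching mask wins, exactly like the if/elsif chain in Perl.
classify() {
    local myinputstring=$1
    if [[ $myinputstring =~ $FOO ]]; then
        echo "foo"
    elif [[ $myinputstring =~ $BAR ]]; then
        echo "bar"
    elif [[ $myinputstring =~ $BAZ ]]; then
        echo "baz"
    else
        echo "unmatched"   # e.g. append "$myinputstring" to a log file instead
    fi
}

classify '01/01/2015 06:20 EXAMPLE 2 (001) Foo bar X(12)'
classify '08/08/2015 19:00 SOMEGUY Bic bac. [ SOMEGUY ]'
```

Order matters here: the `EXAMPLE` lines would also match `BAR`, so the more specific mask has to be tested first.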

rivaldid · Answer 4 · 2015-12-02T17:24:49.583

In the end I simplified my problem and went back to a bash solution. This is quick pseudocode; tell me what you think of it.

pre:
myregex1="^[0-9]{2}/[0-9]{2}/[0-9]{4}[[:space:]][0-9]{2}:[0-9]{2}$"
myregex2="^[[:space:]]\([0-9]{3}\)$"
myregex3="^[[:space:]][0-9]{8}$"
myregex4="^foo[[:space:]]bar$"
myregex5="^[[:space:]]baz\([0-9]{3}\)$"
...
# strip leading whitespace; pass $1 as an argument, never as the printf format
nospace() { printf '%s' "$1" | sed -e 's/^[[:space:]]*//'; }

the code:
while IFS= read -r mylinegotfromtextfile; do
    buffer=""
    # read the line one character at a time
    while IFS= read -r -N 1 char; do
        buffer+="$char"
        # on a match, store the trimmed field and start collecting the next one
        if [[ $buffer =~ $myregex1 ]]; then
            printf -v myvar1 '%s' "$(nospace "$buffer")"; buffer=""
        elif [[ $buffer =~ $myregex2 ]]; then
            printf -v myvar2 '%s' "$(nospace "$buffer")"; buffer=""
        elif [[ $buffer =~ $myregex3 ]]; then
            printf -v myvar3 '%s' "$(nospace "$buffer")"; buffer=""
        # elif ... same pattern for the remaining regexes
        fi
    done <<< "$mylinegotfromtextfile"
    # anything left in $buffer did not match any regex
done < "$mytextfile"

That's all. Do you know a better solution?

Just to explain: for every character of the line, I append the character to a buffer and increment an index. If the buffer matches one of my regexes, I strip the leading whitespace and store the result in a dedicated variable, then decrease the index by the length of the buffer and strip the last character from the buffer. In this way I can keep the unmatched pattern. — rivaldid, Dec 02 '15 at 17:34
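The same buffer-and-regex idea can also be sketched without reading one character at a time: anchor each regex at the start of the remaining text, and cut off whatever it matched. This is a different technique from the answer's per-character loop. The field regexes and the names `field_res`, `tokenize`, `fields` and `unmatched` are assumptions for this sketch.

```bash
#!/usr/bin/env bash
# Field regexes, tried in order, each anchored at the start of the remaining
# text (assumed from the sample lines): date+time, EXAMPLE n, (nnn), 8 digits.
field_res=(
    '^[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2}'
    '^EXAMPLE [0-9]+'
    '^\([0-9]{3}\)'
    '^[0-9]{8}'
)

# tokenize LINE: sets the globals "fields" (matched pieces, in order) and
# "unmatched" (whatever no regex consumed).
tokenize() {
    local rest=$1 re
    fields=()
    for re in "${field_res[@]}"; do
        rest=${rest#"${rest%%[! ]*}"}          # drop leading spaces
        if [[ $rest =~ $re ]]; then
            fields+=("${BASH_REMATCH[0]}")
            rest=${rest#"${BASH_REMATCH[0]}"}  # cut off the matched prefix
        fi
    done
    unmatched=${rest#"${rest%%[! ]*}"}
}

tokenize '02/01/2015 12:03 EXAMPLE 1 (000) 01234567 Baz bax X(04)'
printf '[%s]\n' "${fields[@]}"
printf 'unmatched: %s\n' "$unmatched"
```

A regex that does not match (for example, the optional 8-digit code on a line without one) is simply skipped, and the text is handed on to the next regex.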
