Could someone tell me how I'd write the following Delphi procedure as a Linux bash script?

procedure ParseLine(Line: String; var url, lang, Identifier: String);
var
  p1,p2: Integer;
Begin
  p1 := Pos(Char(VK_TAB),Line);
  p2 := PosEx(Char(VK_TAB),Line,p1+1);
  url := Copy(Line,1,p1-1);
  lang := Copy(Line,p1+1,p2 - (p1+1));
  Identifier := Copy(Line,p2+1,Length(Line));
  p1 := Pos('(',lang);
  lang := Copy(lang,1,p1-1);
End;

The line I need to parse looks something like this:

XXXXX \tab XXXX(XXX) \tab XXXX

Thanks.

btd
  • This is a bash question and not a Delphi question. And it looks like a pretty trivial regex unpacker. Are you wedded to bash? I'd do it with perl myself. – David Heffernan Jul 18 '11 at 18:46
  • possible duplicate of [Split string based on delimiter in bash?](http://stackoverflow.com/questions/918886/split-string-based-on-delimiter-in-bash) – Cosmin Prund Jul 18 '11 at 19:08
  • Thanks, I've been fiddling with regex as well, but I can't seem to get the pattern right. – btd Jul 18 '11 at 19:08
  • Something like this, depending on your regex flavour `(.*?)\t(.*?)\((.*?)\)\t(.*?)` – David Heffernan Jul 18 '11 at 19:11
  • **Do not close:** the Dupe that **I** found deals with splitting on delimiter and doesn't use a regex; The OP specifically requested splitting using regex, and the split is not a simple delimiter-based split because of the parenthesis that surround the 3rd part of the input. – Cosmin Prund Jul 19 '11 at 05:14
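
For comparison, the delimiter-based split from the linked duplicate can be combined with a parameter expansion to deal with the parenthesis. This is only a rough sketch, assuming exactly three tab-separated fields with no empty field (tab counts as IFS whitespace, so `read` collapses consecutive tabs); the sample line and variable names are made up for illustration:

```shell
#!/bin/bash

line=$'http://example.com\ten(US)\tabc123'   # made-up sample line

# IFS is set to a tab only for this one read; -r keeps backslashes literal.
IFS=$'\t' read -r url lang identifier <<< "$line"

# Keep only the text before the first "(" (unchanged if there is none).
lang=${lang%%"("*}

printf 'url=%s lang=%s identifier=%s\n' "$url" "$lang" "$identifier"
```

Unlike the Delphi original, this leaves `lang` untouched when it contains no "(".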

1 Answer

Here's a bash script that works for your sample input. Unfortunately I didn't find a way to specify the "Tab" character alone, so I used the [:blank:] class, which also includes space. If you really need to match only tab, and not space, as the delimiter, you can replace every occurrence of [:blank:] with an actual TAB character typed from your keyboard. I also didn't save the matched parts to global variables (as bash functions would normally do); I simply echoed them.

#!/bin/bash

function split {
  # Prepare small parts of the future regex. Makes writing the actual regex
  # easier and provides a place to explain the regex
  blank="[[:blank:]]" # one blank character (tab or space). Uses the [:blank:] character class in a character set regex selector
  optional_blanks="${blank}*" # zero or more blank characters.
  mandatory_blanks="${blank}+" # one or more blank characters.
  non_blank="[^()[:blank:]]" # one character that is not a tab, space or parenthesis: this is the stuff we intend to capture.
  capture="(${non_blank}+)" # one or more non-blank, non-parenthesis characters inside capturing parentheses.

  # Concatenate our regex building blocks into a big regex. Notice how I'm using ${optional_blanks} for maximum flexibility,
  # for example around the "(" and ")" tests.
  regex="${optional_blanks}${capture}${mandatory_blanks}${capture}${optional_blanks}\(${optional_blanks}${capture}${optional_blanks}\)${optional_blanks}${capture}${optional_blanks}"


  # The regex is applied using the =~ binary operator.
  if [[ $1 =~ $regex ]];
  then
    # We got a match, our capturing groups are saved into bash
    # variables ${BASH_REMATCH[n]}. We'll echo those, but in
    # real use the function would probably copy those values to
    # some global names to be easily used from outside the function.
    echo "${BASH_REMATCH[1]}"
    echo "${BASH_REMATCH[2]}"
    echo "${BASH_REMATCH[3]}"
    echo "${BASH_REMATCH[4]}"
  else
    # Oops, input doesn't match.
    echo not matched
  fi
}

# call our function with static input for testing
# purposes.
echo "Test 1 - tab separated fields without extra space"
split "1234     56(78)  90"

# Since we're using [:blank:] and that includes both space and tab
# this also works
echo "Test 2 - space separated fields with lots of meaningless space"
split "1234 56 (    78 )      90       "
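
Regarding the two caveats mentioned above the script (tab-only matching and saving the results): bash's ANSI-C quoting, `$'\t'`, yields a literal tab that can be spliced into the regex, and the captures can be copied into ordinary variables. The following is a minimal sketch rather than a drop-in replacement. It assumes the three-field layout from the question (`url<TAB>lang(extra)<TAB>identifier`); the function name `parse_line`, the variable names and the sample line are hypothetical:

```shell
#!/bin/bash

# Hypothetical helper: splits "url<TAB>lang(extra)<TAB>identifier"
# into the global variables url, lang and identifier.
# Returns non-zero if the line does not contain two tabs.
parse_line() {
  local tab=$'\t'   # a literal tab character via ANSI-C quoting
  local regex="^([^${tab}]*)${tab}([^${tab}]*)${tab}(.*)$"

  if [[ $1 =~ $regex ]]; then
    url=${BASH_REMATCH[1]}
    lang=${BASH_REMATCH[2]%%"("*}   # drop everything from the first "("
    identifier=${BASH_REMATCH[3]}
  else
    return 1
  fi
}

# Made-up sample line for illustration.
if parse_line $'http://example.com\ten(US)\tabc123'; then
  printf '%s\n' "$url" "$lang" "$identifier"
fi
```

Here spaces are treated as ordinary field content, so only tabs separate the fields, which is closer to what the Delphi procedure does.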
Cosmin Prund
  • Thanks Cosmin Prund, I'll give this script a try and also study it so I understand it :) – btd Jul 19 '11 at 11:53