This is the sample text file:

134781.ux002jupiter!Cat_server8.99123.9.0: Login ****** Vegas     - csv111 - Versio 9.7 13.10.2016
141231.ux002jupiter!Cat_server8.99123.9.0: Logout ****** Madrid     - asd124 - Versio 9.7 13.10.1992
123456.ux002jupiter!Cat_server8.99123.9.0: Login ****** Oslo   - lks485 - Versio 9.7 13.10.1992
132541.ux002jupiter!Cat_server8.99123.9.0: Logout ****** Riyadh   - xcd785 - Versio 9.7 13.10.1992

I want to read this file line by line, extract a few fields from each line, and ideally store them in an array.

The output should be equivalent to (exact formatting doesn't matter):

["134781", "csv111", "Vegas", "Login"]
["141231", "asd124", "Madrid", "Logout"]
["123456", "lks485", "Oslo", "Login"]
["132541", "xcd785", "Riyadh", "Logout"]

Please help out.

Kush
  • Bash is *really* not a good choice of language for this job. The output array is in what format, JSONL? Bash doesn't have a compliant JSON generator, so if you had surprising usernames or locations the output could be malformed. – Charles Duffy Nov 25 '19 at 16:26
  • Also, don't flag questions as "urgent". *Everyone's* question is urgent to them, and saying that your own question is more urgent than anyone else's is rude. – Charles Duffy Nov 25 '19 at 16:27
  • No, the output is not JSON, just an array of strings. And I HAVE to use shell for this. – Kush Nov 25 '19 at 16:28
  • ...anyhow, the right place to start is to build a regex that matches the fields you want to extract. See in particular the `capture()` example in the [`jq` documentation](https://stedolan.github.io/jq/manual/#RegularexpressionsPCRE). – Charles Duffy Nov 25 '19 at 16:28
  • But what you're showing us *isn't* an array of strings, at least, not in bash's meaning of the word "array". `["134781", "csv111", "Vegas", "Login"]` is not a bash array at all. – Charles Duffy Nov 25 '19 at 16:29
  • An array is not required; extracting the values is more important. I can store them in variables as well. – Kush Nov 25 '19 at 16:31
  • Also, what possible contents can the `*****` sections contain? We can't write a regex for them unless we know what they look like when not anonymized. – Charles Duffy Nov 25 '19 at 16:33
  • Nothing! That is the format. The file really looks like this. – Kush Nov 25 '19 at 16:36
  • This is just login information that I need to send to Splunk after extracting those 4 parameters. – Kush Nov 25 '19 at 16:37

1 Answer

The =~ operator in [[ ]] matches a string against a regular expression and stores the full match and each capture group in the BASH_REMATCH array.

# Capture groups: 1 = leading number, 2 = Login/Logout, 3 = city, 4 = user ID
re='^([[:digit:]]+)[.][^:]+:[[:space:]]([^[:space:]]+)*[[:space:]*]+([^[:space:]]+)[[:space:]]+-[[:space:]]([^[:space:]]+)[[:space:]]-[[:space:]]'
while IFS= read -r line; do
   [[ $line =~ $re ]] || { echo "WARNING: Could not parse line: $line" >&2; continue; }
   declare -p BASH_REMATCH  # print the array's current content, or put your own code here
done <<'EOF'
134781.ux002jupiter!Cat_server8.99123.9.0: Login ****** Vegas     - csv111 - Versio 9.7 13.10.2016
141231.ux002jupiter!Cat_server8.99123.9.0: Logout ****** Madrid     - asd124 - Versio 9.7 13.10.1992
123456.ux002jupiter!Cat_server8.99123.9.0: Login ****** Oslo   - lks485 - Versio 9.7 13.10.1992
132541.ux002jupiter!Cat_server8.99123.9.0: Logout ****** Riyadh   - xcd785 - Versio 9.7 13.10.1992
EOF

...emits as output:

declare -ar BASH_REMATCH=([0]="134781.ux002jupiter!Cat_server8.99123.9.0: Login ****** Vegas     - csv111 - " [1]="134781" [2]="Login" [3]="Vegas" [4]="csv111")
declare -ar BASH_REMATCH=([0]="141231.ux002jupiter!Cat_server8.99123.9.0: Logout ****** Madrid     - asd124 - " [1]="141231" [2]="Logout" [3]="Madrid" [4]="asd124")
declare -ar BASH_REMATCH=([0]="123456.ux002jupiter!Cat_server8.99123.9.0: Login ****** Oslo   - lks485 - " [1]="123456" [2]="Login" [3]="Oslo" [4]="lks485")
declare -ar BASH_REMATCH=([0]="132541.ux002jupiter!Cat_server8.99123.9.0: Logout ****** Riyadh   - xcd785 - " [1]="132541" [2]="Logout" [3]="Riyadh" [4]="xcd785")

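Each output line is a bash array definition showing the content of BASH_REMATCH after one match; ${BASH_REMATCH[1]} can thus be used to refer to the number at the beginning, ${BASH_REMATCH[2]} to Login vs Logout, ${BASH_REMATCH[3]} to the city, and ${BASH_REMATCH[4]} to the user ID.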
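
If the goal is to collect the four fields in the order shown in the question rather than just print BASH_REMATCH, something along the following lines might work. This is only a sketch: it reuses the regex from the answer unchanged, while the helper name parse_line and the array names fields and rows are made up here for illustration, and the bracketed string it builds is plain text for display, not a bash array or valid JSON.

```bash
#!/usr/bin/env bash
# Regex from the answer: group 1 = leading number, 2 = Login/Logout,
# 3 = city, 4 = user ID.
re='^([[:digit:]]+)[.][^:]+:[[:space:]]([^[:space:]]+)*[[:space:]*]+([^[:space:]]+)[[:space:]]+-[[:space:]]([^[:space:]]+)[[:space:]]-[[:space:]]'

# parse_line (hypothetical helper): on success, fills the global array
# "fields" in the order the question asked for: number, user ID, city, action.
parse_line() {
  [[ $1 =~ $re ]] || return 1
  fields=( "${BASH_REMATCH[1]}" "${BASH_REMATCH[4]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[2]}" )
}

rows=()   # one formatted string per successfully parsed line
while IFS= read -r line; do
  parse_line "$line" || { echo "WARNING: Could not parse line: $line" >&2; continue; }
  # Build a display string such as ["134781", "csv111", "Vegas", "Login"].
  printf -v row '"%s", "%s", "%s", "%s"' "${fields[@]}"
  rows+=( "[$row]" )
done <<'EOF'
134781.ux002jupiter!Cat_server8.99123.9.0: Login ****** Vegas     - csv111 - Versio 9.7 13.10.2016
141231.ux002jupiter!Cat_server8.99123.9.0: Logout ****** Madrid     - asd124 - Versio 9.7 13.10.1992
123456.ux002jupiter!Cat_server8.99123.9.0: Login ****** Oslo   - lks485 - Versio 9.7 13.10.1992
132541.ux002jupiter!Cat_server8.99123.9.0: Logout ****** Riyadh   - xcd785 - Versio 9.7 13.10.1992
EOF

printf '%s\n' "${rows[@]}"
```

To read from a real file instead of the here-document, replace the `<<'EOF' ... EOF` part with a redirection such as `< "$logfile"`, where logfile is a variable you set yourself. Inside the loop, "${fields[@]}" is a genuine bash array, so the individual values can be passed on to another command without going through the display string.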

Charles Duffy