Assuming:
- You want to split a line on a left paren `(`.
- The left paren may be preceded and/or followed by whitespace
  (judging from your input, there is no whitespace between `aux`
  and the following left paren).

Then how about an awk solution:

    str="ps aux( sort ( more"
    awk -F ' *\\( *' '{ for (i=1; i<=NF; i++) print $i}' <<< "$str"

Output:

    ps aux
    sort
    more

- The `-F` option sets the input field separator.
- The pattern `' *\\( *'` is a regex that matches a left paren
  with zero or more spaces before and/or after it.

If my assumption is incorrect, please let me know.
[EDIT]
If you prefer a C solution, the following code should help you get started:

    #include <regex.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int
    main(void)
    {
        regex_t preg;
        char *string = "ps aux( sort ( more";
        // regex of the delimiter; regcomp() is called with cflags 0 (POSIX
        // basic regex), where a bare "(" is the literal paren
        char *pattern = " *( *";
        char out[256];                  // output buffer
        int rc;
        size_t nmatch = 1;
        regmatch_t pmatch[1];

        // compile the regex
        if (0 != (rc = regcomp(&preg, pattern, 0))) {
            printf("regcomp() failed, returning nonzero (%d)\n", rc);
            exit(EXIT_FAILURE);
        }

        // loop while the regex of delimiter is found
        while (0 == (rc = regexec(&preg, string, nmatch, pmatch, 0))) {
            strncpy(out, string, pmatch[0].rm_so);  // copy the substring to print
            out[pmatch[0].rm_so] = 0;               // terminate the string
            printf("%s\n", out);
            string += pmatch[0].rm_eo;  // move the pointer to the start of the next token
        }

        // print the last remaining portion
        if (strlen(string) > 0) {
            printf("%s\n", string);
        }

        regfree(&preg);
        return 0;
    }

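One detail about the pattern may be worth spelling out. awk uses extended regexes (ERE), where a literal paren has to be escaped as `\(`. `regcomp()` with `cflags` set to 0 compiles a basic regex (BRE), where a bare `(` is already the literal character and `\(` would instead open a group. The sketch below is only an illustration: `first_delim` is a hypothetical helper name, not part of the code above. It runs one match with either flavor so the two spellings can be compared.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not part of the original answer): compile `pattern`
 * with `cflags`, run it once on `s`, and store the match offsets.
 * Returns 0 on a match, nonzero on a compile error or no match. */
static int first_delim(const char *pattern, int cflags, const char *s,
                       regoff_t *so, regoff_t *eo)
{
    regex_t preg;
    regmatch_t m[1];
    int rc;

    if ((rc = regcomp(&preg, pattern, cflags)) != 0)
        return rc;                  /* pattern did not compile */

    rc = regexec(&preg, s, 1, m, 0);
    if (rc == 0) {
        *so = m[0].rm_so;           /* start of the delimiter */
        *eo = m[0].rm_eo;           /* one past the end of the delimiter */
    }
    regfree(&preg);
    return rc;
}
```

Calling `first_delim(" *( *", 0, ...)` (BRE) and `first_delim(" *\\( *", REG_EXTENDED, ...)` (ERE) on the sample string should report the same delimiter span, so either spelling can be used as long as it agrees with the flags passed to `regcomp()`.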
[Explanation]
If `regexec()` succeeds, it stores the start offset of the matched
substring in `pmatch[0].rm_so` and the offset just past the end of the
matched substring in `pmatch[0].rm_eo`, as follows:

1st call of regexec():

    string: ps aux( sort ( more
                  ^ ^
              rm_so rm_eo

We can interpret them as follows: `pmatch[0].rm_so` holds the length of
the 1st token, and `pmatch[0].rm_eo` indicates the start position of the
next token. We then advance `string` by `rm_eo` and invoke `regexec()` a
second time:

2nd call of regexec():

    string: sort ( more
                ^  ^
            rm_so  rm_eo

We repeat the loop until `regexec()` returns a nonzero value, meaning
there are no more matches. The last token then remains in `string`.
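
If you want the same loop as a reusable function, here is a minimal sketch under a few assumptions of my own: the name `split_on_paren` and the parameters `dst` and `dstsize` are hypothetical, the tokens are written one per line into a caller-supplied buffer instead of being printed, and empty tokens are skipped (the loop above would print an empty line for them). It uses the `rm_so` / `rm_eo` offsets exactly as described, but formats each token with `"%.*s"` so no fixed-size temporary copy is needed.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split `s` on the delimiter " *( *" (basic regex) and
 * write the tokens, one per line, into `dst`.
 * Returns the number of tokens, or -1 on error or if `dst` is too small. */
static int split_on_paren(const char *s, char *dst, size_t dstsize)
{
    regex_t preg;
    regmatch_t m[1];
    size_t used = 0;
    int count = 0;

    if (dstsize == 0 || regcomp(&preg, " *( *", 0) != 0)
        return -1;
    dst[0] = '\0';

    for (;;) {
        int found = (regexec(&preg, s, 1, m, 0) == 0);
        /* Token length: up to the delimiter, or the rest of the string. */
        size_t len = found ? (size_t)m[0].rm_so : strlen(s);

        if (len > 0) {
            /* "%.*s" writes at most `len` bytes of `s`. */
            int n = snprintf(dst + used, dstsize - used, "%.*s\n", (int)len, s);
            if (n < 0 || (size_t)n >= dstsize - used) {
                regfree(&preg);
                return -1;          /* output buffer too small */
            }
            used += (size_t)n;
            count++;
        }
        if (!found)
            break;                  /* that was the last token */
        s += m[0].rm_eo;            /* continue just past the delimiter */
    }

    regfree(&preg);
    return count;
}
```

With a sufficiently large buffer, calling it on the sample string should return 3 and leave the three tokens in `dst`, separated by newlines.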