
Does bash have a built-in function or capability to split a string on a separator character? For example, I'd like to split the following strings

10.10.10.1:7000
10.10.10.2:8000
...

into two parts: the first with the IP address, the second with the port. I could use sed or awk for this, but I'm curious whether bash already has something built in.

Thanks.

UPDATE: With a suggestion from @Charles Duffy, I came up with the following:

addr="10.10.10.1:7000"
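arr=(${addr//:/ })   # unquoted on purpose: ":" becomes a space, then the result is word-split (and also glob-expanded)
echo "IP ${arr[0]}"
echo "Port ${arr[1]}"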
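A minimal sketch of the parameter-expansion approach Charles Duffy recommends in the comments. It uses only bash builtins, so there is no subprocess and no word splitting or globbing involved. It assumes the input has the form `ip:port` and splits on the first colon.

```shell
#!/usr/bin/env bash
addr="10.10.10.1:7000"

# ${var%%pattern}: strip the longest suffix matching ":*", leaving everything before the first colon
ip=${addr%%:*}
# ${var#pattern}: strip the shortest prefix matching "*:", leaving everything after the first colon
port=${addr#*:}

printf 'IP %s\n' "$ip"
printf 'Port %s\n' "$port"
```

Using `${addr%:*}` and `${addr##*:}` instead splits on the last colon, which may be the better choice if the host part itself can contain colons.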
Mark
    The string-splitting approach is overkill for this; I'd just use parameter expansion. – Charles Duffy Jun 21 '22 at 22:28
    See https://wiki.bash-hackers.org/syntax/pe – Charles Duffy Jun 21 '22 at 22:28
  • ...whereas the general string-splitting approach is covered in [BashFAQ #1](https://mywiki.wooledge.org/BashFAQ/001). – Charles Duffy Jun 21 '22 at 22:29
  • Can use `cut`: `ipaddr=$( echo $string | cut -d: -f1 )` – Jack Jun 21 '22 at 22:29
  • @Jack, please, no. That's wildly inefficient, and also buggy because of the unquoted expansion. – Charles Duffy Jun 21 '22 at 22:30
    `while IFS=: read -r ip port; do echo "$ip" "$port"; done < inputfile` – jordanm Jun 21 '22 at 22:30
  • @Jack, ...see [I just assigned a variable, but `echo $variable` shows something else](https://stackoverflow.com/questions/29378566/i-just-assigned-a-variable-but-echo-variable-shows-something-else) -- that applies to your `echo $string | ...`, before we even talk about how much performance overhead there is to pipelines and command substitutions. – Charles Duffy Jun 21 '22 at 22:30
    @jordanm, the caveat with that is if you have `ip:port:somethingelse`, you'll end up with the `:somethingelse` appended to the `port` variable, which is why I generally prefer `IFS=: read -r ip port _` – Charles Duffy Jun 21 '22 at 22:31
  • @CharlesDuffy Don't care. Works 99.9% of the time. And rarely is overhead an issue anymore. – Jack Jun 22 '22 at 13:42
    @Jack, write code that iterates over 10,000 files in a maildir and you'll start caring about overhead quickly. And as for the correctness side, the biggest data-loss event I've seen in my career was caused by someone thinking they didn't need to care about quoting. It may be rare for it to have that kind of impact, but losing two weeks of company revenue means it would have paid for itself if the entire ops staff had spent the extra fraction of a second paying attention every time they expanded a variable, for their entire preceding careers. – Charles Duffy Jun 22 '22 at 13:54
    Sure, you might say "okay, fine, we'll care when it's an argument to `rm` but not when it's an argument to `echo`", but it's a lot easier to get it right in the `rm` cases if you have an ingrained habit from doing it right in every other case. And if you're using `echo` in a pipeline to assign a variable that eventually gets to `rm`... well, there you are. – Charles Duffy Jun 22 '22 at 13:57
  • @Jack, ...anyhow, if you hear people saying "shell is too slow to be used in real-world cases, everything should be written in Python/C/Go/whatever" -- sometimes it _really is_ too slow, sometimes it's just conventionally slow because of people doing crazy inefficient things like `var=$(echo $var | ...)` instead of using builtins. – Charles Duffy Jun 22 '22 at 14:01
  • @CharlesDuffy Sometimes. That 0.1% case. When it was necessary, I wrote C++ member functions in ASM, too. But only when it was necessary. – Jack Jun 22 '22 at 19:59
    @Jack Right, but the problem is being able to detect when you're in a 0.1% case. If you write code that's _always_ robust, then you're robust in the cases when it matters for security / data integrity / etc. It's one thing when that's _expensive_, but adding shell quotes takes no time at all (particularly once you've trained your finger memory) and tells the shell to do _less work_! It makes your code more efficient! It's the exact _opposite_ of expensive, making it a clear-cut matter of good practice that every reviewer should always check before code merges to main. – Charles Duffy Jun 22 '22 at 20:02
  • ...especially when the work it's avoiding (of expanding data as glob expressions) is work that's potentially I/O-intensive, prone to causing confidentiality leaks (exposing your filenames to whatever the consumer of your script is), and makes your code's correctness harder to review (as one has to know which files will exist in the current directory to know what runtime behavior will be). – Charles Duffy Jun 22 '22 at 20:06
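A minimal sketch combining jordanm's `while IFS=: read` loop with Charles Duffy's `_` catch-all variable. A heredoc stands in for jordanm's `< inputfile` so the example is self-contained, and the arrays `ips` and `ports` are hypothetical names added only to show that the values survive the loop (a redirect, unlike a pipe, keeps the loop in the current shell).

```shell
#!/usr/bin/env bash
ips=()
ports=()

# IFS=: applies only to this read; -r keeps backslashes literal.
# "_" absorbs any extra ":"-separated fields, so "port" holds only the second field.
while IFS=: read -r ip port _; do
  ips+=("$ip")
  ports+=("$port")
  printf 'IP %s Port %s\n' "$ip" "$port"
done <<'EOF'
10.10.10.1:7000
10.10.10.2:8000
10.10.10.3:9000:extra
EOF
```

Without the trailing `_`, the third line would leave `port` set to `9000:extra`, which is the caveat raised in the comments.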
