
I need to extract some information from a header file. Specifically, I need to get the site name from a line like this:

0008 0080 LO Institution Name                 Site Name Here

The problem is that the site name contains spaces too. The only thing I came up with that works is saving the line in a variable and then taking everything after a fixed number of characters, like this:

echo ${line:50}

but I'd like something more elegant.

I just noticed that it also removed multiple spaces between Institution Name and Site Name.

user812786
Renat
  • Post a more realistic input string (with a real site name). – RomanPerekhrest Dec 12 '17 at 11:07
  • The spaces are lost because you forgot to quote the value. You want `echo "${string:50}"`. See https://stackoverflow.com/questions/10067266/when-to-wrap-quotes-around-a-shell-variable – tripleee Dec 12 '17 at 11:20
  • With just a single example and no explanation of which part of the string you want, this is unclear. Can you specify which part of the string you want and in what circumstances this is failing? Also, "elegant" isn't really well-defined -- I find it hard to imagine that you would find anything simpler than what you already have. – tripleee Dec 12 '17 at 11:21
  • @tripleee: Thanks for the edit. My first time here, not familiar yet with formatting, etc. I guess by elegant I meant doing it in one line without saving it into a variable first, e.g. pipe it to sed. – Renat Dec 12 '17 at 11:40
  • And you are looking for extracting the part after the long run of spaces? Can you verify that breaking on any occurrence of two spaces is what you really want? – tripleee Dec 12 '17 at 11:47
  • @tripleee: Yes, two or more spaces. Both of your suggested solutions work. – Renat Dec 12 '17 at 12:02
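A minimal sketch of the quoting point raised in the comments, using the sample line from the question. The variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Sample header line from the question (the run of spaces is significant).
line='0008 0080 LO Institution Name                 Site Name Here'

# Unquoted: the shell splits the value into words on whitespace and echo
# rejoins them with single spaces, so the run of spaces collapses.
unquoted=$(echo $line)

# Quoted: the value is passed as a single word and printed unchanged.
quoted=$(echo "$line")

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:   %s\n' "$quoted"
```

The same applies to `${line:50}`: only the quoted form keeps the spacing inside the extracted substring.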

3 Answers


If the question title is representative of your actual problem, and you want to extract the text after multiple adjacent spaces,

echo "${string##*  }"

with two spaces after the asterisk removes the longest prefix ending in two spaces from the variable's value, leaving only the text after the last run of two or more spaces.
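A minimal sketch of that parameter expansion wrapped in a function, assuming bash. The function name `site_name` is hypothetical; the work is done entirely by `${...##...}`. The single-`#` line is included only for contrast.

```shell
#!/usr/bin/env bash
# Sample header line from the question.
line='0008 0080 LO Institution Name                 Site Name Here'

# Hypothetical helper. "##" removes the LONGEST prefix matching the glob
# "*  " (anything followed by two spaces), i.e. everything up to and
# including the last double space.
site_name() {
    printf '%s\n' "${1##*  }"
}

site_name "$line"            # prints: Site Name Here

# For contrast, a single "#" removes the SHORTEST such prefix. It stops at
# the first double space, so the rest of the run of spaces stays in front.
shortest=${line#*  }
printf '[%s]\n' "$shortest"
```

A line that contains no double space at all is returned unchanged, because the pattern never matches.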

If you need to do this in a pipe, it's easy with sed:

printf '%s\n' "$string" |  # or any other command that produces the line
sed 's/.*  //'
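A sketch of the same `sed` filter applied to a whole stream, assuming the header lines arrive one per line. `extract_site_names` is a hypothetical name, and the second input line is made up purely to show a line with two separate runs of spaces.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the sed command from the answer.
# The greedy ".*" extends to the LAST double space on each line, and the
# whole match (the prefix plus those two spaces) is deleted.
extract_site_names() {
    sed 's/.*  //'
}

# sed works line by line, so every line in the stream is handled.
printf '%s\n' \
    '0008 0080 LO Institution Name                 Site Name Here' \
    'Label  Other Label  Last Value' |
    extract_site_names
```

Because the match is greedy, a site name that itself contained two adjacent spaces would be cut at its own last double space.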
tripleee
  • That will work. The only thing that stays the same between different inputs is the run of multiple spaces. The rest, like the length, the beginning of the site name, and the number of words before it, can change. I didn't know I could use multiple characters in this construction. Thanks! – Renat Dec 12 '17 at 11:53
  • Hm, I tried `sed` almost like that, only used `^` for start of the string. Didn't work obviously... – Renat Dec 12 '17 at 12:00
  • `^` matches the beginning of a line, but the regex `.*` matches everything from the start of the line anyway, so `sed 's/^.*  //'` should work just as well. – tripleee Dec 12 '17 at 12:02
  • I forgot the dot. I thought that from the start `^`, everything `*` up to the double spaces `'  '` would be replaced with nothing `//`. – Renat Dec 12 '17 at 12:35
  • No, in regex `*` means "the previous expression zero or more times"; and `.` means "any single character (except newline)". – tripleee Dec 12 '17 at 12:36
  • Important distinction. Now that I know what both mean, it should be easier to remember. – Renat Dec 12 '17 at 12:38
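A small experiment, using the sample line from the question, that reproduces the points made in these comments. The `'^*  '` pattern is a guess at the attempt without the dot, not a command quoted from the thread.

```shell
#!/usr/bin/env bash
line='0008 0080 LO Institution Name                 Site Name Here'

# With or without "^" the result is the same: ".*" can already match from
# the first character, and sed uses the leftmost match.
with_anchor=$(printf '%s\n' "$line" | sed 's/^.*  //')
without_anchor=$(printf '%s\n' "$line" | sed 's/.*  //')

# Without the dot, a "*" directly after "^" is a literal asterisk in a POSIX
# basic regex, so this looks for "*" plus two spaces at the start of the
# line. Nothing matches here, so the line comes back unchanged.
missing_dot=$(printf '%s\n' "$line" | sed 's/^*  //')

printf '%s\n' "$with_anchor" "$without_anchor" "$missing_dot"
```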

I think awk would be an optimal choice. It can extract columns easily.

echo '0008 0080 LO Institution Name                 Site Name Here' | awk '{ print $6, $7, $8 }'

You are able to print whatever columns you want. (And do many other things.)
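Printing fixed field numbers only works when the site name always has the same number of words, and the asker notes that it can vary. A variation that is not part of this answer is to make awk split on runs of two or more spaces and print the last field. `site_name_awk` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper; a variation on the answer, not its exact command.
# A field separator longer than one character is treated as a regex, so
# -F '  +' means "two or more spaces". Single spaces inside "Institution
# Name" or the site name no longer split fields. $NF is the last field.
site_name_awk() {
    awk -F '  +' '{ print $NF }'
}

printf '%s\n' '0008 0080 LO Institution Name                 Site Name Here' |
    site_name_awk
```

`'  +'` (space, space, plus) is used instead of an interval such as `' {2,}'` because some older awk implementations do not support interval expressions.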

ntj

If the format of the string never changes, the following will do the trick:

A="0008 0080 LO Institution Name Site Name Here"
echo "$A" | cut -d " " -f 6

Kalpa Gunarathna
  • Please see the added comment. There are multiple spaces between "Institution Name" and "Site Name". Even with an option to treat multiple delimiters as one, this would still only get me the first word of the site name, which may have two or more words separated by spaces. – Renat Dec 12 '17 at 11:16
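A sketch addressing the objection above. `cut` itself cannot treat repeated delimiters as one, but `tr -s ' '` can squeeze each run of spaces to a single space first, and an open-ended field range `6-` keeps every word from the sixth onward. This still assumes exactly five words precede the site name, and `site_name_cut` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper. tr -s ' ' squeezes every run of spaces to one space;
# cut -f 6- then prints field 6 and all fields after it.
# Assumes exactly five space-separated words before the site name.
site_name_cut() {
    tr -s ' ' | cut -d ' ' -f 6-
}

printf '%s\n' '0008 0080 LO Institution Name                 Site Name Here' |
    site_name_cut
```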