3

I have a variable that stores the output of a command. Within that output, I would like to print the first word after Database:. I'm fairly new to regex, but this is what I've tried so far:

sed -n -e 's/^.*Database: //p' "$output"

When I try this, I am getting a sed: can't read prints_output: File name too long error.

Does sed only take in a filename? I am running a hive query to desc formatted table and storing the results in output like so:

output=`hive -S -e "desc formatted table"`

output is then set to the result of that:

...
# Detailed Table Information
Database:               sample_db
Owner:                  sample_owner
CreateTime:             Thu Feb 26 23:36:43 PDT 2015
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               maprfs:/some/location
Table Type:             EXTERNAL_TABLE
Table Parameters:
...
raphnguyen
  • The error message doesn't make sense without the code that sets `$output` and the way you're invoking the script. What platform are you running on that 13 characters is too long a file name? Even the early versions of Unix supported 14 characters in a file name. Maybe you should show the result of running `bash -x your-script prints_output` or whatever the command invocation is (the `-x` option shows what Bash thinks it is doing as it does it, more or less). – Jonathan Leffler Feb 27 '15 at 07:33
  • @JonathanLeffler that may be my first mistake. I am trying to pass in a variable to `sed`, not a filename. I've updated the OP to better reflect what I am trying to do. – raphnguyen Feb 27 '15 at 07:39

2 Answers

4

Superficially, you should be using:

hive -S -e "desc formatted table" |
sed -n -e 's/^.*Database: //p'

This prints whatever follows Database: on the matching line: here, the remaining padding spaces and then sample_db. Once that works, you can tighten the pattern to strip the unwanted material too.

Alternatively, you could use:

echo "$output" |
sed -n -e 's/^.*Database: //p'

Or, again, given that you're using Bash, you could use:

sed -n -e 's/^.*Database: //p' <<< "$output"

I'd use the first unless you need the whole output preserved for rescanning. Then I'd probably capture the output in a file (with tee):

hive -S -e "desc formatted table" |
tee output.log |
sed -n -e 's/^.*Database: //p'
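To get only the first word after Database:, one option is to capture it and replace the whole line with the capture. This is a minimal sketch, not the only way to do it: the output variable below is a hypothetical stand-in holding a few lines copied from the question, and db_name is a name invented for the example.

```shell
# Hypothetical stand-in for the hive output (lines copied from the question).
output='# Detailed Table Information
Database:               sample_db
Owner:                  sample_owner'

# Skip the padding after "Database:", capture the first run of
# non-whitespace characters, and replace the whole line with that capture.
db_name=$(printf '%s\n' "$output" |
    sed -n -e 's/^.*Database:[[:space:]]*\([^[:space:]]*\).*$/\1/p')

echo "$db_name"   # prints: sample_db
```

The bracket expressions are POSIX character classes, so this should not depend on GNU-specific sed extensions.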
Jonathan Leffler
  • That works perfectly. Still learning the ins and outs of Bash. Why do you prefer preserving the whole output as a file rather than a variable? I will be rescanning the output for additional matches, so assuming that I use `tee` to generate `output.log` like you've suggested, would I then use `sed -n -e 's/^.*next_match //p' output.log` to find my next match? – raphnguyen Feb 27 '15 at 07:54
  • If you saved the data in a file as shown, then the next scan would indeed specify `output.log` as the input file. I use files rather than variables in part because I learned scripting on a machine with a massive 1 MiB of memory and still remember the joy caused when that was doubled to 2 MiB, increasing the available user memory from 256 KiB to 1.25 MiB (five times)! In other words, memory was not always available for holding large values. It's also easier to debug if the data is captured in a file; you can take a look-see more easily if it is in a file (but you have to clean up afterwards). – Jonathan Leffler Feb 27 '15 at 08:00
  • In other words, you're a seasoned vet. Thanks for the tip and the brief stroll through history. – raphnguyen Feb 27 '15 at 08:01
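The rescanning idea from the comments can be sketched as follows. A temporary file stands in for the output.log that tee would write, filled with lines copied from the question; log_file, database and owner are names invented for this illustration.

```shell
# Hypothetical stand-in for the file that tee would have written.
log_file=$(mktemp)
cat > "$log_file" <<'EOF'
# Detailed Table Information
Database:               sample_db
Owner:                  sample_owner
EOF

# Each scan reads the same saved file, so the query does not run again.
database=$(sed -n -e 's/^.*Database:[[:space:]]*//p' "$log_file")
owner=$(sed -n -e 's/^.*Owner:[[:space:]]*//p' "$log_file")

echo "database=$database owner=$owner"

# Clean up the saved file once the scans are done.
rm -f "$log_file"
```

Here sed is given the file name as its argument, which is the kind of argument it expects; the original error came from passing the text itself in that position.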
0

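Try using grep -E (egrep) to pull out the Database: field together with its value, then awk to print the second word. Here output_file is the file holding the saved output:

grep -Eoh 'Database:[[:blank:]]+[[:alnum:]_]+' output_file | awk '{print $2}'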
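As a variation, awk alone can do both the matching and the extraction, assuming the field name is always the first whitespace-separated token on its line. This is only a sketch: the output variable is a hypothetical stand-in holding lines copied from the question, and db_name is an invented name.

```shell
# Hypothetical stand-in for the saved output (lines copied from the question).
output='# Detailed Table Information
Database:               sample_db
Owner:                  sample_owner'

# Print the second field of the first line whose first field is "Database:",
# then stop reading.
db_name=$(printf '%s\n' "$output" | awk '$1 == "Database:" { print $2; exit }')

echo "$db_name"   # prints: sample_db
```

Because awk splits on runs of whitespace by default, the amount of padding after the colon does not matter.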
Jørgen R
  • While this may answer the question it’s always a good idea to put some text in your answer to explain what you're doing. Read [how to write a good answer](http://stackoverflow.com/help/how-to-answer). – Jørgen R Feb 27 '15 at 11:43
  • thanks @jurgemaister – Potter_nsit Oct 27 '16 at 10:15