0

I'm trying to read values from a text file.

I have test1.txt which looks like

sub1    1   2   3
sub8    4   5   6

I want to obtain values '1 2 3' when I specify 'sub1'.

The closest I get is:

subj="sub1"
grep "$subj" test1.txt

But the answer is:

sub8    4   5   6

I've read that grep gives you the line following the match, so I tried changing the text file to the following:

test2.txt looks like:

sub1    
1   2   3

sub8    
4   5   6

However, when I type

grep "$subj" test2.txt

The answer is:

sub1

It should be something super simple, but I've tried awk, sed, grep, egrep, and cat, and none of them works. I've also read some related posts, but none was really helpful.

Aserre
Joane
  • 3
    See [BashFAQ #1](http://mywiki.wooledge.org/BashFAQ/001) – Charles Duffy Mar 13 '17 at 14:11
  • 1
    grep does not return the next line following the match, it returns the matching line itself. it looks like your original grep result was not returning the expected value, but the second one definitely is. i'd suggest you try echoing the variable contents before executing the grep against test1.txt to be sure the variable is set to what you think it was. – nullrevolution Mar 13 '17 at 14:17
  • @tripleee, I'm not sure that the duplicate is representative of what the OP is trying -- they fell back to a multi-line match based on a misunderstanding, but their first attempted format had the key and the values for retrieval on the same line. – Charles Duffy Mar 14 '17 at 15:04
  • @CharlesDuffy Hmmm, your interpretation seems to match mine? Though if the OP genuinely wants to print the line after the match, we have a duplicate for that as well; e. g. http://stackoverflow.com/questions/17283567/print-specific-number-of-lines-after-matching-pattern – tripleee Mar 14 '17 at 16:53
  • the problem was not the command but the Apple text file format. `while IFS=$' \t\n' read -r -d $'\r' key value1 value2 value3 || [[ $key || $value1 ]]; do if [[ $key = "$target" ]]; then echo "Found values: $value1"; echo "Found values: $value2"; echo "Found values: $value3"; fi; done` – Joane Mar 14 '17 at 16:54
  • @tripleee, ...okay, I think we've got the actual issue nailed down now. I'm pretty sure that we *do* have a proper duplicate for it somewhere in the knowledgebase, though finding it in the mess of CRLF-related questions might be a bit of a trick. – Charles Duffy Mar 14 '17 at 17:09
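
The symptom in the question (grep for sub1 appearing to print only the sub8 line) fits a file whose records are separated by bare carriage returns: with no line feed, grep sees one long line, prints all of it, and the terminal draws the second record over the first. A minimal sketch for checking this, assuming a temporary sample file built to imitate that format (the file and variable names are illustrative, not from the question):

```shell
# Recreate a CR-only file like the one diagnosed in the comments (illustrative data).
sample=$(mktemp)
printf 'sub1\t1\t2\t3\rsub8\t4\t5\t6' > "$sample"

# Count each kind of line terminator; tr -cd deletes everything except the named byte.
cr_count=$(tr -cd '\r' < "$sample" | wc -c | tr -d ' ')
lf_count=$(tr -cd '\n' < "$sample" | wc -c | tr -d ' ')
echo "CR: $cr_count, LF: $lf_count"

# With no LF in the file, grep treats the whole content as a single line,
# so a match on sub1 returns both records at once.
matching_lines=$(grep -c 'sub1' "$sample")
echo "lines matching sub1: $matching_lines"

# od -c prints every byte visibly, so a stray \r cannot hide.
od -c "$sample"

rm -f "$sample"
```

If the CR count is non-zero and the LF count is zero, the file uses the old Apple convention and needs converting (or a reader that splits on `\r`) before line-oriented tools will behave.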

7 Answers

1

Awk works: awk '$1 == "'"$subj"'" { print $2, $3, $4 }' test1.txt

The command outputs fields two, three, and four for all lines in test1.txt where the first field is $subj (i.e., the contents of the variable named subj).

U. Windl
  • Use `-v` to pass arguments to `awk`, rather than performing string substitutions into code -- otherwise, values can escape their quotes and run arbitrary commands. – Charles Duffy Mar 13 '17 at 14:18
  • 1
    For instance, let's say that your `subj` contains `"+system("rm -rf $HOME")+"` -- you don't want that evaluated as `awk` code. – Charles Duffy Mar 13 '17 at 14:18
  • the answer this way is: `sub83` – Joane Mar 13 '17 at 14:38
  • @Charles Duffy: Can you explain where my code fails where `-v var="$subj"` does not? – U. Windl Mar 15 '17 at 07:29
  • @Joane: How did you test? I get `1 2 3`. – U. Windl Mar 15 '17 at 07:31
  • @Toby Speight: I think specific solutions to specific problems are justified. Unless solution for a general problem is asked for, I see little sense to make very general solutions (unless needed). As when using any code, the user should understand how to use the code, and how it works. Blindly pasting some code without understanding it is dangerous. Specifically the claim "I've tried awk, seg, grep,egrep, cat and none is working" was proven to be wrong (unless the user has a completely broken environment). – U. Windl Mar 15 '17 at 07:38
  • Well, your edit has made this a better answer already (+1), so thank you. While it's true the question isn't great, we should always strive to create outstanding answers, regardless! (IMHO) – Toby Speight Mar 15 '17 at 08:37
  • @U.Windl, I already did. See the example value for `subj` with the `system()` call -- your code would run `rm`, the `-v var="$subj"` would treat the string as having its literal contents. – Charles Duffy Mar 15 '17 at 13:22
  • @U.Windl, ...and a thing to keep in mind that StackOverflow isn't principally about answering a question for the person who asked it -- it's about the long tail of people who will find the question later through Google, through it being tagged as a duplicate of something else, etc. Making an answer general lets us serve the long tail. – Charles Duffy Mar 15 '17 at 13:24
  • @U.Windl, ...and to support my answer about what the site is principally about / was founded to do, let me point to its founder's words on the subject: https://stackoverflow.blog/2011/01/05/the-wikipedia-of-long-tail-programming-questions/. Notably, *editing to make a question more general is encouraged*; of course, then, answers should be general enough to work as well. – Charles Duffy Mar 15 '17 at 13:26
  • @Charles Duffy: To be honest `"+system("rm -rf $HOME")+"` is a pattern that breaks inside awk, but setting `var` to `$(rm -rf $HOME)` would also break `-v var="$subj"`. So IMHO it's hard to say which one is better. – U. Windl Mar 16 '17 at 14:48
  • @U.Windl, no, `-v var="$subj"` does *not* expand the value in `subj`; it's substituted literally, so in `awk`, it would be testing for the *literal string* `"+system("rm -rf $HOME")+"` -- which is perfectly correct behavior. – Charles Duffy Mar 16 '17 at 15:41
  • @U.Windl, ...you can test this yourself (tweaked to reduce the level of trust required): `subj='"+system("touch /tmp/insecure")+"'; awk -F: -v subj="$subj" '$1 == subj { print "Matched line with fields: ", $2 }' <<<'"+system("touch /tmp/insecure")+":success'` -- you'll see that it emits `success`, and does **not** create `/tmp/insecure`. – Charles Duffy Mar 16 '17 at 15:44
  • @U.Windl, ...by contrast, doing it your way looks like this: `subj='"+system("touch /tmp/insecure")+"'; awk -F: '$1 == "'"$subj"'" { print "Matched line with fields: ", $2 }' <<<'"+system("touch /tmp/insecure")+":success'` -- which does **not** emit any output, but creates `/tmp/insecure` (or, obviously, runs an `rm` if that's what was there in `subj`). – Charles Duffy Mar 16 '17 at 15:46
  • @Charles Duffy: I clearly got your point, but you seem to miss mine: While your code creates the vulnerability inside awk, my comment created the vulnerability even before awk is started. I wanted to point out that if you don't have control over the pattern provided, you'll have to take great efforts to prevent misuse. Maybe even it's not impossible to prevent misuse at all. But from the original question I got the impression that the user has full control over the pattern to match and the data provided. – U. Windl Mar 20 '17 at 07:17
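
The difference debated in these comments can be checked without risk. A minimal sketch, assuming a throwaway sample file and a harmless injected command (`echo INJECTED` stands in for anything destructive; the variable names `key`, `hostile`, `safe`, and `spliced` are illustrative):

```shell
# Throwaway tab-separated sample data (illustrative).
data=$(mktemp)
printf 'sub1\t1\t2\t3\nsub8\t4\t5\t6\n' > "$data"

# Normal lookup: -v passes the shell value to awk as a variable, not as code.
subj="sub1"
result=$(awk -v key="$subj" '$1 == key { print $2, $3, $4 }' "$data")
echo "$result"

# A hostile key that breaks out of the quotes if spliced into the program text.
hostile='"+system("echo INJECTED")+"'

# Passed with -v, it is only compared as a string: nothing matches, nothing runs.
safe=$(awk -v key="$hostile" '$1 == key { print $2, $3, $4 }' "$data")

# Spliced into the program, the same value is parsed as awk code and system() runs.
spliced=$(awk '$1 == "'"$hostile"'" { print $2, $3, $4 }' "$data")

echo "with -v: [$safe]"
echo "spliced: [$spliced]"

rm -f "$data"
```

One caveat: awk processes backslash escape sequences in values assigned with `-v`, so for keys that may contain backslashes, exporting the value and reading it through `ENVIRON` is the more literal route.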
0

grep "sub1" test1.txt | cut -c6-

or

grep -A 1 "sub1" test2.txt | tail -n 1

jhscheer
  • answer to first `sub8 4 5 6` to second: `sub8 2 3` – Joane Mar 13 '17 at 14:34
  • If the results are not `1 2 3` for both my proposals, then your `test1.txt` and `test2.txt` differ from the example content you provided with your question. – jhscheer Mar 13 '17 at 14:39
  • I think it has something to do with the delimiters... test1.txt has 2 lines, each column tab-separated; however, when I copy it here it looks like: sub1 1 2 3 sub8 4 5 6 – Joane Mar 13 '17 at 14:55
  • At least for the 2nd option I proposed, the delimiters don't matter. All it does is showing you just the line directly following the match. There's something else going on. – jhscheer Mar 13 '17 at 15:03
  • Even when I use `while IFS= read -r line; do printf '%s\n' "$line"; done < "$file"` I get `sub8 4 5 6` – Joane Mar 13 '17 at 15:14
0

With your original text file format:

target=sub1
while IFS=$' \t\n' read -r key values; do
  if [[ $key = "$target" ]]; then
    echo "Found values: $values"
  fi
done <test1.txt

This requires no external tools, using only functionality built into bash itself. See [BashFAQ #1](http://mywiki.wooledge.org/BashFAQ/001).


As has come up during debugging in comments, if you have a traditional Apple-format text file (CR newlines only), then you might want something more like:

target=sub1
while IFS=$' \t\n' read -r -d $'\r' key values || [[ $key ]]; do
  if [[ $key = "$target" ]]; then
    echo "Found values: $values"
  fi
done <test1.txt

Alternately, using awk (for a standard UNIX text file):

target="sub1"
awk -v target="$target" '$1 == target { $1 = ""; print; }' <test1.txt

...or, for a file with CR-only newlines:

target="sub1"
tr '\r' '\n' <test1.txt | awk -v target="$target" '$1 == target { $1 = ""; print; }'

This version will be slower if the text file being read is small (since awk, like any other external tool, takes time to start up); but faster if it's large (since awk's operation is much faster than that of bash's built-ins once it's done starting up).
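
The `tr` variant above should also cope with DOS-style CRLF files, because turning each CR into an LF only adds empty lines, and an empty line never matches the key. A minimal sketch, assuming three value columns as in the question; the `lookup` function and the sample file names are hypothetical:

```shell
# lookup KEY FILE: print the values stored after KEY, whatever the line endings.
# (lookup is a hypothetical helper name, not part of any tool.)
lookup() {
    tr '\r' '\n' < "$2" |
        awk -v target="$1" '$1 == target { print $2, $3, $4 }'
}

# Build the same two records with LF, CR-only, and CRLF line endings.
dir=$(mktemp -d)
printf 'sub1\t1\t2\t3\nsub8\t4\t5\t6\n'     > "$dir/unix.txt"
printf 'sub1\t1\t2\t3\rsub8\t4\t5\t6\r'     > "$dir/oldmac.txt"
printf 'sub1\t1\t2\t3\r\nsub8\t4\t5\t6\r\n' > "$dir/dos.txt"

from_unix=$(lookup sub1 "$dir/unix.txt")
from_mac=$(lookup sub1 "$dir/oldmac.txt")
from_dos=$(lookup sub8 "$dir/dos.txt")
echo "$from_unix"
echo "$from_mac"
echo "$from_dos"

rm -rf "$dir"
```

Printing `$2, $3, $4` explicitly, rather than blanking `$1` and printing the whole record, avoids the leading space that the latter leaves behind, at the cost of hard-coding the number of columns.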

Charles Duffy
  • the first one did nothing... and the second one doesn't seem to finish the computation.. – Joane Mar 13 '17 at 15:09
  • The second one you need to redirect from the input file. Put ` – Charles Duffy Mar 13 '17 at 15:45
  • @Joane, ...and the first one certainly does work, **if** the input file format is truly what you gave in the question. You can test that with a one-liner: `target=sub1; while read -r key values; do if [[ $key = "$target" ]]; then printf 'Found values:\t%s\n' "$values"; fi; done <<<$'sub1\t1\t2\t3\nsub8\t4\t5\t6\n'` – Charles Duffy Mar 13 '17 at 15:47
  • @Joane, ...to diagnose the case where the input format **isn't** what you think it is, I'd suggest adding a debug line like `printf 'Saw key: %q and values: %q\n' "$key" "$values" >&2` inside the loop before the `if`, which will print content read in an unambiguous format that shows hidden characters (so if your key is really, say, `\rsub1` instead of being just `sub1`, that'll show up). – Charles Duffy Mar 13 '17 at 15:52
  • the answer is: `Saw key: $'sub1\t1\t2\t3\rsub8\t4\t5\t6' and values: ''` – Joane Mar 13 '17 at 16:14
  • @Joane, that implies that your `IFS` isn't at its default value. You could set `IFS=$'\t'` explicitly at the beginning of this line, or `unset IFS` to return it to effective defaults. – Charles Duffy Mar 13 '17 at 16:16
  • @ Charles: thank you so much! you're right, it has something to do with IFS..I still haven't figured out why my txt file has strange hidden characters but that's the problem...I'll try to fix it somehow and open a new post otherwise. – Joane Mar 14 '17 at 15:02
  • @Joane, ...are those characters represented in the `printf '%q'` output above? If so, I can probably identify them. (As an aside -- the most common culprit for surprises is `\r`, a carriage return -- present on newlines for DOS-format text files but not UNIX ones). – Charles Duffy Mar 14 '17 at 15:05
  • @Charles...now i get nothing when trying to use the same commands and the same file. Previously I also got `Saw key: sub1 and values: $'1\t2\t3\rsub8\t4\t5\t6'` for a while, but now it gives no output at all.. I create the txt file copying directly from an excel file...but if there's an easier way to read an excel, I could skip this step.. – Joane Mar 14 '17 at 15:29
  • Ahh -- so this is actually traditional Apple text file format -- only CRs, no LF characters at all. Use `IFS=$' \t\n' read -r -d $'\r' key values`, then. If you want to make lines with no trailing newline work, by the way, you can put a `|| [[ $key || $values ]]` on the end of the `read` command. – Charles Duffy Mar 14 '17 at 15:34
  • (To explain why you're not getting results at all, by the way -- `read` populates its destination variables but returns false when it doesn't find a newline, and the kind of newline it looks for by default is `$'\n'`). – Charles Duffy Mar 14 '17 at 15:39
  • WORKED!!!!! THANK YOU!this was driving me crazy... – Joane Mar 14 '17 at 16:52
0

Sed also works: sed -n -e 's/^'"$subj"' *//p' test1.txt

It outputs all lines matching $subj at the beginning of a line after having removed the matching word and the spaces following. If TABs are used the spaces should be replaced by something like [[:space:]].

U. Windl
  • It does nothing... – Joane Mar 13 '17 at 14:35
  • @Joane: How did you test it? – U. Windl Mar 15 '17 at 07:24
  • @Toby Speight: Others also didn't explain their solutions, and I thought it is quite straight forward. I added an explanation. – U. Windl Mar 15 '17 at 07:26
  • It might be worth noting that `$subj` here is a regular expression, not a literal string. In the general case, you might need to use Bash parameter substitution to escape regexp metacharacters, with lines like `subj="${subj//\\/\\\\}"; subj="${subj//\*/\\*}"` etc. – Toby Speight Mar 15 '17 at 08:44
  • @Toby Speight: If I start preparing the target expression with BASH's regular expression substitution, I don't need to use sed: Then BASH can handle the rest as well. – U. Windl Mar 15 '17 at 11:54
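
For a tab-separated file, the `[[:space:]]` substitution suggested in the answer can be sketched as follows. This assumes the key contains no regular-expression metacharacters (see the comment above on escaping them); the sample file and its extra `sub12` record are illustrative:

```shell
# Tab-separated sample with a near-miss key (sub12) to show the match is exact.
sample=$(mktemp)
printf 'sub1\t1\t2\t3\nsub12\t7\t8\t9\nsub8\t4\t5\t6\n' > "$sample"

subj="sub1"
# Requiring at least one whitespace character after the key keeps sub1 from matching sub12.
values=$(sed -n "s/^${subj}[[:space:]][[:space:]]*//p" "$sample")
printf '%s\n' "$values"

rm -f "$sample"
```

The remaining fields keep their original tab separators, since sed only strips the key and the whitespace directly after it.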
0

There's a bunch of ways to do this (and shorter/more efficient answers than what I'm giving you), but I'm assuming you're a beginner at bash, and therefore I'll give you something that's easy to understand:

egrep "^$subj\>" file.txt | sed "s/^\S*\>\s*//"

or

egrep "^$subj\>" file.txt | sed "s/^[^[:blank:]]*\>[[:blank:]]*//"

The first part, egrep, will search for your subject at the beginning of the line in file.txt (that's what the ^ symbol does in the grep string). It is also looking for a whole word (the \> looks for an end-of-word boundary, so that sub1 doesn't match sub12 in the file). Notice you have to use egrep to get the \>, as grep by default doesn't recognize that escape sequence. Once it has found the lines, egrep passes its output to sed, which strips the first word and the whitespace after it off each line. Again, the ^ symbol in the sed command specifies that it should only match at the beginning of the line. The \S* tells it to read as many non-whitespace characters as it can. Then the \s* tells sed to gobble up as much whitespace as it can. sed then replaces everything it matched with nothing, leaving the other stuff behind.

BTW, there's a help page on Stack Overflow that tells you how to format your questions (I'm guessing that was the reason you got a downvote).

-------------- EDIT ---------

As pointed out, if you are on a Mac or something like that, you have to use [^[:blank:]] instead of \S and [[:blank:]] instead of \s in your sed expression (these are portable to all platforms).

blackghost
  • `$(subj)` is running `subj` as a command and substituting its output, not substituting the value of a *variable* named `subj` – Charles Duffy Mar 13 '17 at 14:23
  • 1
    Also, `\S` and `\s` are PCRE; normal `sed` only supports BRE (by default) or ERE (with some flags to enable it). – Charles Duffy Mar 13 '17 at 14:24
  • Sorry, my bad (too used to make...). Fixing. – blackghost Mar 13 '17 at 14:25
  • answer still `sub8 4 5 6` – Joane Mar 13 '17 at 15:04
  • BTW, it's actually `[[:blank:]]` and `[[:alnum:]]` -- though you could use `[^[:blank:]]` to means "anything *but* a blank character". Those sequences are only valid inside a character set definition, which `[]` enters. – Charles Duffy Mar 13 '17 at 15:50
  • @Joane I just tried it, and it works for me... I'm updating my answer to use the portable syntax as @Charles is suggesting -- perhaps you don't have something compatible with the `\S` or `\s` – blackghost Mar 13 '17 at 16:05
0

You're doing it right, but it seems like test1.txt has a wrong value in it.

With grep foo you get all lines containing foo. Use grep -m1 foo to find only the first line containing foo.

Then you can use cut -d" " -f2- to get all the values after foo, provided they are separated by spaces.

In the end the command would look like this ...

$ subj="sub1"
$ grep -m1 "$subj" test1.txt | cut -d" " -f2-

But this doesn't explain why you could not find sub1 in the first place. Did you read the proper file?

Mario
  • answer still `sub8 4 5 6` – Joane Mar 13 '17 at 14:33
  • The error isn't in the command line; I think it's in the file itself. What system are you using? – Mario Mar 13 '17 at 14:39
  • Mac Sierra 10.12.3 – Joane Mar 13 '17 at 15:01
  • Did you change `$IFS`, or do you need to change `$IFS` to use a proper language? By default `utf8` is used, but Apple has its own language and region preferences. I think changing the `$IFS` or the system language will solve the problem. – Mario Mar 13 '17 at 15:34
  • I'm a beginner in mac, could you specify a bit more how to do it? Indeed, something like that is happening `for i in 'cat test1.txt'; do echo "$i"; done` replies `sub8 4 5 6`... but if I write `IFS=$'\n' for i in 'cat test1.txt'; do echo "$i"; done` I still get the same.. – Joane Mar 13 '17 at 15:59
  • I am not an Apple user and cannot tell what is used there for word wrap. Sorry, I wish I could help you. Better to start a new post, because this is a completely different category. – Mario Mar 13 '17 at 16:33
0
awk '/sub1/{ print $2,$3,$4 }' file
1 2 3

What happens? For every line matching the regexp /sub1/, fields two to four are printed. Any drawbacks? The fields are joined with single spaces, so the original spacing is not preserved.

Claes Wikner
  • I'm writing as a reviewer because your answer has been flagged for its quality. Would you kindly consider adding some commentary to your answer that reveals how it works. – Bill Bell Mar 13 '17 at 19:34
  • the answer I get... `awk '/sub1/{ print $2,$3,$4 }' test1.txt` `sub83` – Joane Mar 14 '17 at 09:56
  • That is a phenomenon I don't understand. But what happens if you try awk '/sub1/{$1="";print $2,$3,$4}' file? – Claes Wikner Mar 14 '17 at 12:04
  • but I'd have to specify the file, right? otherwise I get the: `awk '/sub1/{ print $2,$3,$4 }' file` `awk: can't open file file` – Joane Mar 14 '17 at 12:39
  • Yes, you have to specify your file. – Claes Wikner Mar 14 '17 at 15:43