How to extract text which matches particular fields in a text file using Linux commands

Question

Hi, below is my text file:

{"Author":"john"
  "subject":"java"
  "title":"java cook book.pdf"}

{"title":"Php book.pdf"
 "Author":"Smith"
 "subject":"PHP"}

{"Author":"Smith"
"title":"Java book.pdf"}

From the above data I want to extract all titles which contain the word "java". I should get the following output:

java cook book.pdf
Java book.pdf

Please suggest how to do this.

Thanks

Do you know what JSON is? – Armen Michaeli, Jun 13 '13 at 12:06

score 5 · Accepted Answer · edited Jun 20 '20 at 09:12


GNU sed

sed -r '/title.*java/I!d;s/.*:.(.*).}$/\1/' file

java cook book.pdf
Java book.pdf
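
For readers who want to see what the two sed commands do, here is a commented sketch of the same script. It assumes GNU sed (for `-r` and the `I` flag) and recreates the question's sample in a scratch file; the `sample` variable and the `sed_titles` function name are only illustrative.

```shell
#!/bin/sh
# Recreate the sample data from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"Author":"john"
  "subject":"java"
  "title":"java cook book.pdf"}

{"title":"Php book.pdf"
 "Author":"Smith"
 "subject":"PHP"}

{"Author":"Smith"
"title":"Java book.pdf"}
EOF

sed_titles() {
    sed -r '
        # Delete every line that does not contain "title" followed by "java"
        # (the I flag makes the match case-insensitive; it is a GNU extension).
        /title.*java/I!d
        # On the remaining lines, skip everything up to the last colon and the
        # opening quote, capture the text, and drop the closing quote and brace.
        s/.*:.(.*).}$/\1/
    ' "$1"
}

sed_titles "$sample"
```

Note that the substitution only matches when the line ends in `}`, which holds for both matching titles in the sample because each is the last field of its block.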

answered Jun 13 '13 at 12:39 by Endoro (last edited by Community)

score 3 · Answer 2 · answered Nov 11 '13 at 14:13

I will avoid any complex solution and rely on good old grep+awk+tr instead:

$ grep '"title":' test.txt | grep '[Jj]ava' | awk -F: '{print $2}' | tr -d '"}'
java cook book.pdf
Java book.pdf

This works as follows:

extract all lines which contain "title":
extract from these lines all which contain either Java or java
split these lines on : and print the second field
remove the " and } characters
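
To see what each stage contributes, the pipeline can be run one stage at a time. This is a minimal sketch that assumes the question's sample data is saved in a scratch file; the `sample` variable and the `titles_with_java` function name are made up for illustration.

```shell
#!/bin/sh
# Recreate the sample data from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"Author":"john"
  "subject":"java"
  "title":"java cook book.pdf"}

{"title":"Php book.pdf"
 "Author":"Smith"
 "subject":"PHP"}

{"Author":"Smith"
"title":"Java book.pdf"}
EOF

# Stage 1: keep only the lines that carry a title field (three lines).
grep '"title":' "$sample"

# Stage 2: of those, keep the lines that mention Java or java (two lines).
grep '"title":' "$sample" | grep '[Jj]ava'

# Stages 3 and 4, wrapped in a function: print the field after the first
# colon, then delete the quote and closing-brace characters.
titles_with_java() {
    grep '"title":' "$1" | grep '[Jj]ava' | awk -F: '{print $2}' | tr -d '"}'
}

titles_with_java "$sample"
```

Stage 1 matters here because the first block also has a "subject":"java" line, which a bare search for java would pick up.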

score 2 · Answer 3 · edited Jun 13 '13 at 12:28

You can try something like this with awk:

awk -F: '$1~/title/&&tolower($2)~/java/{gsub(/["}]/,"",$2);print $2}' file

Explanation:

-F: sets the field separator to :
$1~/title/ checks whether the first column contains title
tolower($2)~/java/ checks the second column for java, case-insensitively
gsub(..) removes the " characters and the closing }
print $2 prints the second column
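
To reuse the same idea with other search words or files, the one-liner can be wrapped in a small shell function. This is only a sketch: the function name `extract_titles` and its two parameters are hypothetical, and this variant strips the closing brace as well as the quotes.

```shell
#!/bin/sh
# Recreate the sample data from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
{"Author":"john"
  "subject":"java"
  "title":"java cook book.pdf"}

{"title":"Php book.pdf"
 "Author":"Smith"
 "subject":"PHP"}

{"Author":"Smith"
"title":"Java book.pdf"}
EOF

# extract_titles WORD FILE
# Prints every title in FILE that contains WORD, ignoring letter case.
extract_titles() {
    awk -F: -v word="$1" '
        # Act only on title lines whose value contains the word.
        # index() returns 0 (false) when the word is absent.
        $1 ~ /title/ && index(tolower($2), tolower(word)) {
            gsub(/["}]/, "", $2)   # strip the quotes and the closing brace
            print $2
        }
    ' "$2"
}

extract_titles java "$sample"
extract_titles php "$sample"
```

Using `index()` rather than a regex match means the search word is treated as plain text, so characters such as `.` in the word are not interpreted as patterns.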

answered Jun 13 '13 at 12:09 by jaypal singh

Thanks for the reply. Can you please explain the command so that I can try it on other files? – user2353439 Jun 13 '13 at 12:12
@user2353439 Added the explanation. – jaypal singh Jun 13 '13 at 12:20

score 0 · Answer 4 · edited May 23 '17 at 11:53

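You should definitely use a JSON parser to get flawless results. I like the one provided with PHP. If your file is, as shown, a bunch of JSON blocks separated by blank lines (and each block is valid JSON; the sample in the question omits the commas between fields, so it would need those added first):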

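// Split the file into blank-line-separated blocks and decode each one.
foreach( explode("\n\n", file_get_contents('/your/file.json_blocks')) as $js_block ):
    // Pass true to get an associative array; invalid JSON yields null.
    $json = json_decode( trim($js_block), true );
    // stripos() returns 0 for a match at the start, so compare against false.
    if ( is_array($json) && isset($json['title']) && stripos($json['title'], 'java') !== false ):
        echo trim($json['title']), PHP_EOL;
    endif;
endforeach;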

This is a lot more reliable than doing the same with any combination of sed, awk, grep et al., simply because JSON follows a specific format and should be read with a parser. For example, a simple newline in the 'title' entry, which has no real meaning to the JSON, will break the solution provided by Jaypal. See this for a similar problem, parsing XHTML with regex and why you shouldn't do it: RegEx match open tags except XHTML self-contained tags
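
The fragility is easy to reproduce. As an illustration (the title below is invented, not taken from the question, and `split_title` is a hypothetical helper), a colon inside a title is enough to truncate the output of an approach that splits on colons:

```shell
#!/bin/sh
# A made-up record whose title happens to contain a colon.
line='{"title":"Java: the basics.pdf"}'

# Same splitting idea as the line-oriented answers: take the field after
# the first colon, then delete the quote and brace characters.
split_title() {
    printf '%s\n' "$1" | awk -F: '{print $2}' | tr -d '"}'
}

# Prints only "Java"; everything after the second colon is lost.
split_title "$line"
```

A JSON parser has no such problem, because it knows where the string value starts and ends.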
