4

Hi, below is my text file:

{"Author":"john"
  "subject":"java"
  "title":"java cook book.pdf"}

{"title":"Php book.pdf"
 "Author":"Smith"
 "subject":"PHP"}

{"Author":"Smith"
"title":"Java book.pdf"}

From the above data I want to extract all titles that contain the word "java" (in any case). I should get the following output:

java cook book.pdf
Java book.pdf

Please suggest how to do this.

Thanks

jaypal singh
user2353439

4 Answers

5

Using GNU sed (the -r option and the I address modifier are GNU extensions):

sed -r '/title.*java/I!d;s/.*:.(.*).}$/\1/' file
java cook book.pdf
Java book.pdf
Endoro
3

I would avoid any complex solution and rely on good old grep + awk + tr instead:

$ grep '"title":' test.txt | grep '[Jj]ava' | awk -F: '{print $2}' | tr -d '"}'
java cook book.pdf
Java book.pdf

This works as follows:

  1. extract all lines that contain "title":
  2. from these lines, keep those that contain either Java or java
  3. split these lines on : and print the second field
  4. remove the " and } characters
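
To see what each of the four steps contributes, the pipeline can be run one stage at a time. This is only a minimal sketch: it assumes the sample data from the question is saved as test.txt in a scratch directory (created here with mktemp, a hypothetical location), and it quotes the tr argument so that the shell cannot expand it as a glob pattern.

```shell
#!/bin/sh
# Recreate the question's sample data in a temporary directory (assumed location).
tmpdir=$(mktemp -d)
cat > "$tmpdir/test.txt" <<'EOF'
{"Author":"john"
  "subject":"java"
  "title":"java cook book.pdf"}

{"title":"Php book.pdf"
 "Author":"Smith"
 "subject":"PHP"}

{"Author":"Smith"
"title":"Java book.pdf"}
EOF

# Step 1: keep only the lines that contain "title":
step1=$(grep '"title":' "$tmpdir/test.txt")
# Step 2: of those, keep the lines that mention Java or java
step2=$(printf '%s\n' "$step1" | grep '[Jj]ava')
# Step 3: split on ":" and keep the second field
step3=$(printf '%s\n' "$step2" | awk -F: '{print $2}')
# Step 4: delete the " and } characters
titles=$(printf '%s\n' "$step3" | tr -d '"}')

printf '%s\n' "$titles"
rm -rf "$tmpdir"
```

Printing $step1, $step2 and $step3 in turn shows how the list narrows down at each stage.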
psihodelia
2

You can try something like this with awk:

awk -F: '$1~/title/&&tolower($2)~/java/{gsub(/["}]/,"",$2);print $2}' file

Explanation:

  • -F: sets the field separator to :
  • $1~/title/ checks whether the first field contains title
  • tolower($2)~/java/ checks whether the second field contains java, case-insensitively
  • gsub(..) removes the " and } characters from the second field
  • print $2 prints the second field
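
To make the field splitting concrete, here is a small sketch that shows what awk sees as the first and second field for one title line from the question. The variable names line and fields are only illustrative.

```shell
#!/bin/sh
# One title line from the question's sample data.
line='  "title":"java cook book.pdf"}'

# With ":" as separator, field 1 is the key and field 2 is the raw value,
# still wrapped in quotes and followed by the closing brace.
fields=$(printf '%s\n' "$line" |
  awk -F: '{ printf "field1=[%s]\nfield2=[%s]\n", $1, $2 }')

printf '%s\n' "$fields"
```

Note that a title which itself contains a colon would be split across the second and third fields, so this approach only suits titles without one.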
jaypal singh
0

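You should definitely use a JSON parser to get reliable results. I like the one provided with PHP. If your file is, as shown, a bunch of JSON blocks separated by blank lines, and each block is valid JSON (the sample as posted is missing the commas between fields), this will do: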

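// Split the file into blocks on blank lines and decode each block separately.
foreach( explode("\n\n", file_get_contents('/your/file.json_blocks')) as $js_block ):
    // Pass true to get an associative array; invalid JSON decodes to null and is skipped below.
    $json = json_decode( trim($js_block), true );
    // stripos() returns 0 for a match at the start of the string, so compare against false.
    if ( isset( $json['title'] ) && stripos($json['title'], 'java') !== false ):
        echo trim($json['title']), PHP_EOL;
    endif;
endforeach;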

This will be a lot more reliable than doing the same with any combination of sed, awk, grep, et al., simply because JSON follows a specific format and should be handled with a parser. For example, a simple line break in the 'title' entry, which has no real meaning to JSON, will break the solution provided by Jaypal. For a similar problem, parsing XHTML with regex and why you shouldn't do it, see: RegEx match open tags except XHTML self-contained tags
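
The line-break problem can be reproduced with a small sketch. The input below is a hypothetical, valid JSON block in which the title value sits on its own line. The awk command is the line-based approach with " and } stripped. The parser step uses jq rather than PHP purely for illustration, and it only runs if jq happens to be installed.

```shell
#!/bin/sh
# Hypothetical input: valid JSON, with the "title" value on a separate line.
input='{"Author":"Smith",
 "title":
   "Java book.pdf"}'

# Line-based approach: the key and the value are on different lines,
# so no single line satisfies both conditions and nothing is printed.
line_based=$(printf '%s\n' "$input" |
  awk -F: '$1~/title/ && tolower($2)~/java/ {gsub(/["}]/,"",$2); print $2}')
printf 'line-based result: [%s]\n' "$line_based"

# Parser-based approach (assumes jq is available; skipped otherwise).
if command -v jq >/dev/null 2>&1; then
  parsed=$(printf '%s\n' "$input" |
    jq -r 'select((.title // "") | test("java"; "i")) | .title')
  printf 'parser result: [%s]\n' "$parsed"
fi
```

The line-based command finds nothing here, whereas a parser does not care where the line breaks fall.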

smassey