
I have a file of about 150 lines, where each line is part of a URL. I want to extract 4 different parameters from each line and put them into a file. Each line looks something like this:

/secure/domain/new.aspx?id=620&utm_source=1034&utm_medium=cpc&utm_term=term1&try=1&v=3&utm_account=account_name&utm_campaign=campaign_name&utm_adgroup=adgroup&keyword=keyword1&pkw=pkw1&idimp=id&premt=premt1&gclid=id

As a trial, I ran

awk '/pkw/,/&idimp/' file > output.txt

thinking that this would at least get me the `pkw` value, but it just returned the input file as is. What am I doing wrong? Also, how can I make it return all four values? I'm looking to get keyword, pkw, idimp and premt.

Edit: The expected output is a file containing the 4 values for each of the 150 lines in the input file. So for the line above:

 keyword1 pkw1 id premt1

Even if I just get the 4 values in 4 different files, it would suffice.
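For the "4 different files" fallback, awk can redirect each value to a file named after its parameter. A minimal sketch; the function name `split_params` and the output names `keyword.txt`, `pkw.txt`, `idimp.txt`, `premt.txt` are hypothetical, and it assumes every line contains each of the four parameters exactly once (otherwise the lines in the four files no longer correspond).

```shell
# Hypothetical helper: split_params INPUT_FILE
# Writes each wanted parameter's value to its own file in the current
# directory (keyword.txt, pkw.txt, idimp.txt, premt.txt), one line per URL.
split_params() {
    awk -F'[?=&]' '
        BEGIN { p["keyword"] = p["pkw"] = p["idimp"] = p["premt"] = 1 }
        {
            # Splitting on ? = & puts names at even field numbers,
            # each followed by its value.
            for (i = 2; i < NF; i += 2)
                if ($i in p)
                    print $(i + 1) > ($i ".txt")
        }
    ' "$1"
}
```

Usage: `split_params file`. Within a single awk run, each output file is truncated when first opened and then appended to, so a rerun starts the files fresh.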

CodingInCircles
  • That will print the entirety of any line that falls between a line containing the string `param1` and a line containing the string `param2`. You need an action statement which does something different than printing the entire line if you want just parts of a line. You likely also want to only match on lines that contain the params you want (and not a range of lines). – Etan Reisner Jan 06 '14 at 19:27
  • What is your expected output? – anubhava Jan 06 '14 at 19:31
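To illustrate the range-pattern point: in the question's data, `/pkw/` and `/&idimp/` both match the same line, so each range opens and closes on that one line, and with no action awk prints it whole. A small sketch on a shortened sample line; `pkw_value` is a hypothetical helper showing an explicit action that prints only one field.

```shell
# Shortened sample line in the format shown in the question.
line='/secure/domain/new.aspx?id=620&keyword=keyword1&pkw=pkw1&idimp=id&premt=premt1&gclid=id'

# Range pattern, no action: every record in the range is printed unchanged.
printf '%s\n' "$line" | awk '/pkw/,/&idimp/'

# Hypothetical helper: split on ? = & and print only the field that
# follows the field named "pkw".
pkw_value() {
    awk -F'[?=&]' '{ for (i = 2; i < NF; i += 2) if ($i == "pkw") print $(i + 1) }'
}
printf '%s\n' "$line" | pkw_value    # prints pkw1
```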

3 Answers


You can use this awk:

awk -F'[=&]' '{print $2, $4, $6, $8}' file
value1 value2 value3 value4

To redirect the output to a file:

awk -F'[=&]' '{print $2, $4, $6, $8}' file > output.txt
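Applied to the URL now in the question (rather than the earlier sample this answer targets), the same positional idea needs different field numbers. A minimal sketch, assuming every line lists the same parameters in the same order:

```shell
# Sample line copied from the question.
url='/secure/domain/new.aspx?id=620&utm_source=1034&utm_medium=cpc&utm_term=term1&try=1&v=3&utm_account=account_name&utm_campaign=campaign_name&utm_adgroup=adgroup&keyword=keyword1&pkw=pkw1&idimp=id&premt=premt1&gclid=id'

# With = and & as separators, the values of keyword, pkw, idimp and premt
# land in fields 20, 22, 24 and 26. This breaks as soon as a line has
# extra, missing or reordered parameters.
printf '%s\n' "$url" | awk -F'[=&]' '{print $20, $22, $24, $26}'
# prints: keyword1 pkw1 id premt1
```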

EDIT: Based on your edited question you can use:

awk -F'[=&]' '{n=1; for (i=1; i<=NF; i++) {if ($i=="interested") {n=i+3; break}}
      for (i=0; i<8; i+=2) printf "%s ", $(n+i); print ""}' file
value1 value2 value3 value4 
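The EDIT anchors on the literal field "interested", which only exists in the older sample URL. A sketch of the same anchor-and-offset idea for the URL now in the question, anchoring on the field named "keyword". The helper name `anchor_on_keyword` is hypothetical, and it assumes keyword, pkw, idimp and premt appear next to each other in that order.

```shell
# Hypothetical helper: find the field named "keyword", then print the
# values at offsets +1, +3, +5 and +7 from it. Lines without a
# "keyword" field produce no output.
anchor_on_keyword() {
    awk -F'[=&]' '{
        n = 0
        for (i = 1; i <= NF; i++) if ($i == "keyword") { n = i + 1; break }
        if (n) print $n, $(n + 2), $(n + 4), $(n + 6)
    }'
}
```

Usage: `anchor_on_keyword < file > output.txt`.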
anubhava
  • I would avoid the `+`. If there's an empty value `param3=&param4=x` you'll see "corrupted" output. – glenn jackman Jan 06 '14 at 19:37
  • Thanks, I edited it, but I had `'[=&]+'` in case two `&&` appear by any chance. – anubhava Jan 06 '14 at 19:43
  • Thanks for the answer. I have edited the question. Can you please edit the answer accordingly? I tried changing the `[=&]` to include some pattern matching, but didn't work out as expected. – CodingInCircles Jan 06 '14 at 19:45
  • It's not getting me the exact output. Let me change the question to reflect what the closest URL is, as I can't share the exact URL. – CodingInCircles Jan 06 '14 at 20:01
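The comment about `+` can be checked on a hypothetical query string where param3 has an empty value. Field 6 is where param3's value belongs:

```shell
# Hypothetical sample with an empty value for param3.
s='param1=a&param2=b&param3=&param4=x'

# One separator per = or &: the empty value survives as an empty field 6.
printf '%s\n' "$s" | awk -F'[=&]' '{print "[" $6 "]"}'     # prints []

# With +, "=&" collapses into one separator, later fields shift left,
# and param3's "value" now reads as the next parameter's name.
printf '%s\n' "$s" | awk -F'[=&]+' '{print "[" $6 "]"}'    # prints [param4]
```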
s='/helloworld/some/other/standard/URL/mumbo/jumbo/page.aspx?strings&that&I&am&not&interested&in&param1=value1&param2=value2&param3=value3&param4=value4&some&more&uninteresting&strings'
echo "$s" | grep -o 'param[1234]=[^&]*' | cut -d= -f2- | paste -d " " - - - -
value1 value2 value3 value4

Keeping up with the clarifications to the question:

s='/secure/domain/new.aspx?id=620&utm_source=1034&utm_medium=cpc&utm_term=term1&try=1&v=3&utm_account=account_name&utm_campaign=campaign_name&utm_adgroup=adgroup&keyword=keyword&pkw=pkw1&idimp=id&premt=premt1&gclid=id'
echo "$s" |  grep -o '\<\(keyword\|pkw\|idimp\|premt\)=[^&]*' | cut -d= -f2- | paste -d " " - - - -
keyword pkw1 id premt1

The `\<` is a "start of word" anchor, so parameters like "fookeyword" are not matched.
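A quick way to see the anchor's effect, on a hypothetical query string with a "fookeyword" parameter ahead of the real one:

```shell
# Hypothetical input: "fookeyword" comes before the real "keyword".
s='?fookeyword=x&keyword=k'

# Without \<, "keyword=" also matches inside "fookeyword=".
printf '%s\n' "$s" | grep -o 'keyword=[^&]*'      # prints keyword=x, then keyword=k

# With \<, the match must begin at a word start, so only keyword=k is printed.
printf '%s\n' "$s" | grep -o '\<keyword=[^&]*'
```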

With awk, I'd write:

awk -F '[?=&]' '
    BEGIN {
        # initialize the parameters you want
        p["keyword"] = p["pkw"] = p["idimp"] = p["premt"] = 1
    } 
    {
        for (i=2; i<NF; i+=2) 
            if ($i in p) 
                printf "%s ", $(i+1)
        print ""
    }
' file
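The awk script above prints values in the order they appear in the URL, each followed by a space. A hedged variation, assuming you want a fixed column order and a chosen separator: the function name `extract_params` is hypothetical, and a missing parameter yields an empty column so the columns stay aligned.

```shell
# Hypothetical helper: extract_params [SEPARATOR] < input
# Prints keyword, pkw, idimp and premt in that fixed order, joined by
# SEPARATOR (default: tab), with no trailing separator.
extract_params() {
    sep='\t'                       # awk expands \t in -v values to a tab
    [ "$#" -gt 0 ] && sep=$1
    awk -F '[?=&]' -v OFS="$sep" '
        BEGIN { n = split("keyword pkw idimp premt", names, " ") }
        {
            split("", v)                    # clear values from the previous line
            for (i = 2; i < NF; i += 2)     # names at even fields, values after them
                v[$i] = $(i + 1)
            out = v[names[1]]
            for (j = 2; j <= n; j++)
                out = out OFS v[names[j]]
            print out
        }
    '
}
```

For example, `extract_params , < file > output.csv` writes comma-separated lines, and `extract_params < file` writes tab-separated ones.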
glenn jackman
  • Perfect! Worked on the first attempt! :) – CodingInCircles Jan 06 '14 at 21:07
  • Hey glenn, I got the output yesterday and had to do some more tr-ing and sed-ing to get it exactly the way I wanted it. While I didn't expect it to be exactly easy, I assume there's an easier way. So, I got the 4 keywords newline separated, with an extra newline at the end of each group of 4. Any way that could have been tab separated and newline separated, or comma and newline separated? – CodingInCircles Jan 08 '14 at 00:53
  • With `paste -d " " - - - -`, the -d option defines the separator between the fields. If you want a comma, use `-d ,`. If you want a tab, omit that option, since tab is paste's default separator. – glenn jackman Jan 08 '14 at 02:34
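The separator choice from the last comment, sketched on four sample values, one per line, as the `grep -o ... | cut` pipeline produces them. Each `-` makes paste read one more line from standard input, so four of them join every group of four lines:

```shell
# Four sample values, one per line.
vals='keyword1
pkw1
id
premt1'

printf '%s\n' "$vals" | paste - - - -         # tab-separated (paste's default)
printf '%s\n' "$vals" | paste -d , - - - -    # prints keyword1,pkw1,id,premt1
```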

Or just use grep -P (that probably requires installing GNU grep). This prints the value of every parameter in the URL, one per line:

grep -oP '[?&][^&?=]+=\K[^&?]+' file
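To keep only the four wanted parameters, the name part of the pattern can be narrowed to an alternation and the output grouped with paste. A sketch, assuming a grep with `-P` support; `grep_params` is a hypothetical helper name, and a line with a missing or empty value would shift the groups of four.

```shell
# Hypothetical helper: grep_params < input
# \K drops "&name=" from the printed match, leaving just the value;
# paste then joins every four values with a space.
grep_params() {
    grep -oP '[?&](keyword|pkw|idimp|premt)=\K[^&]+' | paste -d ' ' - - - -
}

printf '%s\n' '/secure/domain/new.aspx?id=620&keyword=keyword1&pkw=pkw1&idimp=id&premt=premt1&gclid=id' | grep_params
# prints: keyword1 pkw1 id premt1
```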
tripleee