1

I have the following string and I want to split it into 3 parts:

Text:

<http://rdf.freebase.com/ns/american_football.football_player.footballdb_id>    <http://www.w3.org/2000/01/rdf-schema#label>    "footballdb ID"@en

Output should be

$1 = <http://rdf.freebase.com/ns/american_football.football_player.footballdb_id>
$2 = <http://www.w3.org/2000/01/rdf-schema#label>
$3 = "footballdb ID"@en

Basically, I am splitting an RDF-ish tuple into its parts. I want to do this with a UNIX script, but I do not know sed or awk. Please help.

Soumitra
  • Is the white space between your input fields a tab char or something else? What do you mean by `Output`? Do you literally mean you want a string that says `$3 = "footballdb ID"@en` printed by your awk command? If not please clarify... – Ed Morton Sep 27 '14 at 16:38
  • FWIW, your first line says you "want to split it into 4 parts", but your output example has 3 parts. I assume that the URLs are always proper URLs so they won't contain spaces, hence only the last field can contain spaces. Is that correct? – PM 2Ring Sep 27 '14 at 20:18
  • Yes, only the last column can contain spaces. – Soumitra Sep 27 '14 at 23:46

5 Answers

3

If your input fields are tab-separated, this will produce your posted desired output:

$ awk -F'\t' '{ for (i=1;i<=NF;i++) printf "$%d = %s\n", i, $i }' file
$1 = <http://rdf.freebase.com/ns/american_football.football_player.footballdb_id>
$2 = <http://www.w3.org/2000/01/rdf-schema#label>
$3 = "footballdb ID"@en

Alternatively this might be what you want if your fields are not tab-separated:

$ cat tst.awk
{
    gsub(/<[^>]+>/,"&\n")
    split($0,a,/[[:space:]]*\n[[:space:]]*/)
    for (i=1; i in a; i++)
        printf "$%d = %s\n", i, a[i]
}
$
$ awk -f tst.awk file
$1 = <http://rdf.freebase.com/ns/american_football.football_player.footballdb_id>
$2 = <http://www.w3.org/2000/01/rdf-schema#label>
$3 = "footballdb ID"@en

If that's not how your input fields are separated and/or not the output you want, update your question to clarify.

Ed Morton
2
read -r A B C <<< "$string"
printf '$1 = %s\n$2 = %s\n$3 = %s\n' "$A" "$B" "$C"

Output:

$1 = <http://rdf.freebase.com/ns/american_football.football_player.footballdb_id>
$2 = <http://www.w3.org/2000/01/rdf-schema#label>
$3 = "footballdb ID"@en
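This works because `read` splits on whitespace and assigns everything left over to the last variable, so the space inside the third field survives. A minimal sketch of the same idea applied to a whole file, assuming one triple per line and that only the last field can contain spaces (as confirmed in the comments); the function name `print_triples` and the variable names are hypothetical:

```shell
# Hypothetical helper: print_triples FILE
# Reads one triple per line; the last variable (obj) receives the rest of
# the line, so embedded spaces in the third field are preserved.
print_triples() {
    # "|| [ -n "$subj" ]" also handles a final line with no trailing newline.
    while read -r subj pred obj || [ -n "$subj" ]; do
        [ -n "$subj" ] || continue    # skip blank lines
        printf '$1 = %s\n$2 = %s\n$3 = %s\n' "$subj" "$pred" "$obj"
    done < "$1"
}
```

Call it as `print_triples file`. The `-r` option keeps `read` from treating backslashes in the data as escape characters.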
Cyrus
1

Whatever you use to split the string needs to recognize not only the white space but also the convention that the double quote "protects" the blank space before ID and prevents it from splitting the fields. I fear this computation may be beyond what is possible with sed. You could do it in awk, but awk provides little special advantage here.
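If general quote handling is not needed, the problem gets simpler. A rough awk sketch, assuming (per the asker's comment) that the first two fields never contain spaces or tabs, so everything after them can be treated as the third field without parsing the quotes at all; the function name `split_triple` is hypothetical:

```shell
# Hypothetical helper: reads triples on stdin, one per line.
# Assumption: only the third field may contain spaces or tabs.
split_triple() {
    awk 'NF {
        line = $0
        sub(/^[ \t]+/, "", line)                 # drop leading whitespace
        for (i = 1; i <= 2; i++) {
            match(line, /^[^ \t]+/)              # next whitespace-free token
            printf "$%d = %s\n", i, substr(line, 1, RLENGTH)
            line = substr(line, RLENGTH + 1)     # cut the token off
            sub(/^[ \t]+/, "", line)             # and the separator after it
        }
        printf "$3 = %s\n", line                 # remainder, spaces intact
    }'
}
```

For example, `split_triple < file`. This does not interpret the quotes, so it would mis-split input where an earlier field contains whitespace.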

You show a space-separated format with quotes. A similar problem is to parse comma-separated format with quotes. Related questions:

Community
Norman Ramsey
-1

echo "your string" |awk -F" " '{ print $1 $2 $3 $4}'

Romeo Kienzler
-2
awk '{ print "$1 = " $1 "\n$2 = " $2 "\n$3 = " $3 }'  filename
Remi Guan
Newbie