
What

I'd like to turn the IANA backward timezones file into a JSON file with unique keys, but to do that I'll have to swap the two columns so that the keys become the values and vice versa.

That's because a JSON file can't have duplicate keys.

Example:

That file contains lots of duplicate links, but for this example let's use these two:

Link    America/Toronto     America/Montreal
Link    America/Toronto     Canada/Eastern

I want those to turn into:

"America/Montreal": "America/Toronto", "Canada/Eastern": "America/Toronto",

so that they both output Toronto.
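A minimal Python sketch of why the swap is needed, using only the two sample lines above (the variable names are made up for illustration):

```python
# (target, alias) pairs taken from the two sample Link lines
links = [
    ("America/Toronto", "America/Montreal"),
    ("America/Toronto", "Canada/Eastern"),
]

# Keyed by target: both entries share one key, so the second overwrites the first
by_target = {target: alias for target, alias in links}

# Keyed by alias: the keys are unique, so every entry survives
by_alias = {alias: target for target, alias in links}

print(by_target)  # {'America/Toronto': 'Canada/Eastern'}
print(by_alias)   # {'America/Montreal': 'America/Toronto', 'Canada/Eastern': 'America/Toronto'}
```

Keying by the alias (the last column) is what keeps both lines.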

What I've tried so far:

The regular expression I've made so far is this:

  • search for: ^Link[\s]*([a-zA-Z\/\-]*)[\s]*([a-zA-Z\/\-]*)$
  • replace with: "\2" : "\1",

Finally I tried doing this with sed like so: sed -E 's|^Link[\s]*([a-zA-Z\/\-]*)[\s]*([a-zA-Z\/\-]*)$|"\2" : "\1"|' ./backward

but for some reason it keeps outputting the whole file without substituting anything.

What am I doing wrong?

SudoPlz
  • I think that your command is almost correct. So for the regex, how about modifying from ``[\s]`` to ``\s``? It's ``sed -E 's|^Link\s*([a-zA-Z\/\-]*)\s*([a-zA-Z\/\-]*)$|"\2" : "\1"|' ./backward``. If this was not what you want, I'm sorry. – Tanaike Aug 28 '18 at 23:55

3 Answers

1

I strongly suggest using jq, a tool built with JSON in mind (which thus -- unlike sed -- is incapable of generating output which is not valid JSON, unless explicitly directed to).

The below is written to favor readability over terseness:

input='
Link    America/Toronto     America/Montreal
Link    America/Toronto     Canada/Eastern
'

# -R == raw input; -n == don't consume input until directed by "input" or "inputs"
jq -Rn '
# start by creating an array of smaller arrays, one per line
[inputs
 | select((. | length) > 1)    ## ignore empty lines
 | split("[[:space:]]+"; "")   ## Split on runs of whitespace
 | select(.[0] == "Link")]     ## Ignore anywhere first column is not "Link"
# then combine those smaller arrays to create key/value pairs in one big object
| reduce .[] as $item ({}; .[$item[2]]=$item[1])
' <<<"$input"

...properly emits:

{
  "America/Montreal": "America/Toronto",
  "Canada/Eastern": "America/Toronto"
}

...as you can see at https://jqplay.org/s/RBBKMUS2pv


Alternately, that same logic written in Python (wrapped for invocation from shell):

# capture your Python code in a variable via a quoted heredoc
# this lets it be included in your shell script as a literal
link2json_py=$(cat <<'EOF'
import json, sys

data = {}
for line in sys.stdin:
    line = line.rstrip()
    columns = line.split()
    if len(columns) < 3:
        continue
    if columns[0] != 'Link':
        continue
    data[columns[2]] = columns[1]  # key = alias (last column), value = target
json.dump(data, sys.stdout)
sys.stdout.write('\n')
EOF
)

# define a shell function wrapping that Python code
link2json() {
  python -c "$link2json_py" "$@"
}

# and call that shell function
link2json <<<"$input"
Charles Duffy
  • What a great answer. This looks promising, but I want to be able to use the same command in multiple work stations without installing any 3rd party tools – SudoPlz Aug 29 '18 at 13:04
  • I'd consider Python, then -- available everywhere, with a JSON generation library built in. – Charles Duffy Aug 29 '18 at 13:57
  • Using Python is more portable than relying on proprietary (vendor-specific) extensions to `sed`, at least. :) – Charles Duffy Aug 29 '18 at 14:03
1

I assume you are using GNU sed. Your problem comes from specificities of GNU extended regular expressions that are, unfortunately, not very well documented. From Regular-Expressions.info, for instance:

The shorthand classes \w, \W, \s and \S can be used instead of [[:alnum:]_], [^[:alnum:]_], [[:space:]] and [^[:space:]]. You can use these directly in the regex, but not inside bracket expressions. A backslash inside a bracket expression is always a literal.

So, you cannot use the \s shorthand for [:space:] inside a [...] set definition. As noted by Tanaike you do not need set definitions and:

sed -E 's|^Link\s*([a-zA-Z\/\-]*)\s*([a-zA-Z\/\-]*)$|"\2" : "\1"|' ./backward

should work. If, for any reason, you want to use set definitions,

sed -E 's|^Link[[:space:]]*([a-zA-Z\/\-]*)[[:space:]]*([a-zA-Z\/\-]*)$|"\2" : "\1"|' ./backward

should also work. Note that:

 sed -E 's|^Link\s+([a-zA-Z\/\-]+)\s+([a-zA-Z\/\-]+)$|"\2" : "\1"|' ./backward

is probably better. And:

 sed -E 's|^Link\s+([[:alpha:]/-]*)\s+([[:alpha:]/-]*)$|"\2" : "\1"|' ./backward

even better.
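The bracket-expression rule quoted above can be illustrated with a rough Python emulation. This is only a sketch: Python's `re` is not a POSIX engine (its own `[\s]` still means whitespace), so the POSIX reading of `[\s]` as "a backslash or the letter s" is reproduced here by escaping the backslash. The pattern names are hypothetical.

```python
import re

line = "Link    America/Toronto     America/Montreal"

# Emulates the POSIX reading of [\s]: a set holding a literal backslash and "s"
posix_like = re.compile(r"^Link[\\s]*([a-zA-Z/-]*)[\\s]*([a-zA-Z/-]*)$")

# Shorthand used outside a bracket expression, as in the corrected commands
fixed = re.compile(r"^Link\s*([a-zA-Z/-]*)\s*([a-zA-Z/-]*)$")

print(posix_like.match(line))      # None: nothing in the pattern can consume the spaces
print(fixed.match(line).groups())  # ('America/Toronto', 'America/Montreal')
```

Since the first pattern never matches, a substitution built on it leaves every line unchanged, which matches the behaviour described in the question.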

Renaud Pacalet
  • Solutions 1, 3 and 4 do not work. Solution 2 is the only one that works! I just tried all of the above commands on OSX – SudoPlz Aug 29 '18 at 13:02
  • The OP is on MacOS, so GNU sed seems unlikely (unless they explicitly installed it). – Charles Duffy Aug 29 '18 at 14:07
  • @SudoPlz: Glad to read that at least one solution worked. If you want to use the GNU utilities instead of the default ones you can install them with [MacPorts](https://www.macports.org/) or [Homebrew](https://brew.sh/). I am currently on macOS High Sierra and I use the GNU sed port by MacPorts. All solutions I show have been tested under this environment. – Renaud Pacalet Aug 29 '18 at 14:25
  • Got it, thank you so much @RenaudPacalet, but for now I think I'll use the 2nd just so that I don't have to install anything externally. – SudoPlz Aug 29 '18 at 15:02
0

Solution:

The solution to my question is the following command:

sed -En 's|^Link[[:space:]]*([^[:space:]]*)[[:space:]]*([^[:space:]]*)$| "\2" : "\1"|p' ./backward

It works as expected and creates the body of the JSON output.

TL;DR:

Specifically, Renaud's answer made me realise that I have to use [[:space:]] instead of [\s].

After running his command I was left with a couple of unwanted lines:

A) comments that the file contains on top

e.g. # This file is...

(That was resolved by telling sed to not print lines that don't match (found that here) by adding the -n flag in the beginning and the p flag in the end of the script) and

B) some lines that were not converted

e.g. Link Pacific/Pago_Pago Pacific/Samoa

(That was resolved by telling sed to match anything that is not a space in each group, [^[:space:]]*, because the original class [a-zA-Z\/\-] does not include the underscore.)
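Both fixes can be sketched in Python for comparison. This is an illustration only, not the sed command itself: `\s` and `\S` stand in for `[[:space:]]` and `[^[:space:]]`, the sample text (including its comment line) is made up, and `convert` is a hypothetical helper that mimics `sed -n` with the `p` flag by keeping only the lines that match.

```python
import re

sample = """# hypothetical comment line
Link    America/Toronto     Canada/Eastern
Link    Pacific/Pago_Pago   Pacific/Samoa
"""

# Letters, slash and hyphen only: stops at the underscore in Pago_Pago
narrow = re.compile(r"^Link\s*([a-zA-Z/-]*)\s*([a-zA-Z/-]*)$")
# Any run of non-space characters
broad = re.compile(r"^Link\s*(\S*)\s*(\S*)$")

def convert(text, pattern):
    """Return one '"alias": "target"' string per matching line."""
    out = []
    for line in text.splitlines():
        m = pattern.match(line)
        if m:  # non-matching lines (comments, unparsed links) are dropped
            out.append('"{}": "{}"'.format(m.group(2), m.group(1)))
    return out

print(convert(sample, narrow))  # only the Canada/Eastern entry
print(convert(sample, broad))   # both entries
```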

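Finally, the whole script looks like this:

#!/bin/bash
echo "{"
sed -En 's|^Link[[:space:]]*([^[:space:]]*)[[:space:]]*([^[:space:]]*)$|    "\2": "\1"|p' ./backward |
  sed '$!s/$/,/'   # append a comma to every entry except the last
echo "}"

Running the script like so: sh index.sh > timezones.json outputs a beautiful JSON file. The second sed is needed because JSON requires a comma between entries but not after the last one.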
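A quick way to confirm that generated text really is valid JSON is to parse it. The sketch below uses Python's standard `json` module; `is_valid_json` is a hypothetical helper and the sample strings are made up from the two entries in the question.

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True

with_commas = """{
    "America/Montreal": "America/Toronto",
    "Canada/Eastern": "America/Toronto"
}"""

without_commas = """{
    "America/Montreal": "America/Toronto"
    "Canada/Eastern": "America/Toronto"
}"""

trailing_comma = """{
    "America/Montreal": "America/Toronto",
    "Canada/Eastern": "America/Toronto",
}"""

print(is_valid_json(with_commas))     # True
print(is_valid_json(without_commas))  # False
print(is_valid_json(trailing_comma))  # False
```

To check the real output, the contents of timezones.json could be read and passed to the same function.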

SudoPlz