0

I have an export from postgresql table with multiple fields, including boolean (exported by postgresql as t and f character), and I need to import it in another database (monetdb) that won't understand t/f as bool values.

(EDIT removed spaces to reflect true file aspect and avoid angry comments - previously there was spaces displayed)

id|val_str|bool_1|bool2|bool_3|bool4|
1|help|t|t|f|t|
2|test|f|t|f|f|
...

As I cannot replace all occurence of t/f I need to integrate the field separator in my pattern. I tried to use awk to replace fields t with TRUE and f with FALSE:

awk -F'|' '{gsub(/\|t\|/, "|TRUE|"); gsub(/\|f\|/, "|FALSE|"); print;}' 

This is working partially, as consecutive fields with a same value (|t|t|) will have only the first occurrence replaced (|TRUE|t| - as 2nd occurence is in fact t| and not |t|).

id|val_str|bool_1|bool2|bool_3|bool4|
1|help|TRUE|t|FALSE|TRUE|
2|test|FALSE|TRUE|FALSE|f|
...

Table has ~450 columns so I can't really specify the list of columns to be replaced, nor work in postgres to 'transform' boolean columns (I could but ...).

I could run the gsub() twice, but I was looking for more elegant way to match the entire field content for all fields.

gsub(/^t$/, ...) is not helping either as we are in the middle of a line most of the time.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
ant1j
  • 305
  • 2
  • 18
  • 1
    Have you considered the possibility to make postgresql output the desired format? I would have expected a flag or some other parameter making it easy, it looks like [there's no such thing but it's still possible](https://stackoverflow.com/questions/28959507/output-yes-no-instead-of-t-f-for-boolean-data-type-in-postgresql). – Aaron Jul 03 '17 at 15:55
  • What does `[Original file has no space]` mean? If the sample input you posted isn't the same format as your real data then, obviously, fix that so it is or you'll get a more complicated solution than necessary or one that just doesn't work and either way will be wasting peoples time. If that's just a completely irrelevant statement then get rid of it. – Ed Morton Jul 03 '17 at 16:50
  • I thought it would be easier to read. – ant1j Jul 03 '17 at 20:59

4 Answers4

3

Table has ~450 columns so I can't really specify the list of columns to be replaced, nor work in postgres to 'transform' boolean columns (I could but ...).

You can let Postgres do the work for you. Basic query to produce the SELECT list:

SELECT string_agg(CASE WHEN atttypid = 'bool'::regtype
                       THEN quote_ident(attname) || '::text'
                       ELSE quote_ident(attname) END, ', ' ORDER BY attnum)
FROM   pg_attribute
WHERE  attrelid = 'mytable'::regclass  -- provide table name here
AND    attnum > 0
AND    NOT attisdropped;

Produces a string of the form:

col1, "CoL 2", bool1::text, "Bool 2"::text

All identifiers are escaped properly. Columns are in default order. Copy and execute it. Use COPY to export to file. (Or \copy in psql.) Performance is about the same as exporting a plain table. If you don't need upper case omit upper().

Why is a simple cast to text enough?

About regclass and escaping identifiers properly:

If you need a complete statement with TRUE / FALSE / NULL in upper case, standard SQL cast notation (without colons ::), still original column names and maybe a schema-qualified tablename:

SELECT 'SELECT '
     || string_agg(CASE WHEN atttypid = 'bool'::regtype
                        THEN format('upper(cast(%1$I AS text)) AS %1$I', attname)
                        ELSE quote_ident(attname) END, ', ' ORDER BY attnum)
     || ' FROM myschema.mytable;'           -- provide table name twice now
FROM   pg_attribute
WHERE  attrelid = 'myschema.mytable'::regclass
AND    attnum > 0
AND    NOT attisdropped;

Produces a complete statement of the form:

SELECT col1, "CoL 2", upper(cast(bool1 AS text) AS bool1, upper(cast("Bool 2" AS text)) AS "Bool 2" FROM myschema.mytable;
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 1
    "I could but ..." => now I can! DBeaver (JDBC client) is complaining about an "unexepected symbol (:)", but with `psql` it is fine, except that I need to add the column name: `THEN 'upper(' || quote_ident(attname) || '::text) AS ' || quote_ident(attname)` to have the right column name. Or remove `upper`. – ant1j Jul 03 '17 at 20:54
  • @ant1j: I wasn't sure you need upper case and / or original column names. As for the short cast notation (`col::text`), you can replace it with SQL standard syntax `cast(col AS text)`. Consider updates above. – Erwin Brandstetter Jul 04 '17 at 03:29
  • short cast notation is fine for DBeaver normally, I don't know why he's complaining on this one... thanks for the update – ant1j Jul 04 '17 at 11:12
1

if perl is okay, you can use lookarounds:

$ cat ip.txt 
id |  val_str  | bool_1 | bool2  | bool_3 | bool4  | 
1  |    help   |   t    |   t    |   f    |   t    |
2  |    test   |   f    |   t    |   f    |   f    | 

$ perl -pe 's/\|\K\h*t\h*(?=\|)/  TRUE  /g; s/\|\K\h*f\h*(?=\|)/  FALSE /g' ip.txt 
id |  val_str  | bool_1 | bool2  | bool_3 | bool4  | 
1  |    help   |  TRUE  |  TRUE  |  FALSE |  TRUE  |
2  |    test   |  FALSE |  TRUE  |  FALSE |  FALSE | 
  • \|\K positive lookbehind to match |
  • \h* optional horizontal space, remove if not actually present in input
  • (?=\|) positive lookahead to match |


Can also use looping with sed. Tested on GNU sed 4.2.2, syntax may vary with other implementations

$ sed ':a s/| *t *|/|  TRUE  |/;ta; :b s/| *f *|/|  FALSE |/;tb' ip.txt 
id |  val_str  | bool_1 | bool2  | bool_3 | bool4  | 
1  |    help   |  TRUE  |  TRUE  |  FALSE |  TRUE  |
2  |    test   |  FALSE |  TRUE  |  FALSE |  FALSE | 
  • :a label
  • s/| *t *|/| TRUE |/ substitute command
  • ta branch to label a as long as substitute command succeeds
  • similarly for :b


With no spaces in input

perl -pe 's/\|\Kt(?=\|)/TRUE/g; s/\|\Kf(?=\|)/FALSE/g' ip.txt 
sed ':a s/|t|/|TRUE|/;ta; :b s/|f|/|FALSE|/;tb' ip.txt 
awk 'BEGIN{FS=OFS="|"} {for(i=1;i<=NF;i++){if($i=="t"){$i="TRUE"} if($i=="f"){$i="FALSE"}} print}' ip.txt
Graham
  • 7,431
  • 18
  • 59
  • 84
Sundeep
  • 23,246
  • 2
  • 28
  • 103
  • source file has no space in fact. Looks good to me. Not familiar with perl but I guess I can send output to pipe STDOUT ? – ant1j Jul 03 '17 at 15:55
  • yes this is just like any other command... output can be piped – Sundeep Jul 03 '17 at 15:56
  • There it is again - `source file has no space in fact.`. Sounds like you're asking us to help you solve a problem that (due to the white space) is harder to solve than the much simpler problem you really have. – Ed Morton Jul 03 '17 at 16:52
  • and now I also have a solution in case of spaces... thanks all – ant1j Jul 03 '17 at 21:06
1

Assuming (based on your comments) that your input file actually doesn't look like the sample you posted but instead looks like this:

$ cat file
id|val_str|bool_1|bool2|bool_3|bool4|
1|help|t|t|f|t|
2|test|f|t|f|f|

then all you need is:

$ awk '{while(gsub(/\|t\|/,"|TRUE|")); while(gsub(/\|f\|/,"|FALSE|"));}1' file
id|val_str|bool_1|bool2|bool_3|bool4|
1|help|TRUE|TRUE|FALSE|TRUE|
2|test|FALSE|TRUE|FALSE|FALSE|

with the general solution for N replacement strings being:

$ awk 'BEGIN{m["f"]="FALSE"; m["t"]="TRUE"} {for (k in m) while(gsub("\\|"k"\\|","|"m[k]"|"));} 1' file
id|val_str|bool_1|bool2|bool_3|bool4|
1|help|TRUE|TRUE|FALSE|TRUE|
2|test|FALSE|TRUE|FALSE|FALSE|
Ed Morton
  • 188,023
  • 17
  • 78
  • 185
0

Use sed, it`s standard.

sed 's/| *t */| TRUE /g;s/| *f */| FALSE /g'

This tells sed to substitute every substring that begins with a pipe character, an unknown number of spaces (maybe zero), t and a space followed by an unknown number of spaces with | TRUE; same with f.

Pipe the output through column -t if line lengths get messed up.

hidefromkgb
  • 5,834
  • 1
  • 13
  • 44
  • this will not work for all cases... for example: `echo 'abc| t 12| t |' | sed 's/| *t */| TRUE /g'` – Sundeep Jul 03 '17 at 16:01
  • What exactly *is* `t 12`? – hidefromkgb Jul 03 '17 at 16:02
  • 1
    hypothetical column data... your solution doesn't check `|` boundary on both sides – Sundeep Jul 03 '17 at 16:03
  • @Sundeep the only data that can contain spaces here is a string. Strings generally begin with a quote, thus mismatching the regex. And BTW, your solution also doesn\`t check for something like that: `| "HELLO IMMA STRING! | t |" |` – hidefromkgb Jul 03 '17 at 16:06
  • that is true.. if `|` cannot be safely used as delimiter, using regex will be bad idea – Sundeep Jul 03 '17 at 16:07
  • …But really, I don`t think OP\`s case is *that* convoluted. And that made me keep my regex as simple as possible =) – hidefromkgb Jul 03 '17 at 16:10