Removing a line, plus drop the comma of the last line with a matching pattern

Question

DROP TABLE IF EXISTS `qalnk`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `qalnk` (
  `id` bigint(20) NOT NULL,
  `answer_id` bigint(20) DEFAULT NULL,
  `date_deleted` bigint(20) DEFAULT NULL,
  `deleted_by_user_ap_id` varchar(36) DEFAULT NULL,
  `expression_id` bigint(20) DEFAULT NULL,
  `expression_type` varchar(255) DEFAULT NULL,
  `ordering` int(11) DEFAULT NULL,
  `question_id` bigint(20) DEFAULT NULL,
  `expression_for_deselect_id` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `FK6661B19F393DFCD` (`expression_id`),
  KEY `FK6661B195182DDCD` (`question_id`),
  KEY `FK6661B195742A56B` (`expression_for_deselect_id`),
  KEY `idx_qlnk_nswrd` (`answer_id`),
  KEY `FK6661B19126D878D` (`answer_id`),
  KEY `FK6661B1975B33071` (`id`),
  CONSTRAINT `FK6661B19126D878D` FOREIGN KEY (`answer_id`) REFERENCES `ans` (`id`),
  CONSTRAINT `FK6661B1975B33071` FOREIGN KEY (`id`) REFERENCES `apobj` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

My goal is to drop all constraints from a database, so I would like to have simple code search for the word "CONSTRAINT" and drop the line

I tried to use sed

sed '/\s*CONSTRAINT/d' ~/Downloads/dump.sql > ~/ouput.sql

but there are all these tailing comma that are left behind because of CONSTRAINTS being the last statement. I don't mind if it awk, sed, or some common tools.

The desired output is

DROP TABLE IF EXISTS `qalnk`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `qalnk` (
  `id` bigint(20) NOT NULL,
  `answer_id` bigint(20) DEFAULT NULL,
  `date_deleted` bigint(20) DEFAULT NULL,
  `deleted_by_user_ap_id` varchar(36) DEFAULT NULL,
  `expression_id` bigint(20) DEFAULT NULL,
  `expression_type` varchar(255) DEFAULT NULL,
  `ordering` int(11) DEFAULT NULL,
  `question_id` bigint(20) DEFAULT NULL,
  `expression_for_deselect_id` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `FK6661B19F393DFCD` (`expression_id`),
  KEY `FK6661B195182DDCD` (`question_id`),
  KEY `FK6661B195742A56B` (`expression_for_deselect_id`),
  KEY `idx_qlnk_nswrd` (`answer_id`),
  KEY `FK6661B19126D878D` (`answer_id`),
  KEY `FK6661B1975B33071` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

Little modification of yours solution: sed '/CONSTRAINT/d' ~/Downloads/dump.sql > ~/ouput.sql — josifoski, Dec 19 '14 at 21:36
@josifoski: While your suggestion is truly the simpler (non-redundant) _equivalent_ of the OP's `sed` command, the better alternative in this context is probably `sed '/^\s*CONSTRAINT /d'` - i.e., to _anchor_ the expression at the start of a line (note that the use of `\s` assumes _GNU_ `sed`; with _mandatory_ preceding whitespace you'd need `sed '/^\s\+CONSTRAINT /d'`). — mklement0, Dec 19 '14 at 23:23

score 2 · Accepted Answer · edited Dec 20 '14 at 19:40

2

sed is an excellent tool for simple substitutions on a single line but for anything else just use awk. Here's GNU awk for multi-char RS and the \s abbreviation for [[:space:]]:

$ awk -v RS='^$' -v ORS= '{gsub(/,\s*CONSTRAINT[^\n,]+/,"")}1' file
DROP TABLE IF EXISTS `qalnk`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `qalnk` (
  `id` bigint(20) NOT NULL,
  `answer_id` bigint(20) DEFAULT NULL,
  `date_deleted` bigint(20) DEFAULT NULL,
  `deleted_by_user_ap_id` varchar(36) DEFAULT NULL,
  `expression_id` bigint(20) DEFAULT NULL,
  `expression_type` varchar(255) DEFAULT NULL,
  `ordering` int(11) DEFAULT NULL,
  `question_id` bigint(20) DEFAULT NULL,
  `expression_for_deselect_id` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `FK6661B19F393DFCD` (`expression_id`),
  KEY `FK6661B195182DDCD` (`question_id`),
  KEY `FK6661B195742A56B` (`expression_for_deselect_id`),
  KEY `idx_qlnk_nswrd` (`answer_id`),
  KEY `FK6661B19126D878D` (`answer_id`),
  KEY `FK6661B1975B33071` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

For a POSIX-only solution (see comments below from @mklement0):

awk -v RS=$(printf '\3') -v ORS= '{gsub(/,[[:space:]]*CONSTRAINT[^\n,]+/,"")}1'

edited Dec 20 '14 at 19:40

mklement0

382,024
64
607
775

answered Dec 20 '14 at 03:52

Ed Morton

188,023
17
78
185

Nice; that is indeed simpler than the `sed` solutions; however, to make it POSIX-compliant and thus make it work on BWK `awk` (since we now know the OP is on OSX), use `awk -v RS= '{gsub(/,[[:space:]]*CONSTRAINT[^\n,]+/,"")}1'` - i.e., don't use a regex as `RS` (rely on an _empty_ `RS` treating blocks of contiguous lines as a single record), and use `[[:space:]]` in lieu of `\s`. – mklement0 Dec 20 '14 at 04:55
1

yeah, that'll work if there's no blank lines within blocks and the OP doesn't mind multiple blank lines getting truncated. The typical way to do this in non-gawk is just to do `{rec = (rec ? rec ORS : "") $0} END{gsub(...,rec); print rec}'` so you build up the record one line at a time as you read it and then perform the op you want in the END section. You can also just use come control-char that you know can't exist in the input file as the RS. `BEGIN{RS=SUBSEP}` is often an acceptable approach. – Ed Morton Dec 20 '14 at 07:00
Thanks for the tips. The control-char-as-record-sep. approach is probably simplest, so we get (I've chosen `0x3`): `awk -v RS=$(printf '\3') '{gsub(/,[[:space:]]*CONSTRAINT[^\n,]+/,"")}1'` file (In bash, ksh, and zsh, the simpler `-v RS=$'\3'` is also an option.) As for building up the buffer line by line: to mimic what `-v RS=^$` does - namely to read the input as is, _including_ a trailing `\n` - you can use the simpler `{ rec = rec $0 ORS }`. To _not_ include the trailing `\n`, it's probably better to use `{ rec = (NR == 1 ? "" : rec ORS) $0 }`, otherwise you'll lose empty leading lines. – mklement0 Dec 20 '14 at 19:02
Finally, a nitpick: because `-v RS=^$` (or the POSIX equivalent of choosing a control char. not contained in the input) reads a trailing newline as well, simply outputting with `1` (an implied `print`) will add an _extra_ newline. Thus, a POSIX-compliant command that most faithfully preserves the input file would be: `awk -v RS=$(printf '\3') '{gsub(/,[[:space:]]*CONSTRAINT[^\n,]+/,""); printf "%s", $0}'` – mklement0 Dec 20 '14 at 19:02
1

The common approach when using `-v RS='^$'` or equivalent is just to specify `-v ORS=` so nothing is added by a print: `awk -v RS=$(printf '\3') -v ORS= '{gsub(/,[[:space:]]*CONSTRAINT[^\n,]+/,"")}1'` – Ed Morton Dec 20 '14 at 19:05
Cool, that's a neat trick. I think we now have a robust, cross-platform solution. Would you mind adding it to the answer? – mklement0 Dec 20 '14 at 19:09
1

Done. I don't see it as a trick though, it's just clearly specifying what you want to be output at the end of each record - the default (where you strip newlines when reading the input courtesy of `RS='\n'`) is that you want to add the newline back when printing the output but in this case you're removing nothing from the record when reading it so you want to add nothing when printing it. – Ed Morton Dec 20 '14 at 19:12
1

Thanks, and thanks for teaching me a thing or two. You're right: it's not a trick, but its simplicity made me call it that. If we apply the same approach to the collect-all-lines-one-by-one-approach, we get: `awk -v ORS= '{ rec = rec $0 RS } END { print rec }'` (note that I've switched the string concatenation from `ORS` to `RS`). – mklement0 Dec 20 '14 at 19:31

score 0 · Answer 2 · answered Dec 19 '14 at 21:38

It's a bit of a hack - use (GNU) awk to shift trailing ,s to the start of subsequent lines first followed by a sed delete.

awk -v RS= '{gsub(/,\n/, "\n,"); print}'  ~/Downloads/dump.sql | 
sed '/\s*CONSTRAINT/d' > ~/ouput.sql

This gives me the following, which should be valid SQL

DROP TABLE IF EXISTS `qalnk`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `qalnk` (
  `id` bigint(20) NOT NULL
,  `answer_id` bigint(20) DEFAULT NULL
,  `date_deleted` bigint(20) DEFAULT NULL
,  `deleted_by_user_ap_id` varchar(36) DEFAULT NULL
,  `expression_id` bigint(20) DEFAULT NULL
,  `expression_type` varchar(255) DEFAULT NULL
,  `ordering` int(11) DEFAULT NULL
,  `question_id` bigint(20) DEFAULT NULL
,  `expression_for_deselect_id` bigint(20) DEFAULT NULL
,  PRIMARY KEY (`id`)
,  KEY `FK6661B19F393DFCD` (`expression_id`)
,  KEY `FK6661B195182DDCD` (`question_id`)
,  KEY `FK6661B195742A56B` (`expression_for_deselect_id`)
,  KEY `idx_qlnk_nswrd` (`answer_id`)
,  KEY `FK6661B19126D878D` (`answer_id`)
,  KEY `FK6661B1975B33071` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;

mklement0 · Answer 3 · 2019-07-19T17:46:45.547

^{Despite the down-votes, I think this answer offers effective solutions that work as advertised, while (hopefully) also being informative. Do tell us if it is not, so I can fix it.}

A single-pass sed solution requires that all input lines be read at once, similar to Ed Morton's helpful awk answer:

A GNU sed solution:

sed -zr 's/,\n\s*CONSTRAINT\s+[^\n,]+//g' file

-z uses NUL (null bytes) as the input line separator, and since the are no embedded NULs in the input, the entire contents of file is read into the pattern space at once.
-r enables extended regular expressions (modern syntax, extended features).
The regex removes all CONSTRAINT lines, including the ,\n from the preceding line, which leaves the syntax of the enclosing CREATE TABLE statement intact.

A BSD (macOS) sed solution is, unfortunately, much more cumbersome:

The BSD sed version is missing many of GNU's (nonstandard) convenience features, which makes the solution more painful. BSD sed offers only a few extensions to the POSIX standard, but a notable one is the ability to use so-called extended regular expressions.

sed -E ':a
$!{N;ba
}
s/,\n([[:blank:]]*CONSTRAINT[[:blank:]]+[[:print:][:blank:]]+\n)+/\
/g' file

-E - similar to GNU sed's -r - enables extended regular expressions.
:a\n$!{N;b\na} is a common sed idiom that reads the entire input at once:
- :a defines a label to jump to.
- $! matches every line but (!) the last ($)
- {N;ba}; reads the next line into the pattern space (input buffer to operate on), then branches (b) to label a (:a).
- In other words: This reads all lines into the pattern space, which is what the subsequent commands then operate on (s, in this case).
Note the required newlines, both to terminate branch labels and branching commands, and (in escaped form) in the replacement command.
- It is possible to cram this into a single line by using multiple -e options, but that makes the command less readable.
Note the use of verbose POSIX character classes such as [[:blank:]], because shortcut classes such as \s aren't supported.
- In particular, while \n can be matched in the regex in principle, it is inexplicably not recognized inside a character class. Thus, [^\n] must be emulated with [[:print:][:blank:]] -- [:blank:] is added to also match tab chars., which are not considered printable chars; (\t can't be used inside char. classes either).

@GroovyEd: That implies that you're _not_ using _GNU_ `sed` (the stock `sed` on _Linux_ distros); please try the POSIX-compliant solution instead. What platform are you on? — mklement0, Dec 19 '14 at 23:30
@GroovyEd: Macs (OS X) use the BSD version of `sed`, which is less powerful than the GNU version and has many quirks - to contrast it with GNU `sed`, see http://stackoverflow.com/a/24276470/45375. In general, you should indicate the target platform when asking `awk` or `sed` questions - unless you're looking for a truly multi-platform, POSIX-compliant solution (which will typically be more cumbersome). — mklement0, Dec 20 '14 at 04:23

repzero · Answer 4 · 2014-12-20T00:09:05.200

-1

search for any line that has the expression "CONSTRAINT" and delete line and the , before that line.

sed -e '/.*/{
N
s/,\n.*CONSTRAINT//g}' -e '/.*CONSTRAINT.*/d' my_file

edited Dec 20 '14 at 00:09

answered Dec 19 '14 at 23:35

repzero

8,254
2
18
40

So he wants to drop the comma before the line that has constraint then delete the line that has constraint? – repzero Dec 19 '14 at 23:45
Yes, but only if the line preceding the constraint is the enclosing SQL statement's _last_ line. That's a simple _syntax_ requirement: while you separate clauses _inside_ the SQL statement with `,`, the _last_ clause must NOT end in `,`. – mklement0 Dec 19 '14 at 23:50
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/67354/discussion-between-xorg-and-mklement0). – repzero Dec 20 '14 at 00:15
Turns out that even if you fix and clean up this solution (GNU `sed`: `sed 'N; s/,\n.*CONSTRAINT.*$//; /CONSTRAINT/d' my_file`), it would only work with _even_ numbers of constraints, due to reading lines in _pairs_ with `N`. – mklement0 Dec 23 '14 at 03:08
@mklement0 that is why i put in the "-e '/.*CONSTRAINT.*/d'" – repzero Dec 23 '14 at 10:07
In case you were thinking of a _two-pass_ solution, an extra `-e` argument will _not_ achieve that. Your solution _would_ work - and would be a conceptually simpler alternative to the read-all-lines-at-once solutions - if you converted the 2nd `-e` command into a _separate `sed` invocation_: `sed 'N; s/,\n.*CONSTRAINT.*$//' my_file | sed '/CONSTRAINT/d'`. (Adding an `-e` does NOT add another _pass_ through the input, it simply adds another _command_ to apply to the one and only pass - this is why my original simplification simply used a `;` to separate the commands - it has the same effect.) – mklement0 Dec 23 '14 at 13:02
In case you were wondering: I was the down-voter, because you had left behind a solution that demonstrably didn't and _couldn't_ work as written. I now understand what you were _trying_ to do, and, as I said, think that if you fix it, the result is a conceptually clean 2-pass solution. I'll turn my down-vote into an up-vote if you do so. – mklement0 Dec 23 '14 at 13:49

Removing a line, plus drop the comma of the last line with a matching pattern

4 Answers4