
I've got a Perl script (run with the -p flag) that performs some corrections on a corrupted C source file. Here's part of the script:

sub remove_sp {
    $_ = shift; 
    s/ /, /g; 
    return $_;
}

s/(\([^)]*\))/remove_sp($1)/eg;

This replaces spaces inside parentheses with ", " (e.g. foo(bar baz) becomes foo(bar, baz)). However, it's not very smart: it also changes foo("bar baz") to foo("bar, baz"), which obviously isn't something I want.

I can't think of a way to rewrite the script so that it replaces a space with a comma-space only when the space is not between quotes. How can I do this?


Here's a simple table of what I need and what isn't working.

Input                        | Desired output                 | Currently handled correctly?
--------------------------------------------------------------------------------------------
foo(bar baz)                 | foo(bar, baz)                  | Yes
foo("bar baz")               | foo("bar baz")                 | No
foo("bar baz" bak)           | foo("bar baz", bak)            | No
foo("bar baz" bak "123 abc") | foo("bar baz", bak, "123 abc") | No
MD XF
  • @MattJacob The issue is with input such as `print("foo bar" baz)`, which should come out as `print("foo bar", baz)`. – MD XF Feb 28 '18 at 00:49
  • Tip: Don't clobber `$_`!!! At least use `local $_ = shift;`, but even that can cause problems because `$_` is not uncommonly aliased to a magical or read-only variable. `for (my $s = shift) { ... }` is safe, but you're better off using just `my $s = shift;` in most cases. – ikegami Feb 28 '18 at 15:27
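Following that tip, the question's helper can work on a lexical copy instead of assigning to `$_`. A minimal sketch, assuming the rest of the script stays the same; note that it only removes the `$_` clobbering and still inserts commas inside quoted strings:

```perl
use strict;
use warnings;

# Same substitution as the question's remove_sp, but applied to a private
# lexical copy of the argument, so the caller's $_ (the current input line
# under -p) is never overwritten by this sub.
sub remove_sp {
    my $s = shift;
    $s =~ s/ /, /g;
    return $s;
}

print remove_sp('(bar baz)'), "\n";    # prints (bar, baz)
```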

2 Answers


You could use Text::ParseWords to split the text between the parentheses on whitespace that is not inside quotes, and then join the resulting pieces with ", ".

#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords;

for ('foo("bar baz")', 'print("foo bar" baz)', 'foo(bar baz)') {
    my $s = $_;
    $s =~ s/(\([^)]*\))/remove_sp($1)/eg;
    print $s, $/;
}

sub remove_sp {
    join ", ", quotewords('\s+', 1, shift);
}

Output:

foo("bar baz")
print("foo bar", baz)
foo(bar, baz)
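To check this approach against every row of the table in the question, a small harness might look like the following. It is a sketch: `@cases` simply copies the question's table, and `remove_sp` is the same helper as in the answer.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Same helper as in the answer: split on whitespace outside quotes
# (keep = 1 preserves the quote characters), then rejoin with ", ".
sub remove_sp {
    return join ", ", quotewords('\s+', 1, shift);
}

# Input => desired output, copied from the table in the question.
my @cases = (
    [ 'foo(bar baz)',                 'foo(bar, baz)' ],
    [ 'foo("bar baz")',               'foo("bar baz")' ],
    [ 'foo("bar baz" bak)',           'foo("bar baz", bak)' ],
    [ 'foo("bar baz" bak "123 abc")', 'foo("bar baz", bak, "123 abc")' ],
);

for my $case (@cases) {
    my ($in, $want) = @$case;
    (my $got = $in) =~ s/(\([^)]*\))/remove_sp($1)/eg;
    printf "%-30s => %-32s %s\n", $in, $got, ($got eq $want ? 'ok' : 'MISMATCH');
}
```

Note that the `[^)]*` pattern still stops at the first `)`, so an argument such as `")"` would confuse it.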
Chris Charley

I don't think that's possible. I can think of a couple of grammatical edge cases where it's impossible to determine whether a comma is needed or not:

String pasting

foo("abc" "def");   // = foo("abcdef")
foo("foo", "bar");

Placing two string literals next to each other causes the compiler to "paste" them together into a single string. Without knowing how many arguments a function expects, there's no way of telling whether this was the intended behavior.
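The string-pasting case is easy to reproduce with the quote-aware Text::ParseWords approach from the other answer: the single pasted argument comes out as two. A minimal Perl sketch (`add_commas` is a hypothetical name):

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Hypothetical helper: quote-aware comma insertion inside parentheses,
# splitting on whitespace outside quotes (keep = 1 preserves the quotes).
sub add_commas {
    my $s = shift;
    $s =~ s/(\([^)]*\))/join ", ", quotewords('\s+', 1, $1)/eg;
    return $s;
}

# In C, foo("abc" "def") is a call with ONE argument, "abcdef".
# The substitution cannot know that, so it produces a two-argument call.
print add_commas('foo("abc" "def");'), "\n";    # prints foo("abc", "def");
```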

Comma expressions, e.g. in for loops

The comma is an operator in C; it evaluates two expressions and returns the value of the one on the RHS. Combined with the unary/binary dual nature of the +, -, &, and * operators, this means that an expression as simple as:

a + b    or    a * b

can have a comma inserted into it:

a, +b    or    a, *b

This example is contrived, but more complicated cases of the same ambiguity can arise, e.g. in for loops that use the comma operator.

Function arguments

Similarly:

foo(a * b - 1);
foo(a * b, -1);
foo(a, *b - 1);
foo(a, *b, -1);
(etc.)

Without knowing the number of arguments that are expected to a function, it's impossible to tell whether a comma should be inserted. And sometimes even that isn't enough!
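To make the argument-count problem concrete, here is a rough Perl sketch that lists every way the spaces in `a * b - 1` could be read as dropped commas. It is a simplification, not a C parser: it assumes the tokens are already space-separated, and it only allows a comma after an identifier or number and before something that can start an expression (an operand or one of the unary operators `+`, `-`, `*`, `&`). `comma_readings` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: return every way of turning the spaces between
# tokens into ", " such that each comma follows an operand (identifier or
# number) and precedes something that can start a C expression.
sub comma_readings {
    my @tok     = split ' ', shift;        # tokens must be space-separated
    my $operand = qr/^\w+$/;               # identifiers and numbers
    my $starts  = qr/^(?:\w+|[-+*&])$/;    # operand or unary operator

    # Gap $i sits between $tok[$i] and $tok[$i + 1].
    my @gaps = grep { $tok[$_] =~ $operand && $tok[$_ + 1] =~ $starts }
               0 .. $#tok - 1;

    my @readings;
    # Each bit of $mask decides whether the corresponding gap gets a comma.
    for my $mask (0 .. 2**@gaps - 1) {
        my %comma;
        for my $j (0 .. $#gaps) {
            $comma{ $gaps[$j] } = 1 if $mask & (1 << $j);
        }
        my $out = $tok[0];
        for my $i (1 .. $#tok) {
            $out .= ($comma{ $i - 1 } ? ', ' : ' ') . $tok[$i];
        }
        push @readings, $out;
    }
    return @readings;
}

print "foo($_);\n" for comma_readings('a * b - 1');
```

For this input it finds the same four readings listed above (with a space after the unary operators), and nothing in the text itself says which one was meant.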

  • I wasn't looking for optimal C syntax correction/compression, just adding commas where necessary. We can assume that anywhere a comma looks like it should exist, it should exist. – MD XF Feb 28 '18 at 05:47
  • It's not a matter of "correction/compression". There are situations where it's impossible to tell whether a comma was supposed to be there or not. If it was possible to guess where all the commas were supposed to be, you wouldn't need to include them. :) –  Feb 28 '18 at 05:53
  • I'm just saying that this answer goes beyond the scope of the question; I want to replace spaces between parentheses with comma-space unless the spaces are between quotes. – MD XF Mar 01 '18 at 00:12