
I'm trying to remove any text inside of quotation marks (and the quotation marks themselves) from a file.

Basically I need this:

A A2A|"Dm"A2A "C"G2E|"Dm"D2D A,2D|

To turn into this:

A A2A|A2A G2E|D2D A,2D|

Here's a code snippet of what I originally tried:

def conversion():
    with open(abc + '.txt') as infile, open(abc + '.tmp', 'w') as outfile:
        for line in infile:
            #Delete anything inside of quotes after the header
            if '"' + '' in line:
                line = line.replace('"' + '', '')
                outfile.write(line)

            #Write everything else 
            else:
                outfile.write(line)
conversion()

This removes the quotation marks, but it leaves everything that was inside of them.


If I change

line = line.replace('"' +'','')

To

line = line.replace('"' + "Dm" + '"', '')

I can get rid of every "Dm". Theoretically I could program this for each possible combination, but that would be a huge PITA, and I want to allow for human error (e.g. someone wrote "Dma" instead of "Dmaj").


I've also tried using regex, but I honestly have no idea what I'm doing with it.

def conversion():
    with open(abc + '.txt') as infile, open(abc + '.tmp', 'w') as outfile:
        for line in infile:
            #Delete anything inside of quotes after the header
            if '"' in line:
                re.sub('".+"', '', line)
                outfile.write(line)

            #Write everything else 
            else:
                outfile.write(line)
conversion()

This seems to do nothing. I've looked through the Python documentation, but there's no example showing how to use it in the context I'm trying to.

Dipen Shah
Noah Wood

1 Answer

re.sub() returns the edited string; it doesn't modify line in place, so you have to assign the result back:

line = re.sub('".*?"', '', line)
outfile.write(line)
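
A small sketch of why the original call appeared to do nothing, using a shortened sample from the question (the variable names here are just for illustration). Python strings are immutable, so the result of re.sub() is lost unless it is assigned to something:

```python
import re

line = '"Dm"A2A "C"G2E|'

# The return value is discarded here, so `line` is left untouched.
re.sub('".*?"', '', line)
unchanged = line

# Assigning the return value back is what actually updates `line`.
line = re.sub('".*?"', '', line)

print(unchanged)  # still contains the quoted chords
print(line)       # quoted chords removed
```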

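Also, your regex is greedy: ".+" would match from the first quotation mark on the line all the way to the last one, deleting the notes between the chords as well. I changed it to ".*?" so each match is non-greedy and stops at the nearest closing quote.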
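
To illustrate the difference, here is a minimal sketch that runs the sample line from the question through both patterns. The helper name strip_quoted is made up for this example and is not part of the original code:

```python
import re

def strip_quoted(line):
    # Non-greedy: each match ends at the nearest closing quote.
    return re.sub(r'".*?"', '', line)

sample = 'A A2A|"Dm"A2A "C"G2E|"Dm"D2D A,2D|'

# Greedy: one match runs from the first quote to the last quote on the line.
greedy = re.sub(r'".+"', '', sample)
non_greedy = strip_quoted(sample)

print(greedy)      # A A2A|D2D A,2D|
print(non_greedy)  # A A2A|A2A G2E|D2D A,2D|
```

The greedy version loses the notes A2A and G2E because they sit between the first and last quotation marks.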

Community
TessellatingHeckler
  • Thank you. If you don't mind another question, does it matter whether you use * or + in re.sub()? – Noah Wood Sep 17 '15 at 00:39
  • Yes, it matters: `.*` matches "zero or more characters", and `.+` matches "one or more characters". So `".*"` will match an empty pair of quotes (`""`), whereas `".+"` will not match empty quotes; it needs at least one character inside the quotes. – TessellatingHeckler Sep 17 '15 at 01:17
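
A quick sketch of the `*` versus `+` point, using made-up input lines that contain an empty pair of quotes. The second case goes a little beyond the comment above: with `+`, an empty pair can cause the quotes on the rest of the line to be paired up wrongly.

```python
import re

empty_only = 'G2E ""A2A'
star_result = re.sub(r'".*?"', '', empty_only)  # removes the empty pair
plus_result = re.sub(r'".+?"', '', empty_only)  # finds nothing to remove

# With another chord later on the line, `+` treats the second quote of ""
# as content and keeps going until the opening quote of "C".
mixed = 'A2A ""G2E "C"D2D'
mixed_star = re.sub(r'".*?"', '', mixed)
mixed_plus = re.sub(r'".+?"', '', mixed)

print(star_result)  # G2E A2A
print(plus_result)  # G2E ""A2A
print(mixed_star)   # A2A G2E D2D
print(mixed_plus)   # A2A C"D2D
```

If empty quotes can occur in the file, `*` is the safer choice for this task.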