I'm trying to write a Python program that goes through a file containing a git diff (of C code) and removes the comments. I tried reading from the file and writing a comment-less version to a different file, but it doesn't seem to be working. I've also realized that it won't work for multiline comments.

Here's my code:

write_path = "diff_file"   # new file to write in
read_path = "text_diff"    # text_diff is the original file with the diff
with open(read_path,'r') as read_file:
  text_diff = read_file.read().lower() 
  for line in read_file:
    if line.startswith("/*") and line.endswith("*/"):
      with open(write_path, 'a') as write_file:
       write_file.write(line + "/n")

For reference, I'm running it under WSL.

Veronika
  • I think you just need to change your `if` statement to *not* write the line if it has a comment: `if not (line.startswith("/*") and line.endswith("*/")):` – Random Davis Jul 12 '21 at 21:24
  • It's not that it still prints the comments, it's that it doesn't print anything to `diff_file` – Veronika Jul 12 '21 at 22:16
  • Is the `if` statement ever actually being entered? I think it isn't, because every line ends with a newline (`'\n'`). You should read the file without newlines: https://stackoverflow.com/questions/12330522/how-to-read-a-file-without-newlines – Random Davis Jul 12 '21 at 22:18
  • If you want to do the job properly, you need a full lexical analyzer: `/*` inside a string is not a comment, for instance. You can probably dispense with the trigraph silliness, though. Note that `git diff` output is insufficient: you must strip comments *before* diffing. – torek Jul 13 '21 at 06:14
  • @torek , to strip the comments before diffing, would that change anything in the code itself? How would I incorporate that stripper before executing git diff? – Veronika Jul 13 '21 at 17:34
  • You would need to strip the comments from some sort of intermediate file, which Git (or any other diff-generator) would then read as its input. That way you would get comment-less diffs. Git is not really built to do this, but you can use its "text conversion" trick, or use `git difftool` to get Git to run an external diff. This is, ultimately, a moderately hard problem, with not-all-that-useful results, which is why nobody else has done it for you already. :-) – torek Jul 14 '21 at 06:30
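To illustrate the "lexical analyzer" point from the comments, here is a minimal sketch of a comment stripper that tracks string and character literals, so `/*` inside a string is left alone. It is meant for complete C source (before diffing), not for `git diff` output. The function name `strip_c_comments` is made up for this example, and the sketch ignores trigraphs and backslash line continuations inside `//` comments.

```python
def strip_c_comments(source: str) -> str:
    """Remove /* ... */ and // comments, copying string/char literals verbatim."""
    out = []
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        two = source[i:i + 2]
        if ch in "\"'":
            # Copy the whole literal, skipping over backslash escapes.
            j = i + 1
            while j < n and source[j] != ch:
                j += 2 if source[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(source[i:j])
            i = j
        elif two == "/*":
            # Block comment: jump to the closing */ (or end of input).
            end = source.find("*/", i + 2)
            j = n if end == -1 else end + 2
            newlines = source[i:j].count("\n")
            # Keep newlines so line numbers stay aligned; otherwise
            # leave one space so neighbouring tokens do not merge.
            out.append("\n" * newlines or " ")
            i = j
        elif two == "//":
            # Line comment: drop everything up to (not including) the newline.
            end = source.find("\n", i)
            i = n if end == -1 else end
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Because multiline comments are replaced by the same number of newlines, the stripped file keeps its line numbering, which should make a later diff easier to map back to the original source.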

1 Answer

I tried this. I changed the mode from 'a' to 'w' (write) when opening the output file, and moved that open call out of the loop so the file isn't reopened for every line. I also inverted the if condition, so a comment line is no longer written to the new file.

In endswith I included \n, since each line read from the file keeps its trailing newline. For the same reason, I removed the extra "/n" from the write call (it was also a typo for "\n").

write_path = "diff_file"   # new file to write in
read_path = "text_diff"    # text_diff is the original file with the diff
with open(read_path,'r') as read_file:
    text_diff = read_file.readlines()
    with open(write_path, 'w') as write_file:
        for line in text_diff:
            if not (line.startswith("/*") and line.endswith("*/\n")):
                write_file.write(line)
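One likely reason the original script wrote nothing at all: calling `read()` first leaves the file position at the end, so the following `for line in read_file` loop has nothing to iterate over. The sketch below demonstrates this with `io.StringIO` standing in for the real file (an assumption made purely so the example is self-contained).

```python
import io

# io.StringIO stands in for the opened diff file (assumption for the demo).
fake_file = io.StringIO("/* comment */\nint x;\n")

contents = fake_file.read()          # consumes everything; position is now at EOF
lines_after_read = list(fake_file)   # iterating now yields no lines

fake_file.seek(0)                    # rewind to the start
lines_after_seek = list(fake_file)   # now the lines are available again

print(lines_after_read)
print(lines_after_seek)
# Each line keeps its trailing newline, so endswith("*/") is False here.
print(lines_after_seek[0].endswith("*/"))
```

Using `readlines()` once, as in the code above, avoids the problem because the file is only read a single time.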
DeusDev
  • How would you account for whitespace? In git diff for C code, all the lines begin with either + or - followed by some amount of whitespace. I suspect this is why it's still printing comments, but I'm not totally sure. – Veronika Jul 13 '21 at 16:09
  • It would be better if you could include an example of your input (a short sample of the file you want to clean) and the desired output for that. That way we can come up with a more generalized case. – DeusDev Jul 13 '21 at 17:58
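On the whitespace question, one possible approach is a regular expression that allows for the leading `+`, `-`, or space marker and any indentation before the comment. This is only a sketch: `COMMENT_ONLY`, `drop_comment_lines`, and `filter_file` are names invented here, and it only removes lines that consist entirely of a single one-line `/* ... */` comment. Multiline comments and comments that share a line with code are left untouched.

```python
import re

# A diff line that holds nothing but one single-line /* ... */ comment:
# optional '+', '-' or ' ' marker, optional indentation, the comment,
# optional trailing whitespace. The comment body may not contain "*/",
# so lines such as "/* a */ x = 1; /* b */" are not matched.
COMMENT_ONLY = re.compile(r"^[+\- ]?\s*/\*(?:(?!\*/).)*\*/\s*$")


def drop_comment_lines(lines):
    """Yield only the lines that are not a standalone one-line comment."""
    for line in lines:
        if not COMMENT_ONLY.match(line):
            yield line


def filter_file(read_path, write_path):
    """Copy read_path to write_path, skipping comment-only diff lines."""
    with open(read_path, "r") as read_file, open(write_path, "w") as write_file:
        write_file.writelines(drop_comment_lines(read_file))
```

With the file names from the question, this would be called as `filter_file("text_diff", "diff_file")`.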