Can I get a lint error on implicit string joining in python?

Question

Is there a way to get a lint error for missing commas in a literal list of strings?

Example:

exceptions = ["banana", "pineapple", "apple"
              "pen"]

You may think this list contains 4 items, but it actually contains 3: "apple" and "pen" are joined into "applepen".
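
A quick check makes the effect visible. This is only an illustration built from the list above:

```python
# Adjacent string literals are concatenated by the compiler,
# so the missing comma silently merges two items into one.
exceptions = ["banana", "pineapple", "apple"
              "pen"]

print(len(exceptions))   # 3
print(exceptions[-1])    # applepen
```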

I'm terrified of these omitted commas. Is there some lint tool to help me find them?

Example 2:

exceptions = ["Carmichael",
              "Vanessa"      # <--- Spot the missing comma
              "Ford"]

The same is true for `"foo" "bar"` which is `"foobar"`. Is this an error? Why? – , Nov 09 '16 at 09:10
Hmm yes, I think I would like a lint error for this case as well. It creates more hassle than it is worth. See also http://legacy.python.org/dev/peps/pep-3126/#concerns – Moberg, Nov 09 '16 at 09:15
I do not know whether there is a tool for this. Even if there is, why do you need it? This is how Python behaves, and you should keep that in mind. There will always be a *"what if"*: what if I wrote 2 in place of 3? Do you need a tool to tell you about that? The concern in your question is the same. – Moinuddin Quadri, Nov 09 '16 at 09:16
@anonymous: It is too easy to omit the comma and never notice that I have this bug in my code. I don't want to deliver a deficient product. Finding these omitted commas would be one way to increase the quality of my code. – Moberg, Nov 09 '16 at 09:20
As far as I know, there is no static analysis tool that detects implicit joining of string literals. Usually I'd say this kind of error should be caught by automated tests, which are written anyway to prove that new code works as expected. – Łukasz Rogalski, Nov 10 '16 at 13:06
@ŁukaszRogalski I think you are somewhat right about the testing. But somewhere I have to write the list, don't I? – Moberg, Nov 11 '16 at 07:22

Cong Ma · Answer 1 · 2016-11-10T13:01:32.780

I'm not sure what kind of source analysis tool you're using, so I can only offer a suggestion. It would be too long for a comment, so I wrote a proof-of-concept script.

The idea is to scan the source code with Python's tokenize module, which generates tokens from Python source. If well-formed Python code contains implicitly continued string literals, they show up as a STRING token followed by an NL token.
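
To see what that token sequence looks like, here is a minimal sketch. It assumes Python 3, and the `source` string is just an illustrative input, not part of the test case used in this answer. It prints the token names for a two-line implicit continuation:

```python
import io
import tokenize

# Illustrative input: two string literals split across lines inside parentheses.
source = 'x = ("a"\n     "b")\n'

names = [
    tokenize.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]
print(names)
```

In the printed list, STRING, NL and STRING appear one after another. A line break that ends a statement is reported as NEWLINE instead, which is what makes NL a usable signal here.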

For example, let's use the following source file source.py as a test case.

x = ("a"
        "b"  # some trailing spaces
# Comment line
"c"
""
     # The following is an explicit continuation
  "d" \
     "e")

Running the command python check.py < source.py on the file generates:

1:8: implicit continuation: 
x = ("a"

     ~~~^
2:35: implicit continuation: 
        "b"  # some trailing spaces

                                ~~~^
4:3: implicit continuation: 
"c"

~~~^
5:2: implicit continuation: 
""

  ^

The program, check.py, is just a proof-of-concept and it does not check syntax errors or other edge cases:

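from __future__ import print_function  # lets the script run on Python 2 and 3

import sys
import tokenize


# True while the most recent significant token was a string literal.
LOOKNEXT = False
tok_gen = tokenize.generate_tokens(sys.stdin.readline)
for tok, tok_str, start, end, line_text in tok_gen:
    if tok == tokenize.STRING:
        LOOKNEXT = True
        continue
    if tok == tokenize.COMMENT:
        # A trailing comment does not separate the string from the line break.
        continue
    if LOOKNEXT and tok == tokenize.NL:
        # NL (unlike NEWLINE) means the logical line continues.
        warn_header = "%d:%d: implicit continuation: " % start
        print(warn_header, file=sys.stderr)
        print(line_text, file=sys.stderr)
        # Point at the column where the line breaks.
        indents = start[1] - 3
        if indents >= 0:
            print("%s~~~^" % (" " * indents), file=sys.stderr)
        else:
            print("%s^" % (" " * start[1]), file=sys.stderr)
    LOOKNEXT = False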

Hopefully the idea might help you extend your lint tool or IDE for your purpose.
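
If you want to build on this, one possible refinement is to report only when a second string literal actually follows, skipping any newlines and comments in between. The sketch below is an assumption-laden variant, not a finished linter: it targets Python 3, the function name `find_implicit_joins` is hypothetical, and it only looks at plain STRING tokens (so f-strings on Python 3.12+, which are tokenized differently, are not covered).

```python
import io
import tokenize

# Tokens that may sit between two string literals without separating them.
SKIPPABLE = {tokenize.NL, tokenize.COMMENT}


def find_implicit_joins(source):
    """Return (row, col) of every string literal that directly follows
    another string literal, ignoring newlines and comments in between."""
    hits = []
    prev_was_string = False
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPABLE:
            continue  # leave prev_was_string unchanged
        if tok.type == tokenize.STRING:
            if prev_was_string:
                hits.append(tok.start)
            prev_was_string = True
        else:
            prev_was_string = False
    return hits


sample = (
    'exceptions = ["Carmichael",\n'
    '              "Vanessa"      # <--- Spot the missing comma\n'
    '              "Ford"]\n'
)
for row, col in find_implicit_joins(sample):
    print("%d:%d: string literal joined to the previous one" % (row, col))
```

With this approach a string that merely ends a line before a closing bracket is not reported. Same-line cases such as `"foo" "bar"` and backslash continuations are reported, which may or may not be what you want.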

Jose F. Gomez · Answer 2 · 2016-11-10T11:04:33.277


Sublime Text 3 with the flake8 plugin, which is a wrapper around other Python lint tools, can handle this.

Otherwise, you can write a script that, for each line, compares the number of commas with ((number of ")/2)-1 and, if the two don't match, adds a comma.

EDIT:

Explanation of what I'm saying:

    import fileinput
    import os


    def countQuotes(string):
        return string.count('"')


    def countCommas(string):
        return string.count(',')


    directory = 'your/directory'
    for filename in os.listdir(directory):
        if not filename.endswith(".py"):
            continue
        path = os.path.join(directory, filename)
        # With inplace=True, print() writes into the file; a .bak copy is kept.
        with fileinput.FileInput(path, inplace=True, backup='.bak') as fileContent:
            for line in fileContent:
                numQuotes = countQuotes(line)
                numCommas = countCommas(line)
                if numQuotes == 2 and ']' in line:
                    if numCommas != 0:
                        pass  # error: delete a comma in the right place
                elif numQuotes == 2:
                    if numCommas != 1:
                        pass  # error: add a comma in the right place
                elif numQuotes > 2:
                    expected = numQuotes // 2 - 1
                    if numCommas > expected:
                        pass  # error: delete a comma in the right place
                    elif numCommas < expected:
                        pass  # error: add a comma in the right place
                # Each line already ends with a newline, so suppress print's own.
                print(line, end='')

This method should work; you just need to work out where to insert or delete the comma to end up with the format you want.

edited Nov 10 '16 at 11:04

answered Nov 09 '16 at 09:11


Such simple counting doesn't work. Compare the example in the question with `["car", "bike"]` – Moberg Nov 09 '16 at 09:17
Unfortunately I'm not using Sublime at work. Perhaps I should. – Moberg Nov 09 '16 at 09:18
@moberg I think it still works in your case: 4 quotes, 1 comma – Jose F. Gomez Nov 09 '16 at 09:38
@jose-f-gomez And how would it react to `["car", "bike"` – Moberg Nov 09 '16 at 12:13
@moberg The number of quotes is 4, so the number of commas must be 1. In this case it's OK. If you have in next line `co` – Jose F. Gomez Nov 10 '16 at 10:32
`"walk"]`, then you must add a comma before the second-to-last quote. If the next line is `,"walk"]`, the result will be OK. It solves your problem – Jose F. Gomez Nov 10 '16 at 10:35

What about commas within strings or in comments? ``["hello, world", "spam"]`` – BlackJack Nov 10 '16 at 17:04
@blackjack You can handle that when you add or delete a comma; there aren't that many cases. – Jose F. Gomez Nov 11 '16 at 07:57
@JoseF.Gomez There are infinitely many cases. You can have any number of commas in strings or in comments or in strings _and_ comments on the same line. Your solution isn't able to deal with normal, everyday code, just with some artificially constricted subset. So it's not really a solution IMHO. And there are also strings delimited by """ instead of just ". How does your code deal with list comprehensions, which also happen to start with `[` and end with `]` and may contain literal strings? – BlackJack Nov 14 '16 at 13:10
@blackjack Infinitely many? List comprehensions, comments and string delimiters. Anything else? I think this solution is better than searching by hand, but it is not perfect; like any other linter I know, you can easily make it fail. You can also set the inplace argument to False and print the possibly erroneous lines directly to stdout. That would help a lot in finding those errors. Is there any better solution? In a couple of hours I can handle 30 or more common failures in each case, especially if the code is mine, but I probably can't read all the code of a big application and figure out where all the failures are. – Jose F. Gomez Nov 14 '16 at 16:36
Of course you can extend that hack for special cases. Or write a solid solution instead of a hack. Something based on the `tokenize` or `ast` module or the one used by pylint which also actually parses Python. Any decent lint I know parses the language in question instead of resorting to fragile string operations with just some heuristics about the grammar. – BlackJack Nov 15 '16 at 15:03
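
As a rough illustration of the parser-based direction suggested in the last comment, here is a sketch that combines `ast` and `tokenize`. It assumes Python 3.8 or later (for `ast.get_source_segment` and end positions), the function name `joined_list_items` is hypothetical, and it only inspects string elements of list, tuple and set displays. It is not how pylint or any other existing tool works.

```python
import ast
import io
import tokenize


def joined_list_items(source):
    """Yield (lineno, value) for string elements of list/tuple/set displays
    that were written as more than one adjacent literal."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, (ast.List, ast.Tuple, ast.Set)):
            continue
        for elt in node.elts:
            if not (isinstance(elt, ast.Constant) and isinstance(elt.value, str)):
                continue
            segment = ast.get_source_segment(source, elt)
            # Parentheses let the segment span several lines when re-tokenized.
            readline = io.StringIO("(" + segment + ")").readline
            pieces = sum(
                tok.type == tokenize.STRING
                for tok in tokenize.generate_tokens(readline)
            )
            if pieces > 1:
                yield elt.lineno, elt.value


sample = 'exceptions = ["banana", "pineapple", "apple"\n              "pen"]\n'
for lineno, value in joined_list_items(sample):
    print("line %d: element %r is built from several literals" % (lineno, value))
```

Because the parser decides what counts as a list element, commas inside strings or comments and list comprehensions do not confuse this check.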
