
How can I, if possible, split the long final line of the code shown below across several lines, or rewrite it, to achieve the desired outcome with less code? I have typed the following code:

import pandas as pd
import requests
from bs4 import BeautifulSoup
res = requests.get("http://web.archive.org/web/20070826230746/http://www.bbmf.co.uk/july07.html")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0]

df = pd.read_html(str(table))
df = df[1]
df = df.rename(columns=df.iloc[0])
df = df.iloc[2:]
df.head(15)

Southport = df[df['Location'].str.contains('- Display') & (df['Lancaster'] == '') & (df['Dakota'] == 'D') & (df['Spitfire'] == 'S') & (df['Hurricane'] == 'H') | (df[df['Location'].str.contains('- Display') & (df['Lancaster'] == '') & (df['Dakota'] == 'D') & (df['Spitfire'] == 'S') | df[df['Location'].str.contains('- Display') & (df['Lancaster'] == '') & (df['Dakota'] == 'D') & (df['Spitfire'] == 'SS')] 
Southport

What I am trying to achieve is to show displays only, and only those with Dakota, Spitfire and Hurricane, or Dakota and Spitfire, or Dakota and two Spitfires, if they appear in the schedule table. The code above is the whole code; it is the line starting Southport = that needs editing.

I get the following traceback when I run the code, which I believe is caused by the line of code being too long:

File "<ipython-input-1-518a9f1c8e98>", line 23
    Southport
            ^
SyntaxError: invalid syntax

I am running the code in Jupyter Notebook.

Tom Hood
Edward Winch
  • The error is on line 23. Your code snippet is only 14 lines. It's pretty hard to see what might be wrong with a line of code you've not included. – Ken White Jun 20 '19 at 18:37
  • No, there shouldn't be a syntax error due to the length of your line. – juanpa.arrivillaga Jun 20 '19 at 18:38
  • To make your code more readable (which will make it easier for you to spot the error), you could look at using [masks](https://stackoverflow.com/questions/38802675/create-bool-mask-from-filter-results-in-pandas). – m13op22 Jun 20 '19 at 18:39
  • @HS-nebula they *are* using a "mask", i.e. boolean indexing. The problem is that the entire expression is on one line. It could easily be broken up into intermediate variables to improve legibility though. In any case, if there is a SyntaxError, I would make sure all those brackets and parentheses are properly balanced. That would be my guess. – juanpa.arrivillaga Jun 20 '19 at 18:40
  • That is the whole Code Ken, – Edward Winch Jun 20 '19 at 18:40
  • How do I break the line into Intermediate Variables juanpa ? – Edward Winch Jun 20 '19 at 18:42
  • @EdwardWinch are you asking me how to assign to variables? You seem to already know that. Consider: `result = (x + y - z) / (3 + w)` *could* be turned into `numer = (x + y - z); divisor = (3 + w); result = numer / divisor`. Do something like that. But the *source* of your error *is not the length of the line*. It is *probably* because you have unbalanced parentheses or brackets. – juanpa.arrivillaga Jun 20 '19 at 18:44
  • @EdwardWinch also, it very clearly is not the code producing that error, unless you are claiming that the python runtime has suddenly forgotten how to count lines in the source code. – juanpa.arrivillaga Jun 20 '19 at 18:45
  • Your immediate problem is that your long line is missing a closing bracket: the very first one is unmatched. I checked through the command, and you have two more unmatched opens: the paren and bracket at the beginning of the second clause (a clause being the things separated by vertical bars). – Prune Jun 20 '19 at 18:49
  • Thanks Prune. Could you show me what the line in question should look like when changed? Many thanks for your info. Eddie – Edward Winch Jun 20 '19 at 19:10

2 Answers


In my opinion this is a typical copy/paste error: you simply have to delete the (df[ right after the first | (the one following (df['Hurricane'] == 'H')) and the df[ after the second |. After that there should at least be no syntax error anymore.

However, the logic is far too verbose, since df['Location'].str.contains('- Display'), (df['Lancaster'] == '') and (df['Dakota'] == 'D') are all part of every or-separated boolean term.

Besides that, the rows matched by df['Location'].str.contains('- Display') & (df['Lancaster'] == '') & (df['Dakota'] == 'D') & (df['Spitfire'] == 'S') are a superset of those matched by df['Location'].str.contains('- Display') & (df['Lancaster'] == '') & (df['Dakota'] == 'D') & (df['Spitfire'] == 'S') & (df['Hurricane'] == 'H'), which means the latter adds no rows once you have the former, so you can drop it entirely.
So, as a first step, everything reduces to

Southport = df[df['Location'].str.contains('- Display') & (df['Lancaster'] == '') & (df['Dakota'] == 'D') & (df['Spitfire'] == 'S') | df['Location'].str.contains('- Display') & (df['Lancaster'] == '') & (df['Dakota'] == 'D') & (df['Spitfire'] == 'SS')] 

which can be written more compactly as

Southport = df[(df['Location'].str.contains('- Display') & (df['Lancaster'] == '') & (df['Dakota'] == 'D')) & ((df['Spitfire'] == 'S') | (df['Spitfire'] == 'SS'))] 

because A & B & C | A & B & D is A & B & (C | D).
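
The identity can be checked exhaustively with a small truth table. The sketch below uses plain Python booleans rather than pandas Series (an assumption made only to keep it self-contained); the same algebra applies element-wise to boolean masks. It also checks the absorption step used earlier, where a term with one extra condition contributes nothing new.

```python
from itertools import product

# Try every combination of truth values for the four conditions.
distributive_holds = all(
    ((a and b and c) or (a and b and d)) == (a and b and (c or d))
    for a, b, c, d in product([False, True], repeat=4)
)

# Absorption: X | (X & H) is the same as X, so the Hurricane term is redundant.
absorption_holds = all(
    (x or (x and h)) == x
    for x, h in product([False, True], repeat=2)
)

print(distributive_holds, absorption_holds)
```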

And if I recall it right, a pattern like =='S' or =='SS' should be better expressed by pandas' isin:

Southport = df[df['Location'].str.contains('- Display') & (df['Lancaster'] == '') & (df['Dakota'] == 'D') & df['Spitfire'].isin(['S', 'SS'])] 
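
If the single line is still hard to read, the same filter could be built from named intermediate masks. This is only a sketch: the rows below are made up to stand in for the scraped schedule (the real table may look different), and the mask names are hypothetical.

```python
import pandas as pd

# Made-up rows standing in for the scraped schedule; the real table may differ.
df = pd.DataFrame({
    'Location':  ['Town A - Display', 'Town B - Flypast',
                  'Town C - Display', 'Town D - Display'],
    'Lancaster': ['', '', 'L', ''],
    'Spitfire':  ['S', 'S', 'S', 'SS'],
    'Hurricane': ['H', '', 'H', ''],
    'Dakota':    ['D', 'D', 'D', 'D'],
})

# One named mask per condition keeps each line short and easy to check.
is_display = df['Location'].str.contains('- Display')
no_lancaster = df['Lancaster'] == ''
has_dakota = df['Dakota'] == 'D'
one_or_two_spitfires = df['Spitfire'].isin(['S', 'SS'])

Southport = df[is_display & no_lancaster & has_dakota & one_or_two_spitfires]
print(Southport)
```

With this sample data only the first and last rows pass all four masks; the flypast row and the row with a Lancaster are filtered out.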
SpghttCd
  • Hi, many thanks for your help SpghttCd. Unfortunately no data is shown when I run that edited code; only the column names show, i.e. Date Location Lancaster Spitfire Hurricane Dakota – Edward Winch Jun 20 '19 at 20:20
  • Hard to tell without knowing the data - perhaps all your boolean masking leads to no resulting rows at all...? Perhaps you can post the result of `df.head()`, so that we can have a better idea of your dataframe. – SpghttCd Jun 20 '19 at 20:40
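
One way to narrow down an empty result like the one described in the comments above is to count how many rows each condition matches on its own. The sketch below uses a made-up frame in which blank cells are missing values (NaN) rather than empty strings; that is only an assumption about what the scraped table might contain, not a confirmed diagnosis.

```python
import pandas as pd

# Hypothetical data: blank cells are None/NaN here, not ''.
df = pd.DataFrame({
    'Location':  ['Town A - Display', 'Town B - Flypast', 'Town C - Display'],
    'Lancaster': [None, 'L', None],
    'Spitfire':  ['S', 'S', 'SS'],
    'Dakota':    ['D', 'D', 'D'],
})

conditions = {
    'display': df['Location'].str.contains('- Display'),
    'lancaster_empty_string': df['Lancaster'] == '',
    'lancaster_missing': df['Lancaster'].isna(),
    'dakota': df['Dakota'] == 'D',
    'spitfire': df['Spitfire'].isin(['S', 'SS']),
}

# A count of 0 points at the condition that filters everything out.
counts = {name: int(mask.sum()) for name, mask in conditions.items()}
print(counts)
```

In this made-up case the comparison with '' matches nothing while isna() matches the blank cells, so that single condition would empty the whole result.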

You've got one absurdly long line that could benefit from being split up:

Southport = df[df['Location'].str.contains('- Display') & (df['Lancaster'] == '') & (df['Dakota'] == 'D') & (df['Spitfire'] == 'S') & (df['Hurricane'] == 'H') | (df[df['Location'].str.contains('- Display') & (df['Lancaster'] == '') & (df['Dakota'] == 'D') & (df['Spitfire'] == 'S') | df[df['Location'].str.contains('- Display') & (df['Lancaster'] == '') & (df['Dakota'] == 'D') & (df['Spitfire'] == 'SS')] 

Let's do that, and we can maybe see the error:

Southport = df[
    df['Location'].str.contains('- Display') & 
    (df['Lancaster'] == '') & 
    (df['Dakota'] == 'D') & 
    (df['Spitfire'] == 'S') & 
    (df['Hurricane'] == 'H') | 
    (df[
        df['Location'].str.contains('- Display') & 
        (df['Lancaster'] == '') & 
        (df['Dakota'] == 'D') & 
        (df['Spitfire'] == 'S') | 
        df[
            df['Location'].str.contains('- Display') & 
            (df['Lancaster'] == '') & 
            (df['Dakota'] == 'D') & 
            (df['Spitfire'] == 'SS')] 

What stands out to me here is that the parentheses and brackets don't line up properly: there are more opening brackets than closing ones (two unmatched open square brackets and one unmatched open paren, to be precise). That is consistent with the SyntaxError you are receiving. So let's try to fix this:

# Each comparison keeps its own parentheses because & binds more tightly than ==.
# The three alternatives are OR-ed together as one boolean mask inside a single df[...].
Southport = df[
        (
            df['Location'].str.contains('- Display') &
            (df['Lancaster'] == '') &
            (df['Dakota'] == 'D') &
            (df['Spitfire'] == 'S') &
            (df['Hurricane'] == 'H')
        ) | (
            df['Location'].str.contains('- Display') &
            (df['Lancaster'] == '') &
            (df['Dakota'] == 'D') &
            (df['Spitfire'] == 'S')
        ) | (
            df['Location'].str.contains('- Display') &
            (df['Lancaster'] == '') &
            (df['Dakota'] == 'D') &
            (df['Spitfire'] == 'SS')
        )
    ]

which is, I expect, closer to what you were trying to write (though I'm not sure exactly what you were trying to write in the first place). Note how spacing out code like this on to separate lines makes it so much easier to see how the conditions align, what comparisons are nested within what other comparisons, etc.
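
One thing to keep in mind when reformatting filters like this: in Python, & and | bind more tightly than ==, so each comparison needs its own parentheses when masks are combined. The sketch below only parses two spellings with the standard ast module, without evaluating them, to show how Python groups each one; a and b are placeholder names, not columns from the question.

```python
import ast

# Parse both spellings without evaluating them; a and b are placeholder names.
unparenthesised = ast.parse("a == 'D' & b == 'S'", mode="eval").body
parenthesised = ast.parse("(a == 'D') & (b == 'S')", mode="eval").body

# Without parentheses the top-level node is a chained comparison whose
# middle operand is the bitwise AND: a == ('D' & b) == 'S'.
print(type(unparenthesised).__name__, len(unparenthesised.ops))
print(ast.unparse(unparenthesised.comparators[0]))

# With parentheses the top-level node is the AND of two comparisons.
print(type(parenthesised).__name__, type(parenthesised.op).__name__)
```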

The general rule of thumb, which your particular IDE may or may not yell at you about, is to keep line length below a certain threshold (say, 120 characters). Sometimes you have a single expression that spans the entire line and goes past that because of verbose variable names, and that's fine. But in general, that rule of thumb exists to encourage this behavior: splitting your line in a way that makes it clearer to people reading the code exactly what is going on.
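
As a minimal illustration of how a long condition can be split without backslashes: Python continues an expression across line breaks while a parenthesis or bracket is still open. The values below are made up to stand in for a single row of the schedule.

```python
# Hypothetical values standing in for one row of the schedule.
location, lancaster, dakota, spitfire = 'Town A - Display', '', 'D', 'SS'

# Inside parentheses, Python continues the expression across line breaks,
# so no backslashes are needed.
wanted = (
    '- Display' in location
    and lancaster == ''
    and dakota == 'D'
    and spitfire in ('S', 'SS')
)
print(wanted)
```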

Green Cloak Guy
  • Hi Green Cloak Guy, I get the following Traceback Error :- ModuleNotFoundError: No module named 'api' Even though I have installed the Module pip-api, and type Import api in the Python Code. Any ideas ? – Edward Winch Jun 20 '19 at 19:55
  • Sorry, I dunno. That's beyond the scope of the question you asked here - if it continues to be a problem then you can submit a new question about it specifically, with the stacktrace and relevant code and whatnot. – Green Cloak Guy Jun 20 '19 at 20:31
  • Thanks Green Cloak Guy, I will do that ) – Edward Winch Jun 20 '19 at 20:39