
I'm really stuck on this. I don't know how to combine these three DataFrames into one, because they seem to be inside an array or something similar. I'd really appreciate your help.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Named url_template rather than str, so the built-in str type is not shadowed
url_template = 'https://fbref.com/en/comps/Big5/{}/{}-Big-5-European-Leagues-Stats'

seasons = ["2017-2018", "2018-2019", "2019-2020"]

for season in seasons:
    url = url_template.format(season, season)
    league = pd.read_html(url)
    league = league[0]
    league["Season"] = season
    print(type(league))


<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
ThePyGuy

2 Answers


You can collect each season's DataFrame in a list and then combine them with pandas.concat:

import pandas as pd

url = "https://fbref.com/en/comps/Big5/{}/{}-Big-5-European-Leagues-Stats"

seasons = ["2017-2018", "2018-2019", "2019-2020"]

dfs = []
for season in seasons:
    league = pd.read_html(url.format(season, season))[0]
    dfs.append(league)

df = pd.concat(dfs)
print(df)

df.to_csv("data.csv", index=False)

Prints:

    Rk            Squad  Country  LgRk  MP   W   D   L   GF  GA  GD  Pts  Pts/G    xG   xGA   xGD  xGD/90  Attendance                              Top Team Scorer                         Goalkeeper
0    1  Manchester City  eng ENG     1  38  32   4   2  106  27  79  100   2.63  80.1  23.0  57.1    1.50       54070                           Sergio Agüero - 21                            Ederson
1    2         Juventus   it ITA     1  38  30   5   3   86  24  62   95   2.50  59.8  28.7  31.0    0.82       39316                            Paulo Dybala - 22                   Gianluigi Buffon
2    3    Bayern Munich   de GER     1  34  27   3   4   92  28  64   84   2.47  77.7  33.6  44.1    1.30       75000                      Robert Lewandowski - 29                       Sven Ulreich
3    4        Paris S-G   fr FRA     1  38  29   6   3  108  29  79   93   2.45  89.2  32.2  57.0    1.50       46929                          Edinson Cavani - 28                    Alphonse Areola
4    5        Barcelona   es ESP     1  38  28   9   1   99  29  70   93   2.45  78.3  41.1  37.2    0.98       66603                            Lionel Messi - 34              Marc-André ter Stegen

...

It also saves the combined table to data.csv.
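Note that the loop in this answer does not add the Season column from the question, so rows from different seasons cannot be told apart after concatenation. A minimal sketch of one way to keep it, assuming small made-up frames in place of the pd.read_html results (the `tables` dict, squad names, and point values are placeholders, not scraped data):

```python
import pandas as pd

# Hypothetical stand-ins for the tables returned by pd.read_html(...)[0];
# the values are placeholders, not scraped data.
tables = {
    "2017-2018": pd.DataFrame({"Squad": ["Team A", "Team B"], "Pts": [90, 80]}),
    "2018-2019": pd.DataFrame({"Squad": ["Team A", "Team C"], "Pts": [85, 70]}),
}

dfs = []
for season, league in tables.items():
    # assign() returns a copy with the extra column, leaving `league` untouched
    dfs.append(league.assign(Season=season))

df = pd.concat(dfs)
print(df)
```

With the real data, the same `assign(Season=season)` call could be applied to the frame returned by `pd.read_html` before appending it.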

Andrej Kesely

Append each season's DataFrame to a list inside the loop, then concatenate the list once the loop has finished:

import pandas as pd

# Named url_template rather than str, so the built-in str type is not shadowed
url_template = 'https://fbref.com/en/comps/Big5/{}/{}-Big-5-European-Leagues-Stats'

seasons = ["2017-2018", "2018-2019", "2019-2020"]

dataframes = []

for season in seasons:
    url = url_template.format(season, season)
    # read_html returns a list of DataFrames; take the first table
    league = pd.read_html(url)[0]
    league["Season"] = season

    # Change: collect this season's DataFrame instead of overwriting it
    dataframes.append(league)

# Change: stack the per-season DataFrames into a single one
new_dataframe = pd.concat(dataframes)

print(new_dataframe)

I used the pandas concat() function to concatenate all the DataFrames in the list.
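One thing to be aware of: by default, concat() keeps each DataFrame's original index, so the row labels restart at 0 for every season. If a continuous index is wanted, `ignore_index=True` is one option. A small sketch with made-up frames (`first`, `second`, and the squad names are placeholders for illustration):

```python
import pandas as pd

# Hypothetical stand-ins for two per-season tables (placeholder values).
first = pd.DataFrame({"Squad": ["Team A", "Team B"], "Season": "2017-2018"})
second = pd.DataFrame({"Squad": ["Team A", "Team C"], "Season": "2018-2019"})

# Default behaviour: the original row labels are kept, so they repeat
stacked = pd.concat([first, second])

# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([first, second], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```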

If you look at the Season column of the combined DataFrame, you can see the seasons 2017-2018, 2018-2019, and 2019-2020.
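Rather than scanning the printed output, the seasons can also be checked programmatically. A rough sketch, assuming a combined frame shaped like the one above (the `combined` frame here is built by hand with placeholder squads):

```python
import pandas as pd

# Hypothetical combined frame; in practice this would be the pd.concat result.
combined = pd.DataFrame({
    "Squad": ["Team A", "Team B", "Team A", "Team C", "Team B"],
    "Season": ["2017-2018", "2017-2018", "2018-2019", "2018-2019", "2019-2020"],
})

# Distinct seasons, in order of first appearance
season_labels = combined["Season"].unique().tolist()

# Number of rows contributed by each season
rows_per_season = combined.groupby("Season").size().to_dict()

print(season_labels)
print(rows_per_season)
```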

Hope this was helpful. Please don't hesitate to ask if you have more questions.