
enter link description here

Here is the link to the table.

Write Python code that scrapes the first table from the website and converts it into a pandas DataFrame. As output for Part 1, create a subset named dams containing all the data for the top 3 dams by hydropower generation capacity.

This is my code.

import pandas as pd
url = "dams.html"
table = pd.read_html(url, header=0)[0]
table

dams = table.groupby('Name').sum()
dams = dams.sort_values('Installed capacity [MW]', ascending=False)[:3]
dams

# I want to show all the columns of the original table.

Here is my problem: DataFrame shape mismatch [left]: (3, 4) [right]: (3, 9)

Thank you for your help.

Konrad Rudolph
Yuxin Li

1 Answer


The problem is that you called sum() after the groupby, so only numeric columns appear in dams.

Other columns, such as Type and Country, do not appear because they are not numeric and cannot be summed.

If you really want to show all columns, you can replace sum with count.
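
As a rough illustration of the difference, here is a minimal sketch on a made-up table. The dam names and values are hypothetical stand-ins, not the real data, and numeric_only=True is passed explicitly so that the behaviour does not depend on the installed pandas version.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table; names and values are made up.
table = pd.DataFrame({
    "Name": ["Dam A", "Dam B", "Dam C", "Dam D"],
    "Country": ["X", "Y", "X", "Z"],
    "Type": ["Arch", "Gravity", "Arch", "Embankment"],
    "Installed capacity [MW]": [500, 1200, 800, 300],
})

# Summing only the numeric columns: the text columns are left out of the result.
summed = table.groupby("Name").sum(numeric_only=True)
print(summed.columns.tolist())  # only the capacity column remains

# count() keeps every non-key column, but each cell now holds a row count
# rather than the original value.
counted = table.groupby("Name").count()
print(counted)
```

Note that count() restores the column set but not the values, so sorting its result by capacity does not rank the dams.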


Hanalia
  • But now there is a new problem. 'Name' is also a column of the original table. If I use the groupby() function, the 'Name' column is locked, and I still get 'shape mismatch'. – Yuxin Li Aug 10 '21 at 12:07
  • The required result is (3, 9). With the Name column missing, the shape is (3, 8). – Yuxin Li Aug 10 '21 at 12:08
  • The reason is that the Name column has turned into the index. Try the following: dams = table.groupby('Name', as_index=False).sum() – Hanalia Aug 10 '21 at 12:09
  • Also, the teacher needs us to find the top 3 dams. If I replace .sum() with .count(), the result is not correct. – Yuxin Li Aug 10 '21 at 12:11
  • You do not need groupby in the first place. With the first table, just sort: table.sort_values(by='Installed capacity [MW]', ascending=False).head(3) – Hanalia Aug 10 '21 at 12:14
  • Yes, the Name column is not the index anymore. But the same problem remains: the shape is not correct. – Yuxin Li Aug 10 '21 at 12:14
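
The two suggestions from the comments can be sketched as follows. This is only an illustration on a made-up table (the names and values are hypothetical, and the real table has more columns). The pd.to_numeric step is an assumption: it matters only if the capacity column was scraped as text.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table; names and values are made up.
table = pd.DataFrame({
    "Name": ["Dam A", "Dam B", "Dam C", "Dam D", "Dam E"],
    "Country": ["X", "Y", "X", "Z", "Y"],
    "Type": ["Arch", "Gravity", "Arch", "Embankment", "Gravity"],
    "Installed capacity [MW]": [500, 1200, 800, 300, 950],
})
col = "Installed capacity [MW]"

# as_index=False keeps Name as a regular column instead of the index,
# but the non-numeric columns are still missing from the grouped result.
grouped = table.groupby("Name", as_index=False).sum(numeric_only=True)
print(grouped.columns.tolist())

# Assumption: coerce the capacity column to numbers in case it was read as text.
table[col] = pd.to_numeric(table[col], errors="coerce")

# Sorting the original table directly keeps every column of the top 3 rows.
dams = table.sort_values(by=col, ascending=False).head(3)
print(dams.shape)
```

Because no aggregation happens in the last step, dams has the same columns as table, which is what the required (3, 9) shape implies for the real data.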