
I've been searching everywhere but can't seem to solve this problem.

I have a CSV file with two columns, "Name" and "URL". I've loaded it into a DataFrame called df1, as shown below:

```
import pandas as pd

df1 = pd.read_csv('yahoo finance.csv')
print(df1)

      Name                                        URL
0  Gainers  https://au.finance.yahoo.com/gainers?e=ax
1   Losers        https://au.finance.yahoo.com/losers
2   Active   https://au.finance.yahoo.com/most-active
```

What I'm trying to do is visit each of the above URLs, parse the table on the page, and save the data in a new CSV file.

```
for u in df1.URL:
    u2 = pd.read_html(u)
    for n in u2:
        row2 = pd.DataFrame(num)
        row2.to_csv(name+'.csv', index=False)
```

I'm missing a step here that I can't work out: I want to save the table from each URL into a new CSV file named after the corresponding entry in the "Name" column.

Can someone help me fix this part? Currently, all this code does is save the last URL's data to a CSV named "Active"; it doesn't save the data from the first two URLs at all.

Thank you in advance!

stinsfire
`num` and `name` might be set to incorrect values, so `row2 = pd.DataFrame(num)` and `row2.to_csv(name+'.csv', index=False)` are not working as you expect. Print the values and see what they are. – Joe Jul 30 '17 at 07:53

2 Answers


Thank you, this solved the issue; the CSVs are now saving as they should. The updated code is:

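for _, row in df1.iterrows():
    name = row['Name']
    url = str(row['URL'])
    # read_html returns a list of DataFrames, one per table found on the page
    tables = pd.read_html(url)
    for table in tables:
        # every table goes to the same file, so on a page with several tables the last one wins
        table.to_csv(name + '.csv', index=False)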
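
One thing to watch in the loop above: `pd.read_html` returns a list with one DataFrame per table on the page, and every table is written to the same file name, so only the last table survives. A minimal sketch of one way around that, assuming you want to keep every table. The helper name `save_tables`, the `out_dir` parameter, and the `_0`, `_1` suffix scheme are illustrative choices, not part of the original code:

```python
from pathlib import Path


def save_tables(name, tables, out_dir="."):
    """Write each table to its own CSV and return the paths written.

    `tables` is any list of objects with a pandas-style `to_csv` method,
    for example the list returned by `pd.read_html(url)`.
    """
    out_dir = Path(out_dir)
    paths = []
    for i, table in enumerate(tables):
        # A single table keeps the plain name; extra tables get a numeric suffix.
        suffix = "" if len(tables) == 1 else f"_{i}"
        path = out_dir / f"{name}{suffix}.csv"
        table.to_csv(path, index=False)
        paths.append(path)
    return paths
```

Inside the loop over rows this could be called as `save_tables(name, pd.read_html(url))` in place of the inner `for` loop.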
stinsfire

Do you mean you need to iterate over the DataFrame row by row, using the URL value to fetch the data and the Name value to name the saved file? If so, you probably need something like this:

for _, row in df1.iterrows():
    name = row['Name']  # used to name the output file
    url = row['URL']    # page to fetch the table from
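
To round this out, here is a hedged sketch of the full loop, pairing each Name with its URL by zipping the two columns instead of using `iterrows()`. It assumes the first table on each page is the one you want. The function name `save_first_tables` and the `read_tables` and `out_dir` parameters are hypothetical, added only so the fetching step can be swapped out when trying it offline:

```python
import pandas as pd


def save_first_tables(df, read_tables=pd.read_html, out_dir="."):
    """For each row of `df`, fetch its URL and save the first table as <Name>.csv."""
    written = []
    # Zipping the two columns keeps each name next to its own URL.
    for name, url in zip(df["Name"], df["URL"]):
        tables = read_tables(url)  # list of DataFrames, one per HTML table
        path = f"{out_dir}/{name}.csv"
        # Assumption: the first table on the page is the one of interest.
        tables[0].to_csv(path, index=False)
        written.append(path)
    return written
```

With the DataFrame from the question, `save_first_tables(df1)` would write one CSV per row into the current directory, provided each page contains at least one table that `pd.read_html` can parse.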