0

I have been trying to download data from a website and then save it to a csv file. The problem is: I cannot save it in a proper way to read it or to import into a database.

Here is my code:

import csv
import requests
from bs4 import BeautifulSoup

def getData(url_to_scrap='https://www.investing.com/currencies/eur-usd-historical-data', file=None, save_file="Name.csv"):

    if url_to_scrap is not None:
        header = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
        r = requests.get(url_to_scrap, headers=header)  
        data = BeautifulSoup(r.content, 'html.parser')
    else:
        data = BeautifulSoup(open(file, encoding='utf8'), 'html.parser')


    table = data.find(id='curr_table')
    table = table.find_all('td')

    row_data = []
    for row in table:
        row_data.append(row.get_text('data-real-value'))

    with open(save_file, 'w') as save:
        for row in row_data:
            writer = csv.writer(save, delimiter=';')
            writer.writerow(row)

getData(save_file="EUR USD Historical Data.csv")

And the output on the CSV file:

M;a;y; ;3;1;,; ;2;0;1;7

1;.;1;2;1;8

1;.;1;1;7;2

1;.;1;2;2;0

....

What I need:

May 31, 2017;1.1218;1.1172;1.1220;1.1165;0.30%

If you check the website, everything is in a table and I need it similar in csv. What should I change to make it work?

Vasco Ferreira
  • 2,151
  • 2
  • 15
  • 21

2 Answers2

1

Make sub lists of length six each (= 1 row) from your row_data list (via):

with open(save_file, 'wb') as save:
    writer = csv.writer(save, delimiter=';')
    for row in [row_data[x:x+6] for x in xrange(0, len(row_data), 6)]:
        writer.writerow(row)

#May 31, 2017;1.1242;1.1172;1.1252;1.1165;0.51%
#May 30, 2017;1.1185;1.1163;1.1207;1.1108;0.18%
#May 29, 2017;1.1165;1.1177;1.1192;1.1159;-0.16%
#...

(As noted here, xrange may need to be range if you are using Python 3; I'm on Python 2 and a beginner, so...)

lukeA
  • 53,097
  • 5
  • 97
  • 100
  • thank you for your answer, it made the script better, but now I need to remove the extra blank lines in the csv. I just changed **`open(save_file, 'wb')` to `open(save_file, 'w')`**. Now, how can I remove the extra blank lines? edit: **@njoosse** just used the link to read about it, and used `lineterminator='\n'` – Vasco Ferreira May 31 '17 at 23:19
  • @VascoFerreira Hmm why did you set wb back to b? I assume you can remove the extra line feeds by using `wb` as suggested. (See [here](https://stackoverflow.com/questions/23835866/many-tutorials-on-writing-csv-files-have-the-mode-set-to-wb-why) ) – lukeA May 31 '17 at 23:29
  • @VascoFerreira So lineterminator would be an alternative to wb? Good to know. If you problem is solved, you can choose an answer an mark it as solved. – lukeA Jun 01 '17 at 07:00
  • 1
    @lukeAI have changed because it gets the following error: `TypeError: a bytes-like object is required, not 'str'` – Vasco Ferreira Jun 01 '17 at 13:55
0

EDIT

When you are appending the strings to the row list, append it as a list

row_data.append([row.get_text('data-real-value')])

so that you have a list of lists of strings.

see Python csv library leaves empty rows even when using a valid lineterminator for a better example.

Keep the writerows as below to print a single item per line.

with open(save_file, 'w') as save:
        writer = csv.writer(save, delimiter=';')
        writer.writerows(row)

since it will place the delimiter between each element in the list

njoosse
  • 549
  • 3
  • 8
  • it doesn't work, the output stays like this: `May 31, 2017;1.1240;1.1172;1.1252;1.1165;0.49%;May 30, 2017;1.1185;1.1163;1.1207;1.1108;0.18%;May 29, ` all together in one line. – Vasco Ferreira May 31 '17 at 20:27
  • just used the link to read about it, and used `lineterminator='\n'` to remove the extra blank lines. `writerows(row)` didn't work, but I've used the answer of @lukeA – Vasco Ferreira May 31 '17 at 23:25