
I want to remove a symbol from a CSV file I just created by web scraping. For context, my coordinates contain a degree symbol (°) and I want to remove it.

Here is my code:

#import modules
import requests
import urllib.request
from bs4 import BeautifulSoup
from datetime import datetime
import time
import csv
import os
import re
from selenium import webdriver
import schedule

try:      
    def retrieve_website():
        # Create header
        headers = {'user-agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17'}

        # URL of the ship you want to track, execute the request and parse it to the variable 'soup'
        url = 'https://website-'
        reqs = requests.get(url, headers=headers)
        soup = BeautifulSoup(reqs.text, 'lxml')

        # Save file to local disk
        with open("output1.html", "w", encoding='utf-8') as file:
            file.write(str(soup))

        # Open file from local disk
        with open("output1.html", "r", encoding='utf-8') as file:
            soup = BeautifulSoup(file, 'lxml')

        # All td tags are read into a list
        data = soup.find_all('td')

        # Extract the coordinates
        Longitude = data[23].get_text()
        Latitude = data[24].get_text()
    
        # Extract heading
        Heading = data[27].get_text()
        
        #save as location
        dwnpath = r'S:\location'
        
        # Write data to a csv file with comma as separator
        with open(os.path.join(dwnpath, 'Track.csv'), 'w', newline='') as csv_file:
            fieldnames = ['Longitude', 'Latitude', 'Heading']
            writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=',')
            writer.writeheader()
            writer.writerow({'Longitude': Longitude, 
                             'Latitude': Latitude,
                              'Heading': Heading})
    
    # Start the function the first time when the program starts
    retrieve_website()
    
except Exception as error:
    print(error)
    
print('Script Complete!')

The code above scrapes some information from a specific website. The coordinates I retrieve look like this:

Longitude Latitude Heading
1234°     456°     789°

But I want them to look like this:

Longitude Latitude Heading
1234      456      789

Thanks.


4 Answers


This should do the trick!

...
writer.writerow({
    'Longitude': Longitude.replace('°', ''), 
    'Latitude': Latitude.replace('°', ''),
    'Heading': Heading.replace('°', ''),
})
...    
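A self-contained sketch of the same idea, assuming hard-coded sample values in place of the scraped `data[...]` cells and an in-memory buffer instead of `Track.csv`:

```python
import csv
import io

# Hypothetical sample values standing in for data[23], data[24], data[27].get_text()
Longitude, Latitude, Heading = '1234°', '456°', '789°'

buffer = io.StringIO()  # stands in for the open Track.csv file
writer = csv.DictWriter(buffer, fieldnames=['Longitude', 'Latitude', 'Heading'])
writer.writeheader()
# str.replace returns a new string with every '°' removed
writer.writerow({
    'Longitude': Longitude.replace('°', ''),
    'Latitude': Latitude.replace('°', ''),
    'Heading': Heading.replace('°', ''),
})

print(buffer.getvalue())
```

The `'°'` literal has to be the same character the page actually returns. If the site uses a look-alike glyph such as `º` (the masculine ordinal indicator), `replace` will silently leave it in place.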

Other answers work too; however, to generalize the solution, you can use a regular expression (regex) to remove every character that is not a digit.

import re
s = "1°°23%%&&**!!"
numeric_string = re.sub("[^0-9]", "", s)

Which results in:

>> 123
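One caveat: `[^0-9]` also strips the minus sign and the decimal point, so a coordinate such as `-12.345°` would become `12345`. If your values can be negative or fractional, a sketch that keeps those characters (the allowed set is an assumption about what your coordinates look like; `clean_coordinate` is a hypothetical helper name):

```python
import re

def clean_coordinate(s: str) -> str:
    # Keep digits, '.' and '-'; drop everything else (e.g. '°')
    return re.sub(r'[^0-9.\-]', '', s)

print(clean_coordinate('-12.345°'))        # -12.345
print(clean_coordinate('1°°23%%&&**!!'))   # 123
print(re.sub("[^0-9]", "", '-12.345°'))    # 12345 (sign and decimal point lost)
```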

Have you tried str.replace? Let's say you have a string '1260°':

 s='1260°'

this:

 s.replace('°', '') 

will return '1260'
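If more than one symbol needs to go (for example, the degree sign plus the minute and second marks `′` and `″`; which symbols actually appear on the page is an assumption here), `str.maketrans` with a third argument builds a table that deletes every listed character in one pass:

```python
# Characters to delete; '′' and '″' are hypothetical extras for illustration
REMOVE = str.maketrans('', '', '°′″')

s = '12°34′56″'
print(s.translate(REMOVE))        # 123456
print('1260°'.translate(REMOVE))  # 1260
```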


Please find the amended code below. It uses the str.replace method to create new variables, Lon, Lat and Head, which are then written to the CSV file.

# import modules
import requests
import urllib.request
from bs4 import BeautifulSoup
from datetime import datetime
import time
import csv
import os
import re
from selenium import webdriver
import schedule

try:
    def retrieve_website():
        # Create header
        headers = {'user-agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17'}

        # URL of the ship you want to track, execute the request and parse it to the variable 'soup'
        url = 'https://website-'
        reqs = requests.get(url, headers=headers)
        soup = BeautifulSoup(reqs.text, 'lxml')

        # Save file to local disk
        with open("output1.html", "w", encoding='utf-8') as file:
            file.write(str(soup))

        # Open file from local disk
        with open("output1.html", "r", encoding='utf-8') as file:
            soup = BeautifulSoup(file, 'lxml')

        # All td tags are read into a list
        data = soup.find_all('td')

        # Extract the coordinates and strip the degree symbol
        Longitude = data[23].get_text()
        Lon = Longitude.replace('°', '')
        Latitude = data[24].get_text()
        Lat = Latitude.replace('°', '')

        # Extract heading and strip the degree symbol
        Heading = data[27].get_text()
        Head = Heading.replace('°', '')

        # Save location
        dwnpath = r'S:\location'

        # Write data to a csv file with comma as separator
        with open(os.path.join(dwnpath, 'Track.csv'), 'w', newline='') as csv_file:
            fieldnames = ['Longitude', 'Latitude', 'Heading']
            writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=',')
            writer.writeheader()
            writer.writerow({'Longitude': Lon,
                             'Latitude': Lat,
                             'Heading': Head})

    # Start the function the first time when the program starts
    retrieve_website()

except Exception as error:
    print(error)

print('Script Complete!')
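If the CSV has already been written, a sketch that strips the degree sign from an existing file in place. It assumes the file is small enough to read into memory and was written as UTF-8; because the question's `open(...)` call passes no `encoding`, pass whatever encoding the file was actually written with. `strip_symbol_from_csv` is a hypothetical helper name:

```python
import csv
import os
import tempfile

def strip_symbol_from_csv(path, symbol='°', encoding='utf-8'):
    """Rewrite the CSV at `path` with every occurrence of `symbol` removed."""
    with open(path, newline='', encoding=encoding) as f:
        rows = list(csv.reader(f))
    # Clean every cell, header included
    cleaned = [[cell.replace(symbol, '') for cell in row] for row in rows]
    with open(path, 'w', newline='', encoding=encoding) as f:
        csv.writer(f).writerows(cleaned)

# Usage example with a temporary file standing in for S:\location\Track.csv
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'Track.csv')
    with open(path, 'w', newline='', encoding='utf-8') as f:
        f.write('Longitude,Latitude,Heading\r\n1234°,456°,789°\r\n')
    strip_symbol_from_csv(path)
    with open(path, newline='', encoding='utf-8') as f:
        result = list(csv.reader(f))
    print(result)
```

Cleaning the values before writing, as the answers above do, avoids this second pass; post-processing is mainly useful for files that already exist.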
  