
I want to scrape a table of dates from different HTML web pages into a CSV file, but the dates are imported in a garbled (encoded) format

I am using Beautiful Soup with Python 3, and I also open the file with encoding utf-8 for the HTML pages. I am trying to import the table from the page https://www.timeanddate.com/holidays/india/2010

Sample code:

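import csv

rows = table.find_all('tr')

# newline='' stops the csv module from writing extra blank lines on Windows
with open("test12.csv", "w+", newline='', encoding="utf-8") as csvFile:
    writer = csv.writer(csvFile)
    for row in rows:
        csvRow = []
        # Collect the text of every header and data cell in this row
        for cell in row.find_all(['td', 'th']):
            csvRow.append(cell.get_text())
        writer.writerow(csvRow)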

I am getting the following result. The dates are not imported in the proper format:

Date  

1 जनवरी रविवार 5 जनवरी गà¥à¤°à¥à¤µà¤¾à¤° 14 जनवरी शनिवार 15 जनवरी रविवार 23 जनवरी सोमवार 26 जनवरी गà¥à¤°à¥à¤µà¤¾à¤° 28 जनवरी शनिवार

Archana G

2 Answers


This script parses all rows and stores them in a .csv file:

import requests
from bs4 import BeautifulSoup
import csv

url = 'https://www.timeanddate.com/holidays/india/2010'

soup = BeautifulSoup(requests.get(url).text, 'lxml')

out = [[td.text.strip() for td in tr.select('th, td')] for tr in soup.select('tr[data-mask]')]

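# newline='' avoids blank lines on Windows; utf-8 keeps non-ASCII day and month names intact
with open('file.csv', 'w', newline='', encoding='utf-8') as f_out:
    writer = csv.writer(f_out)
    writer.writerows(out)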

Output of csv file (in Estonian):

1. jaan,reede,New Year's Day,Restricted Holiday
5. jaan,teisipäev,Guru Govind Singh Jayanti,Restricted Holiday
14. jaan,neljapäev,Pongal,Restricted Holiday
20. jaan,kolmapäev,Vasant Panchami,Restricted Holiday
26. jaan,teisipäev,Republic Day,Gazetted Holiday
8. veebr,esmaspäev,Maharishi Dayanand Saraswati Jayanti,Restricted Holiday
12. veebr,reede,Maha Shivaratri/Shivaratri,Gazetted Holiday
14. veebr,pühapäev,Chinese New Year,Observance
14. veebr,pühapäev,Valentine's Day,Observance
19. veebr,reede,Shivaji Jayanti,Restricted Holiday
27. veebr,laupäev,Milad un-Nabi/Id-e-Milad,Gazetted Holiday
1. märts,esmaspäev,Holi,Restricted Holiday
16. märts,teisipäev,Chaitra Sukhladi,Restricted Holiday
20. märts,laupäev,March Equinox,Season
24. märts,kolmapäev,Rama Navami,Gazetted Holiday
30. märts,teisipäev,First day of Passover,Observance
1. apr,neljapäev,Maundy Thursday,"Observance, Christian"
2. apr,reede,Good Friday,Gazetted Holiday
4. apr,pühapäev,Easter Day,Restricted Holiday
14. apr,kolmapäev,Vaisakhi,Restricted Holiday
28. apr,kolmapäev,Mahavir Jayanti,Gazetted Holiday
1. mai,laupäev,May Day,Observance
9. mai,pühapäev,Mother's Day,Observance
9. mai,pühapäev,Birthday of Ravindranath,Restricted Holiday
27. mai,neljapäev,Buddha Purnima/Vesak,Gazetted Holiday
20. juuni,pühapäev,Father's Day,Observance
21. juuni,esmaspäev,June Solstice,Season
26. juuni,laupäev,Hazarat Ali's Birthday,Restricted Holiday
13. juuli,teisipäev,Rath Yatra,Restricted Holiday
1. aug,pühapäev,Friendship Day,Observance
15. aug,pühapäev,Independence Day,Gazetted Holiday
19. aug,neljapäev,Parsi New Year,Restricted Holiday
23. aug,esmaspäev,Onam,Restricted Holiday
24. aug,teisipäev,Raksha Bandhan (Rakhi),Restricted Holiday
2. sept,neljapäev,Janmashtami,Gazetted Holiday
10. sept,reede,Jamat Ul-Vida,Restricted Holiday
11. sept,laupäev,Ramzan Id/Eid-ul-Fitar,"Muslim, Common local holiday"
11. sept,laupäev,Ganesh Chaturthi/Vinayaka Chaturthi,Restricted Holiday
23. sept,neljapäev,September Equinox,Season
2. okt,laupäev,Mahatma Gandhi Jayanti,Gazetted Holiday
14. okt,neljapäev,Maha Saptami,Restricted Holiday
15. okt,reede,Maha Ashtami,Restricted Holiday
17. okt,pühapäev,Dussehra,Gazetted Holiday
22. okt,reede,Maharishi Valmiki Jayanti,Restricted Holiday
31. okt,pühapäev,Halloween,Observance
5. nov,reede,Diwali/Deepavali,Gazetted Holiday
6. nov,laupäev,Govardhan Puja,Restricted Holiday
7. nov,pühapäev,Bhai Duj,Restricted Holiday
17. nov,kolmapäev,Bakr Id/Eid ul-Adha,Gazetted Holiday
21. nov,pühapäev,Guru Nanak Jayanti,Gazetted Holiday
24. nov,kolmapäev,Guru Tegh Bahadur's Martyrdom Day,Restricted Holiday
2. dets,neljapäev,First Day of Hanukkah,Observance
9. dets,neljapäev,Last day of Hanukkah,Observance
17. dets,reede,Muharram/Ashura,Gazetted Holiday
22. dets,kolmapäev,December Solstice,Season
24. dets,reede,Christmas Eve,Restricted Holiday
25. dets,laupäev,Christmas,Gazetted Holiday
31. dets,reede,New Year's Eve,Observance
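
If the file is then opened in Excel, the garbled dates in the question may come from Excel not recognizing the file as UTF-8. That is an assumption, since the question does not say which program displayed the result. A minimal sketch under that assumption: write the file with the 'utf-8-sig' codec, which puts a byte order mark at the start. The sample rows and the file name holidays_bom.csv are made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical sample rows standing in for the scraped table
rows = [['1 जनवरी', 'शुक्रवार'], ['5 जनवरी', 'मंगलवार']]

# Illustrative output location in the system temp directory
path = os.path.join(tempfile.gettempdir(), 'holidays_bom.csv')

# 'utf-8-sig' writes a byte order mark first, which Excel uses to detect UTF-8
with open(path, 'w', newline='', encoding='utf-8-sig') as f_out:
    csv.writer(f_out).writerows(rows)

# Reading back with the same codec strips the mark again
with open(path, newline='', encoding='utf-8-sig') as f_in:
    read_back = list(csv.reader(f_in))
```

In the script above, this would only mean changing the encoding argument of open().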
Andrej Kesely
  • This code is helpful for me for web scraping. But what if I want to extract data from multiple webpages? I am a beginner in Python. Can we combine the data from different webpages into a single sheet? @Andrej Kesely – Archana G Jul 10 '19 at 18:51
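
One possible way to combine several pages into a single sheet is to loop over the years and write every page's rows into the same CSV, with the year added as a first column. This is only a sketch: the function names fetch_holiday_rows and write_combined are made up, and it assumes that the pages for other years follow the same URL pattern and table markup as the 2010 page.

```python
import csv


def fetch_holiday_rows(year):
    """Download one year's page and return its table rows as lists of strings."""
    # Imported here so the combining logic below works without the network libraries
    import requests
    from bs4 import BeautifulSoup

    # Assumed URL pattern, based on the 2010 address in the question
    url = f'https://www.timeanddate.com/holidays/india/{year}'
    soup = BeautifulSoup(requests.get(url).text, 'lxml')
    return [[td.text.strip() for td in tr.select('th, td')]
            for tr in soup.select('tr[data-mask]')]


def write_combined(years, out_file, fetch=fetch_holiday_rows):
    """Write the rows of every year into one CSV, prefixing each row with its year."""
    writer = csv.writer(out_file)
    for year in years:
        for row in fetch(year):
            writer.writerow([year] + row)
```

To use it, open the output file once with newline='' and encoding='utf-8' and pass it in, for example write_combined(range(2010, 2013), f_out). The fetch parameter is there so the download step can be swapped out, for instance for pages saved on disk.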

Let Pandas do all that work:

import pandas as pd

url = 'https://www.timeanddate.com/holidays/india/2010'

# Gets all tables from site and stores as list of dataframes
table = pd.read_html(url)

# Get the dataframe in index position 0
table = table[0]

# Drop the rows with nulls
table = table.dropna(axis=0)

# Write to file
table.to_csv('file.csv', index=False)

And this can be condensed into one line:

pd.read_html('https://www.timeanddate.com/holidays/india/2010')[0].dropna(axis=0).to_csv('file.csv', index=False)

Output:

print (table.head(10).to_string())
      Date Unnamed: 1_level_0                                  Name                Type
      Date Unnamed: 1_level_1                                  Name                Type
0    Jan 1             Friday                        New Year's Day  Restricted Holiday
1    Jan 5            Tuesday             Guru Govind Singh Jayanti  Restricted Holiday
2   Jan 14           Thursday                                Pongal  Restricted Holiday
3   Jan 20          Wednesday                       Vasant Panchami  Restricted Holiday
4   Jan 26            Tuesday                          Republic Day    Gazetted Holiday
6    Feb 8             Monday  Maharishi Dayanand Saraswati Jayanti  Restricted Holiday
7   Feb 12             Friday            Maha Shivaratri/Shivaratri    Gazetted Holiday
8   Feb 14             Sunday                      Chinese New Year          Observance
9   Feb 14             Sunday                       Valentine's Day          Observance
10  Feb 19             Friday                       Shivaji Jayanti  Restricted Holiday
chitown88
  • Thank you so much for the help! It is storing the data into csv file. It is really convenient to use pandas library. – Archana G Jul 10 '19 at 18:56