
How to store the results of requests made in a for loop as a DataFrame

import time
from tqdm.notebook import tqdm
from GoogleNews import GoogleNews
from newspaper import Article

countries = df["Country"].unique().tolist()
print(countries) 

#Output:
-----------
['Malaysia', 'ireland', 'CZ', 'India', 'USA']

Now I want to fetch Google News data for each country and store all of it, by country, in a single DataFrame.

list_country = []
df = pd.DataFrame([])
for country in tqdm(countries1): 
    googlenews = GoogleNews(start=Start_date,end=End_date)
    googlenews.set_lang('en')
    googlenews.set_encode('utf-8')
    googlenews.get_news(country)
    googlenews.total_count()
    result=googlenews.result()
    data=pd.DataFrame(result)
    df = df.append(data)
    df['Country'] = country
    list_country.append(country)

df.head()

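Expected output: a single DataFrame with the news rows for all countries, each row labelled with its own country. (The screenshot from the original post is not available.)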

But the result only stores the last country's data.

Nirav Prajapati
  • Have you tried to remove `tqdm` and iterate over `countries` list directly, just to check? – Laurent Mar 26 '21 at 14:40
  • Please avoid using loops and appending to a DataFrame; it is not advised. See [this answer](https://stackoverflow.com/questions/13784192/creating-an-empty-pandas-dataframe-then-filling-it/56746204#56746204) by @cs95. I also recommend that we don't suggest solutions that have a for loop with a `df.append()` at the end. – Joe Ferndz Mar 30 '21 at 05:46
  • Instead, create a dictionary with the country as the key and all the values as column:value pairs, then create a DataFrame from it. pandas will build the DataFrame for you. – Joe Ferndz Mar 30 '21 at 05:49
  • @JoeFerndz Yes, I did the same and got the expected result. Thanks for the reply. – Nirav Prajapati Mar 30 '21 at 06:08
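
The idea in the comments above (collect the pieces first, build the DataFrame once) can be sketched roughly as follows. This is only an illustration, not the commenter's exact code: `fetch_results` is a hypothetical stand-in for the `googlenews.get_news(country)` and `googlenews.result()` calls, and the rows it returns are made up so the sketch runs offline.

```python
import pandas as pd

def fetch_results(country):
    # Hypothetical stand-in for googlenews.get_news(country) + googlenews.result();
    # returns made-up rows so the example runs without network access.
    return [{"title": f"{country} headline {i}"} for i in range(2)]

countries = ["Malaysia", "India", "USA"]

frames = []
for country in countries:
    data = pd.DataFrame(fetch_results(country))
    data["Country"] = country  # label only this country's rows
    frames.append(data)

# Build the combined DataFrame once, after the loop
df = pd.concat(frames, ignore_index=True)
print(df["Country"].value_counts().to_dict())
```

Because each frame is labelled before it is combined, no later iteration can overwrite an earlier country's label, and the single `pd.concat` avoids copying the growing DataFrame on every pass.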

2 Answers


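It didn't store only the last country's data; it simply changed every string in the `Country` column to the last country, because `df['Country'] = country` assigns to the whole column on each iteration. You can use `iloc` on the DataFrame to limit the assignment to the newly appended rows.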
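
To see the effect in isolation, here is a minimal sketch with made-up rows and no network calls. The column name `title` and the variable names are assumptions for illustration only; the point is the difference between assigning to the whole column and assigning to just the new rows by position.

```python
import pandas as pd

# Rows already collected for the first country
df = pd.DataFrame({"title": ["a", "b"]})
df["Country"] = "Malaysia"

# Rows fetched for the next country (made-up data)
new_rows = pd.DataFrame({"title": ["c", "d"]})
start = len(df)
combined = pd.concat([df, new_rows])

# Whole-column assignment relabels every row, including the Malaysia ones
overwritten = combined.copy()
overwritten["Country"] = "India"

# Positional assignment touches only the rows added in this step
fixed = combined.copy()
fixed.iloc[start:, fixed.columns.get_loc("Country")] = "India"

print(overwritten["Country"].tolist())  # ['India', 'India', 'India', 'India']
print(fixed["Country"].tolist())        # ['Malaysia', 'Malaysia', 'India', 'India']
```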

import datetime
from tqdm.notebook import tqdm
from GoogleNews import GoogleNews
from newspaper import Article
import pandas as pd

countries = ['Malaysia', 'ireland', 'CZ', 'India', 'USA']

End_date = datetime.datetime.now().date()

delta = datetime.timedelta(days = 0)
Start_date = datetime.datetime.now().date() - delta

googlenews = GoogleNews(lang='en',
                        start=Start_date,
                        end=End_date,
                        encode = 'utf-8')

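all_df = pd.DataFrame([])
for country in tqdm(countries):
    googlenews.get_news(country)
    print(googlenews.total_count())
    result = googlenews.result()
    data = pd.DataFrame(result)
    if all_df.empty:
        # First non-empty batch: the whole frame belongs to this country
        all_df = data.copy()
        all_df['Country'] = country
    else:
        current_size = len(all_df)
        # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
        all_df = pd.concat([all_df, data])
        # Label only the newly added rows; look the column up by name
        # instead of hard-coding its position
        all_df.iloc[current_size:, all_df.columns.get_loc('Country')] = country
    # Reset stored results so the next country starts clean
    googlenews.clear()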
Thulfiqar

Before appending each data frame, add the current country to it: create a `Country` column on that search result and set it to the current country name.

import datetime
from tqdm.notebook import tqdm
from GoogleNews import GoogleNews
from newspaper import Article
import pandas as pd

Start_date = str((datetime.datetime.today()-datetime.timedelta(days=30)).strftime ('%d/%m/%Y'))
End_date = str(datetime.datetime.today().strftime ('%d/%m/%Y'))

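countries = ['Malaysia', 'ireland', 'CZ', 'India', 'USA']  # list from the question

df = pd.DataFrame([])
for country in tqdm(countries):
    googlenews = GoogleNews(start=Start_date, end=End_date)
    googlenews.set_lang('en')
    googlenews.set_encode('utf-8')
    googlenews.get_news(country)
    googlenews.total_count()
    result = googlenews.result()
    data = pd.DataFrame(result)
    # Label this country's rows before combining them with the rest
    data["Country"] = country
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    df = pd.concat([df, data])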

#Check result     
df.shape
df[90:100]
Nirav Prajapati