1

The data here is web-scraped from a website, and this initial data in the variable 'r' has three columns, where there are three columns: 'Country', 'Date', '% vs 2019 (Daily)'. From these three columns I was able to extract only the ones I wanted from dates: "2021-01-01" to current/today. What I am trying to do (have spent hours), is trying to organize the data in such a way where there is one column with just the dates which correspond to the percentage data, then 4 other columns which are the country names: Denmark, Finland, Norway, Sweden. Underneath those four countries should have cells populated with the percent data. Have tried using [], loc, and iloc and various other combinations to filter the panda dataframes in such a way to make this happen, but to no avail.

Here is the code I have so far:

import requests
import pandas as pd 
import json
import math
import datetime
from jinja2 import Template, Environment

from datetime import date 


r = requests.get('https://docs.google.com/spreadsheets/d/1GJ6CvZ_mgtjdrUyo3h2dU3YvWOahbYvPHpGLgovyhtI/gviz/tq?usp=sharing&tqx=reqId%3A0output=jspn')

data = r.content

data = json.loads(data.decode('utf-8').split("(", 1)[1].rsplit(")", 1)[0])

d = [[i['c'][0]['v'], i['c'][2]['f'], (i['c'][5]['v'])*100 ] for i in data['table']['rows']]


df = pd.DataFrame(d, columns=['Country', 'Date', '% vs 2019 (Daily)'])


pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)


# EXTRACTING BETWEEN TWO DATES

df['Date'] = pd.to_datetime(df['Date'])

startdate = datetime.datetime.strptime('2021-01-01', "%Y-%m-%d").date()
enddate = datetime.datetime.strptime('2021-02-02', "%Y-%m-%d").date()

pd.Timestamp('today').floor('D')

df = df[(df['Date'] > pd.Timestamp(startdate).floor('D')) & (df['Date'] <= pd.Timestamp(enddate).floor('D'))]


Den = df.loc[df['Country'] == 'Denmark']

Fin = df.loc[df['Country'] == 'Finland']

Swe = df.loc[df['Country'] == 'Sweden']

Nor = df.loc[df['Country'] == 'Norway']


Den_data = Den.loc[: , "% vs 2019 (Daily)"]
Den_date = Den.loc[: , "Date"]

Nor_data = Nor.loc[: , "% vs 2019 (Daily)"]
Swe_data = Swe.loc[: , "% vs 2019 (Daily)"]

Fin_data = Fin.loc[: , "% vs 2019 (Daily)"]
Fin_date = Fin.loc[: , "Date"]



Den_data = Den.loc[: , "% vs 2019 (Daily)"]



df2 = pd.DataFrame()

df2['DEN_DATE'] = Den_date
df2['DENMARK'] = Den_data


df3 = pd.DataFrame()
df3['FIN_DATE'] = Fin_date
df3['FINLAND'] = Fin_data

Want it to be organized like this so I can eventually export it to excel:

Date       | Denmark | Finland| Norway | Sweden

2020-01-01 | 1234    | 4321   | 5432   | 6574

...

Any help is greatly appreicated. Thank you

  • Please supply the expected [minimal, reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) (MRE). Show where the intermediate results differ from what you expected. We should be able to copy and paste a contiguous block of your code, execute that file, and reproduce your problem along with tracing output for the problem points. This lets us test our suggestions against your test data and desired output. Your posted code is not minimal: you've left in manipulations that aren't part of your immediate problem. – Prune Feb 08 '21 at 23:00
  • Please [include a minimal data frame](https://stackoverflow.com/questions/52413246/how-to-provide-a-reproducible-copy-of-your-dataframe-with-to-clipboard) as part of your MRE. – Prune Feb 08 '21 at 23:01

1 Answers1

0

Use isin to filter only the countries you are interested in getting the data. Then use pivot to return a reshaped dataframe organized by a given index and column values, in this case the index is the Date column, and the column values are the countries from the previous selection.

...
...
pd.Timestamp('today').floor('D')

df = df[(df['Date'] > pd.Timestamp(startdate).floor('D')) & (df['Date'] <= pd.Timestamp(enddate).floor('D'))]

countries_list=['Denmark', 'Finland', 'Norway', 'Sweden']

countries_selected = df[df.Country.isin(countries_list)]
result = countries_selected.pivot(index="Date", columns="Country")
print(result)

Output from result

           % vs 2019 (Daily)
Country              Denmark    Finland     Norway     Sweden
Date
2021-01-02        -65.261383 -75.416667 -39.164087 -65.853659
2021-01-03        -60.405405 -77.408056 -31.763620 -66.385669
2021-01-04        -69.371429 -75.598086 -34.002770 -70.704467
2021-01-05        -73.690932 -79.251701 -33.815689 -73.450509
2021-01-06        -76.257310 -80.445151 -43.454791 -80.805484
...
...
2021-01-30        -83.931624 -75.545852 -63.751763 -76.260163
2021-01-31        -80.654339 -74.468085 -55.565777 -65.451895
2021-02-01        -81.494253 -72.419106 -49.610390 -75.473322
2021-02-02        -81.741233 -73.898305 -46.164021 -78.215223
n1colas.m
  • 3,863
  • 4
  • 15
  • 28