@GiantsLoveDeathMetal makes good points. In principle, you can read the raw data into a DataFrame named oecd_bli
and then select the subsets of it that satisfy certain conditions.
Demo
import pandas as pd
# Given a DataFrame of raw data
d = {
"Country": pd.Series(["Australia", "Austria", "Fiji", "Japan"]),
"Indicator": pd.Series(["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."]),
"Value": pd.Series([1.1, 1.0, 2.2, 2.9]),
}
oecd_bli = pd.DataFrame(d, columns=["Country", "Indicator", "Value"])
oecd_bli

# Select rows starting with "Life" in column "Indicator"
condition = oecd_bli["Indicator"].str.startswith("Life")
subset = oecd_bli[condition]
subset

Alternatively, select a subset using label-based indexing via .loc:
subset = oecd_bli.loc[condition, :]
Here loc expects [<rows>, <columns>]. Thus, only the rows that meet the condition are selected, along with all columns (:).
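The column slot of loc can also narrow the result. Here is a minimal sketch that reuses the toy data from the demo; the name life_values is only illustrative.

```python
import pandas as pd

# Same toy data as in the demo above
oecd_bli = pd.DataFrame({
    "Country": ["Australia", "Austria", "Fiji", "Japan"],
    "Indicator": ["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."],
    "Value": [1.1, 1.0, 2.2, 2.9],
})

condition = oecd_bli["Indicator"].str.startswith("Life")

# Rows where the condition is True, restricted to two columns
life_values = oecd_bli.loc[condition, ["Country", "Value"]]
print(life_values)
```

Passing a list of column labels instead of : keeps only those columns, while the original row labels (2 and 3 here) are preserved.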
Details
Notice that the result contains only the rows for which the condition is True.
This works because indexing a DataFrame with a boolean array (a mask) keeps exactly the rows where the mask is True.
Example of a boolean array:
>>> condition = oecd_bli["Indicator"].str.startswith("Life")
>>> condition
0 False
1 False
2 True
3 True
Name: Indicator, dtype: bool
Other ways to set up conditions:
>>> condition = oecd_bli["Indicator"] == "Life ..."
>>> condition = ~oecd_bli["Indicator"].str.startswith("Dwell")
>>> condition = oecd_bli["Indicator"].isin(["Life ...", "Crime ..."])
>>> condition = (oecd_bli["Indicator"] == "Life ...") | (oecd_bli["Indicator"] == "Crime ...")
- direct equality (==)
- exclude (~) undesired occurrences
- include whitelisted values via isin
- multiple comparisons with bitwise logical operators (|, &, etc.)
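When combining conditions, each comparison generally needs its own parentheses, because & and | bind more tightly than == or > in Python. A small sketch on the same toy data follows; the thresholds 2.5 and 1.05 are arbitrary values chosen for illustration.

```python
import pandas as pd

# Same toy data as in the demo above
oecd_bli = pd.DataFrame({
    "Country": ["Australia", "Austria", "Fiji", "Japan"],
    "Indicator": ["Dwellings ...", "Dwellings ...", "Life ...", "Life ..."],
    "Value": [1.1, 1.0, 2.2, 2.9],
})

# Named masks keep the combined expression readable
is_life = oecd_bli["Indicator"].str.startswith("Life")
high_value = oecd_bli["Value"] > 2.5

# AND: both masks must be True for a row to be kept
both = oecd_bli[is_life & high_value]

# OR: inline comparisons must each be wrapped in parentheses
either = oecd_bli[(oecd_bli["Country"] == "Fiji") | (oecd_bli["Value"] < 1.05)]

print(both)
print(either)
```

Storing each mask in a named variable first avoids most precedence mistakes and makes longer filters easier to read.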