
I have this code that downloads all of the data from the CDC. However, I only want the data starting at a specific date, for example everything from 04/03/2022 onward. Is it possible to do that?

import io

import pandas as pd
import requests

# Source: https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-County/8xkx-amqh/
urlData = requests.get('https://data.cdc.gov/api/views/8xkx-amqh/rows.csv?accessType=DOWNLOAD').content

# Convert to pandas DataFrame
vcounty_df = pd.read_csv(io.StringIO(urlData.decode('utf-8')))

studentpr

1 Answer


The server seems to serve a CSV file, so it will be difficult to only download part of the data that you want. You can try to filter every line on the fly, but the entire file will still be transferred across the Internet.
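
Filtering "on the fly" could look roughly like the sketch below. It is only an illustration: the helper name rows_on_or_after is made up, the sample rows are invented, and the MM/DD/YYYY date format is an assumption about how the file stores its "Date" column.

```python
import csv
from datetime import datetime


def rows_on_or_after(lines, start, date_column="Date", date_format="%m/%d/%Y"):
    """Yield CSV rows (as dicts) whose date column is on or after `start`.

    `lines` can be any iterable of text lines, so rows are checked one at a
    time and the whole file never has to be held in memory.
    """
    reader = csv.DictReader(lines)
    for row in reader:
        if datetime.strptime(row[date_column], date_format) >= start:
            yield row


# Tiny in-memory sample standing in for the downloaded file (made-up values).
sample = [
    "Date,FIPS,Recip_County",
    "04/02/2022,01001,Autauga County",
    "04/03/2022,01001,Autauga County",
    "04/04/2022,01003,Baldwin County",
]

kept = list(rows_on_or_after(sample, datetime(2022, 4, 3)))
print(len(kept))
```

With the real download, the lines could come from requests.get(url, stream=True).iter_lines(decode_unicode=True) instead of the sample list, provided no field contains an embedded newline. This saves memory but not bandwidth, since every line still has to arrive before it can be rejected.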

A more practical approach is to post-process the data by filtering for the date range that you want. Here is how to do it.

# Create a DataFrame with the "Date" column as its index;
# we will filter on this index. The dtype={"FIPS": "str"}
# argument suppresses the mixed-dtype warning on the FIPS column.
df = pd.read_csv(
    io.StringIO(urlData.decode("utf-8")),
    parse_dates=["Date"],
    dtype={"FIPS": "str"},
    index_col=0,
)

# Sort by date so that label slicing is well defined
# regardless of the row order in the file.
df = df.sort_index()

# Keep only rows dated 2022-04-03 or later.
# Taken from https://stackoverflow.com/questions/22898824/filtering-pandas-dataframes-on-dates
start_from_april = df.loc["2022-04-03":]
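
An alternative that does not depend on the index at all is a boolean mask on the "Date" column. The sketch below uses a tiny made-up sample instead of the real download so it can be run offline; the column values and the MM/DD/YYYY format are assumptions for illustration.

```python
import io

import pandas as pd

# Made-up sample rows, deliberately out of date order.
sample_csv = """Date,FIPS,Series_Complete_Yes
04/04/2022,01001,120
04/02/2022,01001,100
04/03/2022,01003,110
"""

sample_df = pd.read_csv(io.StringIO(sample_csv), dtype={"FIPS": "str"})

# Parse the dates explicitly so the format is not left to inference.
sample_df["Date"] = pd.to_datetime(sample_df["Date"], format="%m/%d/%Y")

# Keep rows dated 2022-04-03 or later; row order does not matter here.
mask = sample_df["Date"] >= pd.Timestamp("2022-04-03")
recent = sample_df[mask]
print(recent)
```

Because the comparison is applied row by row, this works whether or not the file is sorted by date.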