My question is best asked in two parts:
I am dealing with a dataset on forest product usage across many countries. Each row represents a household from one of these countries (about 30 in total). Each country has a 4-digit code, but the dataset has no column for country code. The only way to tell which country a household belongs to is the household ID ("ghousecode"). ghousecode is a 7-digit code whose first 4 digits are the country code. For example, if Bolivia's country code were 3024, then a household in Bolivia could be 3024105 or 3024999...
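To make the ID structure concrete, here is a minimal sketch using made-up IDs (the vector `ids` and its values are hypothetical, not taken from my data). It only illustrates that integer division by 1000 drops the last three digits of a 7-digit ID and leaves the 4-digit country code:

```r
# Hypothetical household IDs, invented purely for illustration
ids <- c(3024105, 3024999, 1234001)

# Integer division by 1000 drops the last three digits,
# leaving the 4-digit country code
country_code <- ids %/% 1000

# TRUE for the IDs whose first four digits are 3024
is_bolivia <- country_code == 3024
```

Here the first two IDs would be flagged as Bolivia and the third would not.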
I want code that selects all the entries for a specific country. I am using the tidyverse, so I thought of using select() with num_range(), but it hasn't worked. I don't get an error message, but the output still contains households from every country. Here is my current code:
#forest_use_tibble is a tibble with observations on forest usage from many countries
#I selected a subset of the original file's variables.
forest_use_simpler <- select(forest_use_tibble, ghousecode, year, product, income, amount, unit)
#take Bolivia, whose country ID is 3024. This means that each ghousecode that begins with
#3024 is from Bolivia,
#but each ghousecode is 3024xxx, with three more digits after the country code.
x = 3024
Bolivia <- select(forest_use_simpler, num_range("x", 001:999), everything())
#my goal: a new tibble/dataframe that has only the entries from Bolivia
#there is no separate column for country ID, unfortunately.
Any ideas?
Second part of the question: is there a way to apply the num_range to just one of the columns (i.e. variables; in this case ghousecode)? The way I have it above looks to me as though it would search all variables in forest_use_simpler, so it might include another country's household if the digits 3024 appeared somewhere other than in ghousecode.
Thank you!
(Note: I have also tried putting 3024 in directly where x is, to no avail. Thanks again for all help.)