Based on your description, I assume your data looks something like this:
country_year <- c("Australia_2013", "Australia_2014", "Bangladesh_2013")
health <- matrix(nrow = 3, ncol = 3, data = runif(9))
dataset <- data.frame(rbind(country_year, health), row.names = NULL, stringsAsFactors = FALSE)
dataset
# X1 X2 X3
#1 Australia_2013 Australia_2014 Bangladesh_2013
#2 0.665947273839265 0.677187719382346 0.716064820764586
#3 0.499680359382182 0.514755881391466 0.178317369660363
#4 0.730102791683748 0.666969108628109 0.0719663293566555
First, move row 1 (Australia_2013, Australia_2014, etc.) into the column names, then loop over the countries to create one data frame per country.
library(dplyr)
# move header
dataset2 <- dataset %>%
  `colnames<-`(unlist(dataset[1, ])) %>%  # uses row 1 as column names
  slice(-1) %>%                           # removes row 1 from data
  mutate_all(type.convert, as.is = TRUE)  # converts each column to an appropriate type
# apply loop
for(country in unique(gsub("_\\d+", "", colnames(dataset2)))) {
  assign(country, select(dataset2, starts_with(country))) # makes subsets
}
Regarding the loop,
gsub("_\\d+", "", colnames(dataset2))
extracts the country names by replacing "_[year]" with nothing (i.e., removing it); unique() then keeps one copy of each country name.
assign(country, select(dataset2, starts_with(country)))
creates a variable named after the country; it contains only the columns of dataset2 whose names start with that country name.
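If you would rather not create one global object per country, the same grouping can be kept in a single named list. The following is a minimal base-R sketch, not part of the approach above; the toy data frame `toy` and the names `country_of` and `by_country` are made up for illustration. Because it groups on the full country label rather than on a prefix, a country whose name is the start of another (say, Niger and Nigeria) would not pick up the other's columns, which starts_with() would do.

```r
# Toy data standing in for dataset2 (hypothetical values)
toy <- data.frame(
  Australia_2013  = c(1, 2, 3),
  Australia_2014  = c(4, 5, 6),
  Bangladesh_2013 = c(7, 8, 9)
)

# Strip the trailing "_[year]" to get one country label per column
country_of <- sub("_\\d+$", "", colnames(toy))

# split.default() treats the data frame as a list of columns and
# groups them by label, returning a named list of data frames
by_country <- split.default(toy, country_of)

names(by_country)      # one entry per country
by_country$Australia   # the two Australia columns
```

Each country is then available as by_country$Australia, by_country[["Bangladesh"]], and so on, and the whole list can be passed to lapply() if the same operation is needed for every country.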
Edit: Responding to Comment
The question in the comment asked how to add row-wise summaries (e.g., rowSums(), rowMeans()) as new columns in the country-based data frames while still using this for-loop.
Here is one solution that requires minimal changes:
for(country in unique(gsub("_\\d+", "", colnames(dataset2)))) {
  assign(country,
    select(dataset2, starts_with(country)) %>% # makes subsets
      mutate( # creates new columns
        rowSums = rowSums(select(., starts_with(country))),
        rowMeans = rowMeans(select(., starts_with(country)))
      )
  )
}
mutate() adds new columns to a dataset.
select(., starts_with(country)) selects the columns that start with the country name from the object piped into mutate() (the . stands for that object).
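To see what the . resolves to, here is a small sketch of the same mutate() step run once outside the loop. The data frame `toy`, its values, and the result name `australia` are hypothetical and chosen only so the sums are easy to check by hand.

```r
library(dplyr)

# Toy data standing in for dataset2 (hypothetical values)
toy <- data.frame(
  Australia_2013  = c(1, 2, 3),
  Australia_2014  = c(4, 5, 6),
  Bangladesh_2013 = c(7, 8, 9)
)
country <- "Australia"

australia <- toy %>%
  select(starts_with(country)) %>%   # keep only the Australia columns
  mutate(
    # "." is the two-column subset from the previous step, so both
    # summaries are computed over Australia_2013 and Australia_2014 only
    rowSums  = rowSums(select(., starts_with(country))),
    rowMeans = rowMeans(select(., starts_with(country)))
  )

australia
```

With these toy values, the first row has a sum of 5 and a mean of 2.5. Because both summaries read from the . subset rather than from the growing result, the new rowSums column does not leak into the rowMeans calculation.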