I have a data frame called 'Cdata'. It has multiple columns, each containing multiple string values. I'm looking for a way to choose the value I need from each column and then sum up the matching values in the numeric column.
Here's the excel file:
I'm looking for a way to filter it and sum it up. Example:
Select Cdata$Gender == "Female" & Cdata$Month == "2020-01" & Cdata$District == "North" & Cdata$Age == "25-34" &
Cdata$Religion == "Christian"
and sum the JobSeekers column over all matching rows. Then I need to plot this data to show, for example, the difference between unemployed Christian females in the North and in the South, or the difference between March and April, and run statistical tests.
Here's an example of the desired outcome:
Month    District  Age    Gender  Religion   Occupation              JobSeekers
2020-01  North     25-34  Female  Christian  Unprofessional workers  3258
I tried to keep the explanation short so that it is informative and direct rather than long and clumsy. Please consider me a newbie here and be merciful if I made any mistakes.
Here's the dput output showing the structure:
structure(
list(
Month = c(
"2020-01",
"2020-01",
"2020-01",
"2020-01",
"2020-01",
"2020-01"
),
District = c("Dan", "Dan", "Dan", "Dan",
"Dan", "Dan"),
Age = c("U17", "U17", "U17", "18-24", "18-24",
"18-24"),
Gender = c("Male", "Male", "Female", "Male", "Male",
"Male"),
Education = c("None", "None", "None", "None", "None",
"None"),
Disability = c("None", "None", "None", "None", "None",
"None"),
Religion = c("Jewish", "Muslims", "Other", "Jewish",
"Jewish", "Jewish"),
Occupation = c(
"Unprofessional workers",
"Sales and costumer service",
"Undefined",
"Production and construction",
"Academic degree",
"Practical engineers and technicians"
),
JobSeekers = c(2L,
1L, 1L, 1L, 1L, 1L),
GMI = c(0L, 0L, 0L, 0L, 0L, 0L),
ACU = c(0L,
0L, 0L, 0L, 0L, 0L),
NACU = c(2L, 1L, 1L, 1L, 1L, 1L),
NewSeekers = c(0L,
0L, 0L, 0L, 0L, 1L),
NewFiredSeekers = c(0L, 0L, 0L, 0L, 0L,
1L)
),
row.names = c(NA, 6L),
class = "data.frame"
)
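To make the goal concrete, here is a minimal sketch in base R of the kind of filter-and-sum I mean. It rebuilds the six sample rows (only the columns used here). Since the sample has no "North", "Female" or "Christian" combination, it filters on values that do appear in it. subset() and aggregate() are just one possible approach, and sel, total and by_group are names made up for this illustration.

```r
# The six sample rows from the dput output, keeping only the columns used here
Cdata <- data.frame(
  Month      = rep("2020-01", 6),
  District   = rep("Dan", 6),
  Age        = c("U17", "U17", "U17", "18-24", "18-24", "18-24"),
  Gender     = c("Male", "Male", "Female", "Male", "Male", "Male"),
  Religion   = c("Jewish", "Muslims", "Other", "Jewish", "Jewish", "Jewish"),
  JobSeekers = c(2L, 1L, 1L, 1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Keep only the rows matching one value per column (note == rather than =)
sel <- subset(
  Cdata,
  Month == "2020-01" & District == "Dan" & Age == "18-24" &
    Gender == "Male" & Religion == "Jewish"
)

# Total job seekers for that single combination (3 for the sample rows)
total <- sum(sel$JobSeekers)

# Totals for every combination at once, as a possible basis for plots or tests
by_group <- aggregate(
  JobSeekers ~ Month + District + Age + Gender + Religion,
  data = Cdata, FUN = sum
)
```

The by_group table has one row per combination of values, so comparing, say, two districts or two months would come down to picking the relevant rows from it.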