
I'm facing the following problem: I need to create a graph and a table summarizing scholarity (education level) and profession across different years and regions (7 years and 5 regions).

I have 4 levels of scholarity (fundinc, medioinc, superiorinc and supdout) and 3 levels of profession (apoio, operacional and estrategico).

Each level is its own 0/1 column (if fundinc == 1, the other scholarity columns are 0, and if apoio == 1, operacional and estrategico are both 0).

The data are split into one data frame per year and region (data2010nordeste, data2010norte, data2010centro, data2010sudeste, data2010sul, ..., data2016nordeste, data2016norte, data2016centro, data2016sudeste, data2016sul).

Each data frame looks something like this:

fundinc | medioinc | superiorinc | supdout | apoio | operacional | estrategico
1       | 0        | 0           | 0       | 1     | 0           | 0
0       | 1        | 0           | 0       | 0     | 1           | 0
0       | 0        | 1           | 0       | 0     | 0           | 1
0       | 0        | 1           | 0       | 0     | 0           | 1
0       | 1        | 0           | 0       | 1     | 0           | 0
1       | 0        | 0           | 0       | 1     | 0           | 0
.
. 
.

Any suggestions? I'm totally lost.

I tried to create a function:

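pegaescolaridadeapoio <- function(base) {
  # Incomplete primary education (fundinc), restricted to "apoio"
  fund <- base[base$fundinc == 1 & base$apoio == 1, ]
  # Incomplete secondary education (medioinc)
  medio <- base[base$medioinc == 1 & base$apoio == 1, ]
  # Incomplete higher education (superiorinc)
  superior <- base[base$superiorinc == 1 & base$apoio == 1, ]
  # Higher education and others (supdout)
  sup <- base[base$supdout == 1 & base$apoio == 1, ]
  # Number of rows in each subset, in the order above
  vetor <- c(nrow(fund), nrow(medio), nrow(superior), nrow(sup))
  return(vetor)
}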

I also built some vectors to put the results into a graph or table, but had no success.

Brian Tompsett - 汤莱恩
RxT
  • showing some input and output data (e.g. posting the result of `dput(your_data)`) would improve your question and will significantly increase the chance to get help. hint the `tidyverse` is your friend here. – Roman Apr 10 '18 at 15:50
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Apr 10 '18 at 15:51
  • @Jimbou Hi, thanks! I work with a lot of data, but I did add an example (edited) – RxT Apr 10 '18 at 16:09
  • @MrFlick sorry, but I'm a beginner, I don't think that anything I tried was relevant :( – RxT Apr 10 '18 at 16:10
  • @RicardoTheodoro where are the different years and regions? And where is your expected output. Please revise. – Roman Apr 10 '18 at 16:13
  • @Jimbou the regions are in the data frames! Each data frame is one region and one year. I want to see how many "fundinc" are in "apoio", "operacional" and "estrategico", for example. And I don't know which graph shows it best, maybe a regular barplot. – RxT Apr 10 '18 at 16:28

1 Answer


@RicardoTheodoro

Based on your requirements, I recommend simplifying your data like this:

library(dplyr)

# `dat` is one of your data frames (or all of them stacked together)
dat.clean <- dat %>%
  # Collapse the 0/1 indicator columns into a single factor column each
  mutate(scholarity = factor(1 * fundinc + 2 * medioinc + 3 * superiorinc + 4 * supdout,
                             levels = c(1, 2, 3, 4),
                             labels = c("fundinc", "medioinc", "superiorinc", "supdout")),
         profession = factor(1 * apoio + 2 * operacional + 3 * estrategico,
                             levels = c(1, 2, 3),
                             labels = c("apoio", "operacional", "estrategico"))) %>%
  # Remove the indicator columns, which are no longer needed
  select(-fundinc, -medioinc, -superiorinc, -supdout, -apoio, -operacional, -estrategico)
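The `dat` above is assumed to be a single data frame. Since the question has one data frame per year and region, here is a minimal sketch of one way to stack them first while keeping `year` and `region` as columns. The two toy data frames and the `source`, `year` and `region` column names are made up for illustration; it also assumes every object follows the `data<year><region>` naming pattern from the question.

```r
library(dplyr)

# Toy stand-ins for two of the per-year, per-region data frames (made-up values)
data2010norte <- data.frame(fundinc = c(1, 0), medioinc = c(0, 1),
                            superiorinc = 0, supdout = 0,
                            apoio = c(1, 0), operacional = c(0, 1),
                            estrategico = 0)
data2011sul <- data.frame(fundinc = 0, medioinc = 0,
                          superiorinc = c(1, 1), supdout = 0,
                          apoio = 0, operacional = 0,
                          estrategico = c(1, 1))

# Collect the data frames into a named list. With all 35 objects in the
# workspace, mget(ls(pattern = "^data[0-9]{4}")) could gather them by name.
bases <- list(data2010norte = data2010norte, data2011sul = data2011sul)

# Stack the data frames, keeping each list name in a `source` column,
# then split that name into year (characters 5-8) and region (the rest)
dat <- bind_rows(bases, .id = "source") %>%
  mutate(year = as.integer(substr(source, 5, 8)),
         region = substring(source, 9)) %>%
  select(-source)

print(dat)
```

With `year` and `region` in place, the same `mutate()`/`select()` step works on the stacked data, and the two extra columns are carried through untouched.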

Then you can go on to count, tabulate and plot the data like this:

# Group entries by scholarity and profession, then count the frequency of occurrence.
# summarise() returns one row per combination, so the result also works as a table.
dat.processed <- dat.clean %>%
  group_by(scholarity, profession) %>%
  summarise(freq = n()) %>%
  ungroup()

# Plot a bar chart with one bar per profession within each scholarity level
library(ggplot2)
ggplot(dat.processed, aes(scholarity, freq, fill = profession)) +
  geom_bar(stat = "identity", position = "dodge")
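The question also asks for a table and a comparison across years and regions. Here is a rough sketch of both, using base R's `table()` for the cross-tabulation and `facet_grid()` for one panel per region and year. The `dat.clean` below is made-up toy data, and the `year` and `region` columns are an assumption about how the real data would be organized.

```r
library(dplyr)
library(ggplot2)

# Made-up data in the shape of the cleaned data frame, plus assumed
# `year` and `region` columns
dat.clean <- data.frame(
  year = c(2010, 2010, 2010, 2011, 2011, 2011),
  region = c("norte", "norte", "sul", "norte", "sul", "sul"),
  scholarity = factor(c("fundinc", "medioinc", "fundinc",
                        "superiorinc", "fundinc", "fundinc"),
                      levels = c("fundinc", "medioinc", "superiorinc", "supdout")),
  profession = factor(c("apoio", "operacional", "apoio",
                        "estrategico", "apoio", "operacional"),
                      levels = c("apoio", "operacional", "estrategico"))
)

# Cross-tabulation: scholarity in rows, profession in columns
tab <- table(dat.clean$scholarity, dat.clean$profession)
print(tab)

# Counts per year, region, scholarity and profession
counts <- dat.clean %>%
  count(year, region, scholarity, profession, name = "freq")

# Dodged bar chart with one panel per region (rows) and year (columns)
p <- ggplot(counts, aes(scholarity, freq, fill = profession)) +
  geom_col(position = "dodge") +
  facet_grid(region ~ year)
print(p)
```

Reading across the `fundinc` row of `tab` gives the number of "fundinc" entries in each profession, which is the comparison asked about in the comments.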
whalea
  • Thanks! I want to see how many "fundinc" are in "apoio", "operacional" and "estrategico", for example. And I don't know which graph shows it best, can you suggest something? – RxT Apr 10 '18 at 17:18
  • I recommend a bar chart for comparison, and I have added the code in the edits above – whalea Apr 10 '18 at 17:29
  • Thank you so much ! – RxT Apr 10 '18 at 17:50