R - .csv file - extract variables

Question

I pulled in a large .csv file with columns such as "paid" and "description".

I am trying to figure out how to pull only the "paid" column when the "description" is Bronchitis or some other illness that appears in that column.

This would be like using a pivot table in Excel, filtering on a single Description and getting back all of the individual Paid rows.

Paid    Description                           val
$500    Bronchitis                            1.5
$3,250  Complication of Pregnancy/Childbirth  2.2
$5,400  Burns                                 3.3
$20.50  Bronchitis                            4.4
$24     Asthma                                1.2
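A minimal sketch of reading such a file and keeping only the Paid values for Bronchitis, assuming the CSV has a header row named Paid, Description and val as in the sample above. Here `read.csv(text = ...)` stands in for reading a real file; the file name in the comment is hypothetical.

```r
# The question's sample rows as CSV text; in practice you would call
# read.csv("claims.csv") on your own file ("claims.csv" is a hypothetical name).
csv_text <- 'Paid,Description,val
"$500",Bronchitis,1.5
"$3,250",Complication of Pregnancy/Childbirth,2.2
"$5,400",Burns,3.3
"$20.50",Bronchitis,4.4
"$24",Asthma,1.2'

# stringsAsFactors = FALSE keeps text columns as character
# (already the default from R 4.0 onwards)
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Row condition before the comma, column name after it
bronchitis_paid <- df[df$Description == "Bronchitis", "Paid"]
print(bronchitis_paid)
```

Note that Paid is read as text because of the "$" and "," characters; it has to be converted before any numeric comparison.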

Please provide a small example data set and the expected result based on it. What is `some other illness`? Please be specific. You can refer to this link for how to make a reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — akrun, Jun 03 '15 at 15:51
Example data, as Paid Description pairs: $500 Bronchitis; $3,250 Complication of Pregnancy/Childbirth; $5,400 Burns; $20.50 Bronchitis. I am trying to break out Bronchitis so it would show as: $500 Bronchitis; $20.50 Bronchitis. Then I could just do data analysis on individual descriptions. — Mere, Jun 03 '15 at 15:53
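To break out every description at once rather than one at a time, base R's `split()` can be used. A sketch, assuming the question's sample rows with Paid still stored as text:

```r
# Sample rows from the question (Paid kept as text, as it appears in the file)
df <- data.frame(
  Paid = c("$500", "$3,250", "$5,400", "$20.50", "$24"),
  Description = c("Bronchitis", "Complication of Pregnancy/Childbirth",
                  "Burns", "Bronchitis", "Asthma"),
  stringsAsFactors = FALSE
)

# split() returns a named list holding one data frame per Description value
by_desc <- split(df, df$Description)

# Pull out the Bronchitis rows for separate analysis
bronchitis <- by_desc[["Bronchitis"]]
print(bronchitis)
```

Each element of `by_desc` can then be analysed on its own, for example with `lapply()` over the list.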
It is not easy to get the correct format from the comment. Please use the edit button on your post and update it. — akrun, Jun 03 '15 at 15:57
Assuming your data frame is called `df`, `df.subset = df[df$description %in% c("Bronchitis","Asthma","COPD"), c("paid","description")]`. Just include whatever diseases are of interest. — eipi10, Jun 03 '15 at 15:57
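The same `%in%` idea applied to the question's sample, where the columns are capitalized (Paid, Description) rather than lowercase as in the comment. A sketch; the list of illnesses is only an example:

```r
df <- data.frame(
  Paid = c("$500", "$3,250", "$5,400", "$20.50", "$24"),
  Description = c("Bronchitis", "Complication of Pregnancy/Childbirth",
                  "Burns", "Bronchitis", "Asthma"),
  stringsAsFactors = FALSE
)

# Illnesses of interest; a name absent from the data (here "COPD")
# simply matches no rows
wanted <- c("Bronchitis", "Asthma", "COPD")

# %in% is TRUE for each row whose Description is one of `wanted`
df.subset <- df[df$Description %in% wanted, c("Paid", "Description")]
print(df.subset)
```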
Sounds like the readr package and the functions dplyr::filter and dplyr::select might help. — Daniel, Jun 03 '15 at 15:57
The df subset got it to work, thanks eipi10 and everyone else! — Mere, Jun 03 '15 at 16:36
Using this type of code, df.subset = df[df$description %in% c("Bronchitis","Asthma","COPD"), c("paid","description")], how would I further break it down by Description AND Paid, e.g. Bronchitis with Paid > $500? — Mere, Jun 03 '15 at 16:58
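One way to answer the follow-up: Paid holds text such as "$3,250", so it has to be turned into a number before comparing it with 500, and the two conditions are then combined with `&`. A sketch under that assumption, using the question's sample; the threshold of 20 is purely illustrative, since the sample's Bronchitis amounts are $500 and $20.50 and neither is strictly greater than 500:

```r
df <- data.frame(
  Paid = c("$500", "$3,250", "$5,400", "$20.50", "$24"),
  Description = c("Bronchitis", "Complication of Pregnancy/Childbirth",
                  "Burns", "Bronchitis", "Asthma"),
  stringsAsFactors = FALSE
)

# Strip "$" and thousands separators, then convert to numbers
df$Paid_num <- as.numeric(gsub("[$,]", "", df$Paid))

# Description must match AND the amount must exceed 500
big_bronchitis <- df[df$Description == "Bronchitis" & df$Paid_num > 500, ]

# Illustrative lower threshold, to show rows that do pass both conditions
bronchitis_over_20 <- df[df$Description == "Bronchitis" & df$Paid_num > 20, ]

print(big_bronchitis)
print(bronchitis_over_20)
```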

rmuc8 · Answer 1 · 2015-06-03T16:07:06.707

If your data is

paid <- c(300,200,150)
desc <- c("bronchitis","headache","broken.leg")
df <- data.frame(paid, desc)

Try

df[df$desc == "bronchitis", "paid"]

# the argument before the comma filters the rows,
# the argument after the comma selects the column(s)

# > df[df$desc == "bronchitis", "paid"]
# [1] 300

or

library(dplyr)
df %>% filter(desc=="bronchitis") %>% select(paid)

# filter() keeps the rows that meet the condition
# select() picks the output column(s)


# > df %>% filter(desc=="bronchitis") %>% select(paid)
#   paid
# 1  300
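Base R's `subset()` expresses the same row filter and column selection without loading dplyr. A sketch using the same `df` as in this answer:

```r
paid <- c(300, 200, 150)
desc <- c("bronchitis", "headache", "broken.leg")
df <- data.frame(paid, desc)

# subset() evaluates the condition and `select` inside df,
# much like filter() and select() in the dplyr version
res <- subset(df, desc == "bronchitis", select = paid)
print(res)
```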

score 1 · Answer 2 · answered Jun 03 '15 at 16:07

Using data.table

library(data.table)#v1.9.5+
setkey(setDT(df1), Description)[.('Bronchitis'),'Paid', with=FALSE]
#    Paid
#1:   $500
#2: $20.50

data

df1 <- data.frame(
  Paid = c("$500", "$3,250", "$5,400", "$20.50", "$24"),
  Description = c("Bronchitis", "Complication of Pregnancy/Childbirth",
                  "Burns", "Bronchitis", "Asthma"),
  val = c(1.5, 2.2, 3.3, 4.4, 1.2),
  stringsAsFactors = FALSE)
