I pulled in a large .csv file with columns such as "paid" and "description".

I am trying to figure out how to pull only the "paid" values for rows where "description" is Bronchitis (or any other illness that appears in that column).

This would be like building a pivot table in Excel, filtering on a single Description, and getting back all of the individual paid rows.

 Paid    Description                              val
 $500    Bronchitis                               1.5
 $3,250  'Complication of Pregnancy/Childbirth'   2.2
 $5,400  Burns                                    3.3
 $20.50  Bronchitis                               4.4
 $24     Ashtma                                   1.2
akrun
Mere
  • Please provide a small example dataset and the expected result based on it. What is `some other illness`? Please be specific. You can refer to this link for how to make a reproducible example: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – akrun Jun 03 '15 at 15:51
  • Example data (Paid, Description): $500 Bronchitis; $3,250 Complication of Pregnancy/Childbirth; $5,400 Burns; $20.50 Bronchitis. I am trying to break out Bronchitis so the result would show only: $500 Bronchitis; $20.50 Bronchitis. Then I could just do data analysis on individual descriptions. – Mere Jun 03 '15 at 15:53
  • It is not easy to get the correct format from the comment. Please use the edit button in your post and update it – akrun Jun 03 '15 at 15:57
  • Assuming your data frame is called `df`, `df.subset = df[df$description %in% c("Bronchitis","Asthma","COPD"), c("paid","description")]`. Just include whatever diseases are of interest. – eipi10 Jun 03 '15 at 15:57
  • Sounds like Package "readr" and functions "dplyr::filter" and "dplyr::select" might help. – Daniel Jun 03 '15 at 15:57
  • `df[desc=="a",c("paid")]` – rmuc8 Jun 03 '15 at 15:57
  • The df subset got it to work, thanks eipi10 and everyone else! – Mere Jun 03 '15 at 16:36
  • Using this type of code, `df.subset = df[df$description %in% c("Bronchitis","Asthma","COPD"), c("paid","description")]`, how would I further break it down by Description AND Paid, e.g. Bronchitis with Paid > $500? – Mere Jun 03 '15 at 16:58
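
For the follow-up about combining a Description filter with a Paid threshold, one possible sketch in base R is shown here. It assumes the Paid column was read as text such as "$3,250", so it strips the dollar sign and commas before comparing. The rows are copied from the question's example table; `paid_num` and `min_paid` are hypothetical names introduced only for illustration.

```r
# Example rows copied from the question; Paid is assumed to be text,
# as it would be after reading values like "$3,250" from a CSV.
df <- data.frame(
  Paid = c("$500", "$3,250", "$5,400", "$20.50", "$24"),
  Description = c("Bronchitis", "Complication of Pregnancy/Childbirth",
                  "Burns", "Bronchitis", "Ashtma"),
  stringsAsFactors = FALSE
)

# Strip "$" and "," so the amounts can be compared numerically.
# paid_num is a hypothetical helper column, not part of the original data.
df$paid_num <- as.numeric(gsub("[$,]", "", df$Paid))

# Illustrative threshold; set it to 500 for the case asked about above.
min_paid <- 100

# Combine both row conditions with &, then choose the columns to return.
result <- df[df$Description == "Bronchitis" & df$paid_num > min_paid,
             c("Paid", "Description")]
print(result)
```

With the question's sample rows and a threshold of exactly 500, the result would be empty, because the only Bronchitis amounts are $500 and $20.50 and neither is strictly greater than 500.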

2 Answers

If your data is

paid <- c(300,200,150)
desc <- c("bronchitis","headache","broken.leg")
df <- data.frame(paid, desc)

Try

df[df$desc == "bronchitis", "paid"]

# the argument before the comma filters the rows,
# the argument after the comma selects the column(s)

# > df[df$desc == "bronchitis", "paid"]
# [1] 300

or

library(dplyr)
df %>% filter(desc=="bronchitis") %>% select(paid)

# filter keeps the rows that match the condition
# select picks the output column(s)


# > df %>% filter(desc=="bronchitis") %>% select(paid)
#   paid
# 1  300
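
To get closer to the pivot-table behaviour described in the question, the same row/column indexing can be extended to several descriptions at once and then split into one group per description. This is only a sketch in base R that reuses the toy data above; `wanted`, `subset_df` and `paid_by_desc` are illustrative names.

```r
# Same toy data as in the answer above.
paid <- c(300, 200, 150)
desc <- c("bronchitis", "headache", "broken.leg")
df <- data.frame(paid, desc, stringsAsFactors = FALSE)

# Descriptions of interest (illustrative choice).
wanted <- c("bronchitis", "headache")

# %in% keeps every row whose description is in the wanted set.
subset_df <- df[df$desc %in% wanted, c("paid", "desc")]

# split() returns a named list: one vector of paid amounts per description.
paid_by_desc <- split(subset_df$paid, subset_df$desc)
print(paid_by_desc)
```

Each element of `paid_by_desc` can then be analysed on its own, for example with `sapply(paid_by_desc, mean)`.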
rmuc8

Using data.table

library(data.table)  # v1.9.5+
# setDT converts df1 to a data.table by reference; setkey sorts and keys it by Description
setkey(setDT(df1), Description)[.('Bronchitis'), 'Paid', with = FALSE]
#    Paid
#1:   $500
#2: $20.50

data

df1 <- data.frame(
  Paid = c("$500", "$3,250", "$5,400", "$20.50", "$24"),
  Description = c("Bronchitis", "Complication of Pregnancy/Childbirth",
                  "Burns", "Bronchitis", "Ashtma"),
  val = c(1.5, 2.2, 3.3, 4.4, 1.2),
  stringsAsFactors = FALSE
)
akrun