I am working on an exercise that asks me to append new labels to existing names. Specifically, it asks me to:

  1. Add a label '-H1' to every Whole Foods Market at zipcode 94107
  2. Add a label '-H2' to every Safeway at zipcode 94107
  3. Add a label '-H3' to every Pizzeria Delfina at zipcode 94110

Below is what I have done, but it produces the message "In if (problem$pickup_zipcode == 94107 & problem$pickup_name == : the condition has length > 1 and only the first element will be used".

[image: screenshot of the code]

I guess I cannot use `if` here because it will not proceed to the next statement? Should I use a `for` loop instead?
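
The message quoted above appears because `if` expects a single TRUE/FALSE, whereas the comparison yields one value per row (in R 4.2.0 and later this is an error rather than a warning). As a minimal sketch, assuming a toy stand-in for `problem` with invented values, logical indexing or `ifelse()` applies the condition to every row without a loop:

```r
# Toy stand-in for the `problem` data frame (values are invented for illustration)
problem <- data.frame(
  pickup_name    = c("Safeway", "Whole Foods Market", "Pizzeria Delfina", "Safeway"),
  pickup_zipcode = c(94107, 94107, 94110, 94110),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per row; `if` would only be able to use a single value
is_wfm <- problem$pickup_zipcode == 94107 & problem$pickup_name == "Whole Foods Market"

# Logical indexing updates only the matching rows
problem$pickup_name[is_wfm] <- paste0(problem$pickup_name[is_wfm], "-H1")

# ifelse() is the vectorised counterpart of if/else
is_safeway <- problem$pickup_zipcode == 94107 & problem$pickup_name == "Safeway"
problem$pickup_name <- ifelse(is_safeway,
                              paste0(problem$pickup_name, "-H2"),
                              problem$pickup_name)

print(problem)
```

The Safeway row at zipcode 94110 is left unchanged because it fails the zipcode part of the condition.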

Nan
  • See [how to create a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Please do not post code as an image. While we can't test to see if it's the only problem, you should not be using the `assign()` function at all in this case. Use `<-` to assign a value to a variable (`==` is used to test for equality, not assignment). – MrFlick Nov 22 '17 at 19:56
  • You shouldn't use assign for this purpose. – JeanVuda Nov 22 '17 at 20:02

2 Answers

The other answer posted so far claims to be a data.table approach but hardcodes every single replacement. Therefore, I feel obliged to post an alternative solution which uses a lookup table and an update on join:

library(data.table)

# read data from google drive
DT <- fread("https://drive.google.com/uc?id=1DEdJvAdACVv_Pc5IcgFBSGvDKm_GPrNE&export=download")

# create lookup table
lookup <- data.table(pickup_name = c("Safeway", "Whole Foods Market", "Pizzeria Delfina"),
                     pickup_zipcode = c(94107, 94107, 94110),
                     label = c("-H2", "-H1", "-H3")
)

# join with lookup table and update on join
DT[lookup, on = .(pickup_name, pickup_zipcode), pickup_name := paste0(pickup_name, label)]

# verify data are updated
DT[pickup_name %like% "-H.$", .(pickup_name, pickup_zipcode)]
                pickup_name pickup_zipcode
   1:            Safeway-H2          94107
   2: Whole Foods Market-H1          94107
   3:            Safeway-H2          94107
   4: Whole Foods Market-H1          94107
   5: Whole Foods Market-H1          94107
  ---                                     
2003:            Safeway-H2          94107
2004: Whole Foods Market-H1          94107
2005:            Safeway-H2          94107
2006:   Pizzeria Delfina-H3          94110
2007: Whole Foods Market-H1          94107

fread reads data directly from Google Drive using this hint. DT has about 60 K rows and 22 columns (about 9 MB on disk).
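
In case the file is not reachable, the same update on join can be tried on a small table. This is only a sketch: the four rows below are invented for illustration and are not taken from the real data.

```r
library(data.table)

# Toy data standing in for the downloaded file (values invented for illustration)
DT <- data.table(
  pickup_name    = c("Safeway", "Whole Foods Market", "Pizzeria Delfina", "Safeway"),
  pickup_zipcode = c(94107L, 94107L, 94110L, 94110L)
)

# Lookup table: one row per (name, zipcode) pair that should receive a label
lookup <- data.table(
  pickup_name    = c("Safeway", "Whole Foods Market", "Pizzeria Delfina"),
  pickup_zipcode = c(94107L, 94107L, 94110L),
  label          = c("-H2", "-H1", "-H3")
)

# Rows of DT that match a lookup row on both columns get the label appended;
# non-matching rows (Safeway at 94110) are left untouched
DT[lookup, on = .(pickup_name, pickup_zipcode), pickup_name := paste0(pickup_name, label)]

print(DT)
```

Adding another store then only requires a new row in `lookup`, not another line of code.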

Uwe

Here is a data.table approach. You may have to install the data.table package first:

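library(data.table)
# convert the data frame to a data.table
dat <- data.table(problem)
# key on the two columns used for the lookups
setkey(dat, pickup_zipcode, pickup_name)
dat[J(94107, "Safeway"), pickup_name := "Safeway-H2"]
# pickup_name is a key column, so changing it invalidates the key; set it again before the next keyed lookup
setkey(dat, pickup_zipcode, pickup_name)
dat[J(94107, "Whole Foods Market"), pickup_name := "Whole Foods Market-H1"]
setkey(dat, pickup_zipcode, pickup_name)
dat[J(94110, "Pizzeria Delfina"), pickup_name := "Pizzeria Delfina-H3"]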
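
The repeated `setkey()` calls can be understood by inspecting `key()` after each update. The sketch below uses an invented three-row table with a hypothetical `num_items` value per row. It only checks that the full two-column key no longer holds after a key column is modified; whether data.table removes the key entirely or shortens it may depend on the installed version.

```r
library(data.table)

# Invented three-row table; values are for illustration only
dat <- data.table(
  pickup_zipcode = c(94107L, 94107L, 94110L),
  pickup_name    = c("Safeway", "Whole Foods Market", "Pizzeria Delfina"),
  num_items      = c(1L, 2L, 3L)
)
full_key <- c("pickup_zipcode", "pickup_name")
setkey(dat, pickup_zipcode, pickup_name)

# Updating a non-key column leaves the key in place
dat[J(94107L, "Safeway"), num_items := 10L]
key_after_nonkey_update <- key(dat)

# Updating a key column invalidates the sort order on that column,
# so the full two-column key is no longer kept
dat[J(94107L, "Safeway"), pickup_name := "Safeway-H2"]
key_after_key_update <- key(dat)

print(key_after_nonkey_update)
print(key_after_key_update)
```

This would also explain why a test that only assigns to a non-key column runs without re-keying.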
JeanVuda
  • Hi Jean. I typed what you showed above and changed the variable names, but it says that it could not find the function: `setkey(problem, pickup_zipcode, pickup_name); problem[J(94107, "Safeway"), pickup_name:="Safeway-H2"]; problem[J(94107, "Whole Foods Market"), pickup_name:="Whole Foods Market-H1"]; problem[J(94110, "Pizzeria Delfina"), pickup_name:="Pizzeria Delfina-H3"]` – Nan Nov 22 '17 at 20:09
  • You need to install the data.table package. Try running this first: `install.packages("data.table", dep=T); library(data.table)` – JeanVuda Nov 22 '17 at 20:12
  • I ran all the code and it says "Error in `[.data.table`(dat, J(94107, "Whole Foods Market"), `:=`(pickup_name, : When i is a data.table (or character vector), the columns to join by must be specified either using 'on=' argument (see ?data.table) or by keying x (i.e. sorted, and, marked as sorted, see ?setkey). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM." – Nan Nov 22 '17 at 20:23
  • I don't understand the problem. Can you run `dput(head(problem))` and paste the output here, so that we have sample data to reproduce the error? – JeanVuda Nov 22 '17 at 20:29
  • It was quite long but the headers are: customer_price, courier_price, day_of_week_local, service, market, num_items, pickup_zipcode, distance_pickup_to_dropoff_km, purchase_price, avg_courier_rating, status, duration, reassigned, dropoff_zipcode, rating_by_customer, purchase_fee, vehicle_type, pickup_name, purchase_tip, credit_applied, date_created_local, rating_by_courier – Nan Nov 22 '17 at 20:34
  • I can't think of any reasons why the code I provided wouldn't work. Can you check class(dat[,pickup_zipcode]) to see if it is integer/numeric, and also class(dat[,pickup_name]) to see if it is character. Like MrFlick mentioned, without a reproducible example we can't troubleshoot your code. – JeanVuda Nov 22 '17 at 20:38
  • Hi Jean. I am afraid that I was not super clear on the question. So below are the links to the requirement of the question and dataset. Really appreciate you being so patient! – Nan Nov 22 '17 at 20:41
  • data - https://drive.google.com/open?id=1DEdJvAdACVv_Pc5IcgFBSGvDKm_GPrNE – Nan Nov 22 '17 at 20:42
  • requirement https://drive.google.com/open?id=119axqgvRZikM-nfV0C8MFl6RyEXgFXYC. Let me go ahead check and read the link MrFlick gave. Thank you! – Nan Nov 22 '17 at 20:43
  • Yes. The zip code is numeric (integer) and the name is character. – Nan Nov 22 '17 at 20:46
  • I have edited my answer. Try running the code now. It looks like setkey has to be run every time after assigning a change, which I wasn't aware of! – JeanVuda Nov 22 '17 at 20:50
  • @JeanVuda Best practice is to use the `on =` argument, even if you're sure the data.table has a key. This prevents odd errors from popping up (like above) and makes the code more explicit in what's happening. – Nathan Werth Nov 22 '17 at 20:54
  • @Nathan Werth, I am not sure how to use the `on` argument. Do you have an example? – JeanVuda Nov 22 '17 at 21:00
  • `dat[J(94107, "Safeway"), on = c("pickup_zipcode", "pickup_name"), pickup_name:="Safeway-H2"]`. Just a character vector doing the same job as the key, but without the assumption a key exists. If the key does exist, then it will be used, so setting keys still gives a performance boost. – Nathan Werth Nov 22 '17 at 21:06
  • That key-setting behaviour seems like a glitch, because I was able to run this without getting any error. But I get your point and it is much nicer code too. `dt <- data.table(c(1,2,3,4,5),c("chr1","chr1","chr2","chr3","chr4"),c(12,12,13,14,15)); setkey(dt, V1, V2); dt[J(2, "chr1"),V3:=20]; dt[J(1, "chr1"),V3:=10]; dt[J(5, "chr4"),V3:=40]` – JeanVuda Nov 22 '17 at 21:07
  • I edited the code as suggested and it looks like this now: `library(data.table); dat<-data.table(problem); setkey(dat, pickup_zipcode, pickup_name); dat[J(94107, "Safeway"), on = c("pickup_zipcode", "pickup_name"), pickup_name:="Safeway-H2"]` – Nan Nov 22 '17 at 21:49
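
For reference, the `on =` form suggested in the comments can be applied to all three updates without calling `setkey()` at all. A minimal sketch on an invented three-row table (`join_cols` is just a helper name introduced here):

```r
library(data.table)

# Invented three-row table; no key is set at any point
dat <- data.table(
  pickup_zipcode = c(94107L, 94107L, 94110L),
  pickup_name    = c("Safeway", "Whole Foods Market", "Pizzeria Delfina")
)

# Columns to match against the values given in J(), in the same order
join_cols <- c("pickup_zipcode", "pickup_name")

dat[J(94107L, "Safeway"),            on = join_cols, pickup_name := "Safeway-H2"]
dat[J(94107L, "Whole Foods Market"), on = join_cols, pickup_name := "Whole Foods Market-H1"]
dat[J(94110L, "Pizzeria Delfina"),   on = join_cols, pickup_name := "Pizzeria Delfina-H3"]

print(dat)
```

Because each call names its join columns explicitly, the result does not depend on whether an earlier update changed a key.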