I have a large dataset on Marseille's rental property market (named marseilleannonces) with the following variables:

structure(list(ID = c("af626000-342e-11e8-a56e-8326540c0e87", 
"20629290-c926-11e6-a626-abf6d3bf8a25", "8495af50-b92c-11e5-86ef-abf6d3bf8a25", 
"a4299b60-11e3-11ea-9589-c1180fadeaa5", "833f81d0-d3da-11ea-b28a-1b6a75606a9a", 
"75358b40-6d76-11e5-bb7a-cfb08fbdec46", "8d6f22f3-abc7-11e4-b16a-1100e6029c1e", 
"10ed2580-28cb-11e9-bcd9-d3a30a46a7fe", "dd156b70-1534-11e6-afdf-abf6d3bf8a25", 
"15688650-2934-11e8-ab89-41d65c7c6457"), TYPE = c("APARTMENT", 
"APARTMENT", "APARTMENT", "APARTMENT", "PREMISES", "APARTMENT", 
"APARTMENT", "APARTMENT", "APARTMENT", "PREMISES"), SURFACE = c(19, 
29, 17, 55, 35, 50, 67, 30, 28, 45), ROOM_COUNT = c(1, 2, 1, 
3, 1, 2, 2, 1, 1, NA), PRICE = c(295, 470, 290, 610, 550, 500, 
500, 655, 445, 1943), RENTAL_EXPENSES = c(45, NA, NA, NA, NA, 
NA, 40, NA, NA, NA), RENTAL_EXPENSES_INCLUDED = c(TRUE, TRUE, 
NA, TRUE, TRUE, TRUE, TRUE, TRUE, NA, NA)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

In this dataset, if RENTAL_EXPENSES_INCLUDED is TRUE, the variable PRICE includes the amount in RENTAL_EXPENSES; if RENTAL_EXPENSES_INCLUDED is FALSE, PRICE does not include it. My goal is to create a new column, named HC, with prices that exclude the rental expenses. I tried to write a loop:

for(i in 1:length(marseilleannonces$RENTAL_EXPENSES_INCLUDED)){
  x = marseilleannonces$RENTAL_EXPENSES_INCLUDED[i]
  if(x == TRUE){
    marseilleannonces$HC[i] = PRICE[i]-RENTAL_EXPENSES[i]
  }
  else {
    marseilleannonces$HC[i] = PRICE[i]
  }
}

R tells me that there is a missing value where TRUE/FALSE is required. Maybe the large number of NAs in my dataset is the problem. Any advice in the right direction is welcome.

Thanks in advance !

  • I think the NAs are indeed the problem. Try to handle them in your `if`, or remove them from your data set. See [this](https://stackoverflow.com/questions/4862178/remove-rows-with-all-or-some-nas-missing-values-in-data-frame) for the latter. – Alexandre Marcq Jul 15 '21 at 07:31
  • Can you edit your post to include the expected output for this dataset? – Ronak Shah Jul 16 '21 at 01:58
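
The suggestion above to handle the NAs inside the `if` can be sketched in base R as follows. The comparison `NA == TRUE` evaluates to `NA`, and `if` cannot branch on `NA`, which is what produces the reported error. The helper name `hc_one` is hypothetical and used only for illustration; `isTRUE()` and `isFALSE()` are base R functions (`isFALSE()` requires R 3.5.0 or later).

```r
# Comparing NA with TRUE yields NA, not TRUE or FALSE,
# so `if (x == TRUE)` fails when x is NA.
x <- NA
is.na(x == TRUE)

# hc_one() is a hypothetical helper for a single row.
# isTRUE() / isFALSE() always return one non-missing logical,
# so the NA case falls through to the final branch.
hc_one <- function(included, price, expenses) {
  if (isTRUE(included)) {
    price - expenses      # price includes expenses: subtract them
  } else if (isFALSE(included)) {
    price                 # price already excludes expenses
  } else {
    NA_real_              # unknown: keep NA
  }
}

hc_one(TRUE, 295, 45)
hc_one(FALSE, 500, 40)
hc_one(NA, 290, NA)
```

This only shows why the loop fails and one way to guard it; a vectorised approach avoids the loop entirely.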

1 Answer


Edit: Based on your comments:

library(dplyr)

marseilleannonces %>% 
  mutate(HC = case_when(RENTAL_EXPENSES_INCLUDED == TRUE ~ PRICE - RENTAL_EXPENSES,
                        RENTAL_EXPENSES_INCLUDED == FALSE ~ PRICE))
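
For readers without dplyr, the same three-way logic can probably be approximated in base R with `ifelse()`, which returns `NA` wherever the condition is `NA`. This is a minimal sketch, not a replacement for the code above. The data frame `df` is a hypothetical subset: the first four rows reuse values from the question, and the last row (with `FALSE`) is made up purely to exercise that branch, since the sample in the question has no `FALSE` values.

```r
# Hypothetical subset; the last row is invented for illustration only.
df <- data.frame(
  PRICE                    = c(295, 470, 290, 500, 600),
  RENTAL_EXPENSES          = c(45, NA, NA, 40, 50),
  RENTAL_EXPENSES_INCLUDED = c(TRUE, TRUE, NA, TRUE, FALSE)
)

# TRUE  -> PRICE - RENTAL_EXPENSES (NA if the expenses are unknown)
# FALSE -> PRICE
# NA    -> NA
df$HC <- ifelse(df$RENTAL_EXPENSES_INCLUDED,
                df$PRICE - df$RENTAL_EXPENSES,
                df$PRICE)

df$HC
```

Note that a row flagged `TRUE` with a missing `RENTAL_EXPENSES` also yields `NA`, because the amount to subtract is unknown.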
MonJeanJean
  • Thanks for your response. My goal is to have a column with the price of the property without the rental expenses. In this dataset, some prices are with rental expenses, some are without, and for some we don't know. Thus, NAs must remain NAs. – MonsieurDjuna Jul 15 '21 at 08:27
  • I edited my answer. When `RENTAL_EXPENSES_INCLUDED` is `TRUE`, `HC` is `PRICE - RENTAL_EXPENSES`; if `RENTAL_EXPENSES_INCLUDED` is `FALSE`, then `HC` is just `PRICE`; when `RENTAL_EXPENSES_INCLUDED` is `NA`, `HC` is `NA` too. – MonJeanJean Jul 15 '21 at 08:34