0

I would like to change an NA value on the third observation of companyID 1 in column 2 (value) to the character "F". My dataset is huge, so will not be able to say which exact row this observation is on but I will know that it is on the third observation of the company ID for example. I hope this makes sense. Then, I would like fourth observation of companyID 2 in column 2 (value) to be "X". Then, I would like the second observation of companyID 3 in column 2 (value) to be "B", and so on. Each value is different and will lie on a different row of a companyID.

I found the following question which is somewhat similar, but it is unable to give me what I want How to replace certain values in a specific rows and columns with NA in R? .

Here is an example of what I have:

companyID   value
    1        NA
    1        NA
    1        NA
    1        NA
    1        NA
    2        NA
    2        NA
    2        NA
    2        NA
    2        NA
    3        NA
    3        NA
    3        NA
    3        NA
    3        NA

And I would like to get the following output:

companyID   value
    1        NA
    1        NA
    1         F
    1        NA
    1        NA
    2        NA
    2        NA
    2        NA
    2         X
    2        NA
    3        NA
    3         B
    3        NA
    3        NA
    3        NA

Super appreciative of your help!

3 Answers3

2

Try something like this. Add in a rowID and then create a lookup data frame with your replacements. Then you can just left_join() in the new values.

The lookup_df replaces your "if this company and this row then new value" logic.

library(dplyr)

df <- tibble(companyID = c(rep(1, 5), rep(2, 6)), value = NA_character_)

lookup_df <- tibble(companyID = c(1, 2), rowID = c(3, 4), valueNew = c("F", "D"))

df %>% 
  group_by(companyID) %>% 
  mutate(rowID = row_number()) %>% 
  left_join(lookup_df, by = c("companyID", "rowID")) %>% 
  mutate(value = coalesce(value, valueNew)) %>% 
  select(companyID, value)

result:

# A tibble: 11 x 2
# Groups:   companyID [2]
   companyID value
       <dbl> <chr>
 1         1 NA   
 2         1 NA   
 3         1 F    
 4         1 NA   
 5         1 NA   
 6         2 NA   
 7         2 NA   
 8         2 NA   
 9         2 D    
10         2 NA   
11         2 NA  
0

Are you looking at a solution like this:

library(dplyr)
df %>% group_by(companyID) %>% 
  mutate(value = case_when(row_number() == 3 ~ 'F', TRUE ~ value))
# A tibble: 10 x 2
# Groups:   companyID [2]
   companyID value
       <dbl> <chr>
 1         1 NA   
 2         1 NA   
 3         1 F    
 4         1 NA   
 5         1 NA   
 6         2 NA   
 7         2 NA   
 8         2 F    
 9         2 NA   
10         2 NA   
Karthik S
  • 11,348
  • 2
  • 11
  • 25
  • and how would it be if I only wanted that outcome for companyID 1 and then there would potentially be another condition for companyID 2? thanks for your response! – coding_sailor Mar 29 '21 at 15:46
  • @coding_sailor, you mean 3rd row for one companyID and another row for another companyID? – Karthik S Mar 29 '21 at 15:48
  • yes, but I will do all these individually, because they all have different values and are in different spots for the different companyIDs. Therefore I would need to be able to code that "if the company ID is 1, and in that companyIDs third observation, I would like to change it to F in the value column". does this makes sense? I can try and edit the output above to give a more detailed example – coding_sailor Mar 29 '21 at 15:52
  • @coding_sailor, expanding your sample data and expected output would be helpful. – Karthik S Mar 29 '21 at 15:54
  • I have made edits in the question above :) – coding_sailor Mar 29 '21 at 16:03
0

A solution using data.table

library(data.table)
df <- read.table(header = TRUE,text="companyID   value
    1        NA
    1        NA
    1        NA
    1        NA
    1        NA
    2        NA
    2        NA
    2        NA
    2        NA
    2        NA
    3        NA
    3        NA
    3        NA
    3        NA
    3        NA")
setDT(df)
df[, value := as.character(value)] # Converting column to character
df[, dummy := 1:.N, companyID] # A dummy column with indices per companyID
mapply(function(x, y, z) { # "multi"-applying replacement function
  df[companyID == x & dummy == y, value := z]
  NULL},
       c(1, 2, 3), # x are the companyIDs
       c(3, 4, 2), # y are dummy indices to be replaced
       c("F", "X", "B") # z are the replacements
)
df$dummy <- NULL #Bye dummy variable
df

   companyID value
 1:         1  <NA>
 2:         1  <NA>
 3:         1     F
 4:         1  <NA>
 5:         1  <NA>
 6:         2  <NA>
 7:         2  <NA>
 8:         2  <NA>
 9:         2     X
10:         2  <NA>
11:         3  <NA>
12:         3     B
13:         3  <NA>
14:         3  <NA>
15:         3  <NA>
Chriss Paul
  • 1,101
  • 6
  • 19