How to remove the leading "N" from the "GID" column and place it in any empty column of the same row

DataFile <- extract_tables("new.pdf",pages = c(87),
                           method = "stream", output = "data.frame", guess = TRUE)
DataFrame<-as.data.frame(DataFile)

# remove the rows where Group is "No." or "A#"
df2 <- subset(DataFrame, Group != "No." & Group != "A#")

output:

GID    ColA    ColB 
1       2       2
2       3       4
3       5       4
4       6       5
5       6       5
NG1     8 
MG2     8       1
MG3     8       1
NG4     8 

Expected output:

GID    ColA    ColB 
1       2       2
2       3       4
3       5       4
4       6       5
5       6       5
G1     8       N
MG2     8       1
MG3     8       1
G4     8       N

DATA:

df1 <-  structure(list(GID = c("1", "2", "3", "4", "5", "NG1", "MG2", 
"MG3", "NG4"), ColA = c(2L, 3L, 5L, 6L, 6L, 8L, 8L, 8L, 8L), 
    ColB = c("2", "4", "4", "5", "5", "", "1", "1", "")), row.names = c(NA, 
-9L), class = "data.frame")
kumar

2 Answers

In base R, you could try the following.

First, identify the rows where ColB is an empty character value, and store in a logical vector:

emp_rows <- df1$ColB == ""

Then, remove the leading "N" from GID in those rows:

df1$GID[emp_rows] <- sub("^N", "", df1$GID[emp_rows])

And store "N" in ColB in the same rows:

df1$ColB[emp_rows] <- "N"

To generalize for any column that is blank, you can do the following. Based on the logic in the comment, first check if GID starts with "N". If it does, remove the "N", and then check all columns for blank values, and if blank, substitute with "N".

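You can wrap this in a function and then use apply (or another row-wise method) to go through your data frame. Note that apply first converts the data frame to a character matrix, so every column in the result is character.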

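my_fun <- function(vec) {
  # vec is one row of the data frame as a named character vector
  if (startsWith(vec[["GID"]], "N")) {
    # drop only the leading "N" from GID
    vec[["GID"]] <- sub("^N", "", vec[["GID"]])
    # fill every blank value in this row with "N"
    vec <- replace(vec, vec == "", "N")
  }
  return(vec)
}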

data.frame(t(apply(df1, 1, my_fun)))

Output

  GID ColA ColB
1   1    2    2
2   2    3    4
3   3    5    4
4   4    6    5
5   5    6    5
6  G1    8    N
7 MG2    8    1
8 MG3    8    1
9  G4    8    N
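
Because apply turns every column into character, a vectorised variant may be preferable when the numeric columns should keep their type. The following is only a sketch under that assumption: it handles rows whose GID starts with "N", fills blanks in the character columns only, and leaves ColA as integer. The helper names n_rows, chr_cols and blank are illustrative and not part of the original answer.

```r
df1 <- structure(list(GID = c("1", "2", "3", "4", "5", "NG1", "MG2",
"MG3", "NG4"), ColA = c(2L, 3L, 5L, 6L, 6L, 8L, 8L, 8L, 8L),
    ColB = c("2", "4", "4", "5", "5", "", "1", "1", "")), row.names = c(NA,
-9L), class = "data.frame")

# rows whose GID starts with "N"
n_rows <- startsWith(df1$GID, "N")

# drop the leading "N" in those rows only
df1$GID[n_rows] <- sub("^N", "", df1$GID[n_rows])

# fill blanks in every other character column, but only in the flagged rows
chr_cols <- names(df1)[vapply(df1, is.character, logical(1))]
for (col in setdiff(chr_cols, "GID")) {
  blank <- n_rows & df1[[col]] %in% ""   # %in% treats NA as "not blank"
  df1[[col]][blank] <- "N"
}

df1
```

Numeric columns cannot hold "N" without being converted, so this sketch leaves them untouched rather than coercing them.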
Ben
  • Thank you, but I do not want to name ColB explicitly. If a value in the GID column starts with "N", any empty column in the same row should be filled automatically – kumar Feb 04 '21 at 14:47
  • @kumar Please see edited answer, let me know if this is what you had in mind. – Ben Feb 04 '21 at 15:28
  • Like that - @kumar although this is a great solution and shows some nice and clear code using base R only, I would like to point out that what you are getting are character columns. This might be intentional, but I get the feeling that it might not be what you actually want. If you want to keep your value as a "number" (an integer), I'd replace empty values with NA and create a separate column with your string – tjebo Feb 04 '21 at 19:24
  • Agree with @tjebo, using NA would likely be a better approach and worth consideration depending on your needs. Once you start adding "N" to "ColA" or "ColB" they will necessarily need to be character and cannot be integer. – Ben Feb 04 '21 at 19:40
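
The NA-based alternative suggested in the comments above could look roughly like the sketch below. It is not part of either answer: the column name Flag and the helper starts_n are hypothetical, and it assumes that ColB should end up as integer with NA marking the missing values.

```r
df1 <- structure(list(GID = c("1", "2", "3", "4", "5", "NG1", "MG2",
"MG3", "NG4"), ColA = c(2L, 3L, 5L, 6L, 6L, 8L, 8L, 8L, 8L),
    ColB = c("2", "4", "4", "5", "5", "", "1", "1", "")), row.names = c(NA,
-9L), class = "data.frame")

# remember which rows carried the leading "N" before stripping it
starts_n <- startsWith(df1$GID, "N")
df1$GID <- sub("^N", "", df1$GID)

# keep the marker in its own column (hypothetical name) instead of in ColB
df1$Flag <- ifelse(starts_n, "N", NA_character_)

# turn blanks into NA so ColB can be stored as integer
df1$ColB <- as.integer(replace(df1$ColB, df1$ColB == "", NA))

str(df1)
```

With this layout ColA and ColB stay numeric, and the "N" marker can still be filtered on through the separate column.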
0

This way you can replace empty strings with "N", or any other character of your choice, without naming the column:

library(tidyverse)

df1 <- structure(list(GID = c("1", "2", "3", "4", "5", "NG1", "MG2", "MG3", "NG4"), ColA = c(2L, 3L, 5L, 6L, 6L, 8L, 8L, 8L, 8L), ColB = c("2", "4", "4", "5", "5", "", "1", "1", "")), row.names = c(NA, -9L), class = "data.frame")

df1 %>% 
  mutate(across(everything(), ~str_replace(., "^$", "N")),
         GID = GID %>% str_remove("N"))
#>   GID ColA ColB
#> 1   1    2    2
#> 2   2    3    4
#> 3   3    5    4
#> 4   4    6    5
#> 5   5    6    5
#> 6  G1    8    N
#> 7 MG2    8    1
#> 8 MG3    8    1
#> 9  G4    8    N

Created on 2021-02-05 by the reprex package (v0.3.0)
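
Note that the pipeline above fills every empty string and strips the first "N" from every GID, regardless of whether the two occur in the same row, and it converts ColA to character. A stricter variant, offered only as a sketch, ties both steps to rows whose GID starts with "N" and touches character columns only. It assumes dplyr 1.0 or later; the name res is illustrative.

```r
library(dplyr)

df1 <- data.frame(
  GID  = c("1", "2", "3", "4", "5", "NG1", "MG2", "MG3", "NG4"),
  ColA = c(2L, 3L, 5L, 6L, 6L, 8L, 8L, 8L, 8L),
  ColB = c("2", "4", "4", "5", "5", "", "1", "1", ""),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  mutate(
    # fill blanks in character columns other than GID, only where GID starts with "N"
    across(where(is.character) & !GID,
           ~ if_else(startsWith(GID, "N") & .x == "", "N", .x)),
    # then drop the leading "N" from GID (evaluated after the across() step)
    GID = sub("^N", "", GID)
  )

res
```

The order inside mutate matters here: the across() step still sees the original GID values because GID is rewritten afterwards.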