How to replace the "N" in the same row if any of the columns is empty in R programming

Question

How can I move the character "N" from the column "GID" into any empty column of the same row?

library(tabulizer)

# Extract the table on page 87 of the PDF as a data frame
DataFile <- extract_tables("new.pdf", pages = c(87),
                           method = "stream", output = "data.frame", guess = TRUE)
DataFrame <- as.data.frame(DataFile)

# Remove the rows whose Group value is "No." or "A#"
df2 <- subset(DataFrame, Group != "No." & Group != "A#")

Output:

GID    ColA    ColB 
1       2       2
2       3       4
3       5       4
4       6       5
5       6       5
NG1     8 
MG2     8       1
MG3     8       1
NG4     8

Expected output:

GID    ColA    ColB 
1       2       2
2       3       4
3       5       4
4       6       5
5       6       5
G1     8       N
MG2     8       1
MG3     8       1
G4     8       N

DATA:

df1 <-  structure(list(GID = c("1", "2", "3", "4", "5", "NG1", "MG2", 
"MG3", "NG4"), ColA = c(2L, 3L, 5L, 6L, 6L, 8L, 8L, 8L, 8L), 
    ColB = c("2", "4", "4", "5", "5", "", "1", "1", "")), row.names = c(NA, 
-9L), class = "data.frame")

Check `tidyr::replace_na` for this. Unless you provide us with your data, we can't help further. Run `dput(DataFrame)` and paste the output here. – Mohan Govindasamy, Feb 04 '21 at 11:20
If it's actually "empty" rather than `NA`, something like `ifelse(yourdata$ColB == "", "N", yourdata$ColB)` should help. – tjebo, Feb 04 '21 at 11:23
@MohanGovindasamy, the request is not about "NA" values; it's about "N" values. – kumar, Feb 04 '21 at 11:24
@kumar, as I said earlier, if you give us your data it will be easier for us to answer. – Mohan Govindasamy, Feb 04 '21 at 11:29
Check this thread: https://stackoverflow.com/questions/14737773/replacing-occurrences-of-a-number-in-multiple-columns-of-data-frame-with-another – tjebo, Feb 04 '21 at 11:30
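The comments hinge on whether the blank cells are empty strings or `NA`. A minimal sketch, assuming the blanks arrive as `""` as in the `dput()` output above (the name `small` is just an illustrative excerpt), shows why `is.na()` (and therefore `tidyr::replace_na`) does not catch them while an `== ""` test does:

```r
# Small excerpt of the question's data; ColB holds "" rather than NA
small <- data.frame(GID = c("1", "NG1"), ColA = c(2L, 8L), ColB = c("2", ""),
                    stringsAsFactors = FALSE)

is.na(small$ColB)   # FALSE FALSE: an empty string is not a missing value
small$ColB == ""    # FALSE  TRUE

# tjebo's suggestion: swap empty strings for "N"
small$ColB <- ifelse(small$ColB == "", "N", small$ColB)
small$ColB          # "2" "N"
```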

Ben · Answer 1 · 2021-02-04T15:28:37.643


In base R, you could try the following.

First, identify the rows where ColB is an empty character value, and store in a logical vector:

emp_rows <- df1$ColB == ""

Then remove the "N" from GID in those rows:

df1$GID[emp_rows] <- gsub("N", "", df1$GID[emp_rows])

And store "N" in ColB in the same rows:

df1$ColB[emp_rows] <- "N"

To generalize to any blank column, you can do the following. Based on the logic in the comment, first check whether GID starts with "N". If it does, remove the "N", then check every column in that row for blank values and replace each blank with "N".

You can wrap this in a function and then use apply (or another method) to go through your data frame row by row.

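my_fun <- function(vec) {
  # Only touch rows whose GID starts with "N"
  if (startsWith(vec[["GID"]], "N")) {
    # Remove "N" characters from GID
    vec[["GID"]] <- gsub("N", "", vec[["GID"]])
    # Replace every blank entry in this row with "N"
    vec <- replace(vec, vec == "", "N")
  }
  return(vec)
}

# apply() passes each row to my_fun as a named character vector
data.frame(t(apply(df1, 1, my_fun)))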

Output

  GID ColA ColB
1   1    2    2
2   2    3    4
3   3    5    4
4   4    6    5
5   5    6    5
6  G1    8    N
7 MG2    8    1
8 MG3    8    1
9  G4    8    N
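Because `apply()` first converts the data frame to a character matrix, every column in the result comes back as character (the point raised in the comments below). A minimal base R sketch that loops over columns instead, assuming `df1` from the question, only converts a column to character when it actually receives an "N", so ColA stays integer. This is an alternative to the answer's `apply()` approach, not the answer's own code.

```r
df1 <- structure(list(GID = c("1", "2", "3", "4", "5", "NG1", "MG2",
"MG3", "NG4"), ColA = c(2L, 3L, 5L, 6L, 6L, 8L, 8L, 8L, 8L),
    ColB = c("2", "4", "4", "5", "5", "", "1", "1", "")), row.names = c(NA,
-9L), class = "data.frame")

# Rows whose GID starts with "N"
starts_n <- startsWith(df1$GID, "N")

# Fill blanks with "N", but only in those rows and only in non-GID columns
for (col in setdiff(names(df1), "GID")) {
  x <- df1[[col]]
  blank <- starts_n & !is.na(x) & x == ""
  if (any(blank)) {
    x[blank] <- "N"   # assigning "N" coerces this column to character
    df1[[col]] <- x
  }
}

# Drop only the leading "N" from GID in the same rows
df1$GID[starts_n] <- sub("^N", "", df1$GID[starts_n])
```

Columns with no blanks in the flagged rows (here ColA) keep their original type.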

edited Feb 04 '21 at 15:28

answered Feb 04 '21 at 13:34

Ben


Thank you, but I shouldn't have to name ColB specifically. If a value in column GID starts with "N", any empty column in that same row should automatically be filled with the "N". – kumar Feb 04 '21 at 14:47
@kumar Please see edited answer, let me know if this is what you had in mind. – Ben Feb 04 '21 at 15:28

Like that! @kumar, although this is a great solution and shows some nice, clear code using only base R, I would like to point out that what you are getting is character columns. This might be intentional, but I get the feeling it might not be what you actually want. If you want to keep your values as "numbers" (integers), I'd replace empty values with NA and create a separate column for your string. – tjebo Feb 04 '21 at 19:24

I agree with @tjebo: using NA would likely be a better approach and is worth considering, depending on your needs. Once you start adding "N" to "ColA" or "ColB", they will necessarily be character and cannot be integer. – Ben Feb 04 '21 at 19:40
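Following tjebo's suggestion, here is a minimal base R sketch, assuming the `df1` data from the question: blanks become `NA` so ColB can be stored as integer, and the "N" marker goes into a separate column. The column name `GID_flag` is hypothetical.

```r
df1 <- structure(list(GID = c("1", "2", "3", "4", "5", "NG1", "MG2",
"MG3", "NG4"), ColA = c(2L, 3L, 5L, 6L, 6L, 8L, 8L, 8L, 8L),
    ColB = c("2", "4", "4", "5", "5", "", "1", "1", "")), row.names = c(NA,
-9L), class = "data.frame")

# Hypothetical column that records the "N" prefix instead of writing it into ColB
df1$GID_flag <- ifelse(startsWith(df1$GID, "N"), "N", NA_character_)

# Drop the leading "N" from GID
df1$GID <- sub("^N", "", df1$GID)

# Empty strings become NA, which lets ColB be stored as an integer column
df1$ColB[df1$ColB == ""] <- NA
df1$ColB <- as.integer(df1$ColB)

str(df1)
```

For other columns with blanks, the last two steps would be repeated per column.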

Mohan Govindasamy · Answer 2 · 2021-02-05T05:02:44.107


This way you can replace empty strings with "N" (or any other character of your choice) without naming the columns:

library(tidyverse)

df1 <- structure(list(GID = c("1", "2", "3", "4", "5", "NG1", "MG2", "MG3", "NG4"), ColA = c(2L, 3L, 5L, 6L, 6L, 8L, 8L, 8L, 8L), ColB = c("2", "4", "4", "5", "5", "", "1", "1", "")), row.names = c(NA, -9L), class = "data.frame")

df1 %>% 
  # Replace empty strings ("^$") with "N" in every column
  mutate(across(everything(), ~str_replace(., "^$", "N")),
         # Drop the leading "N" from GID
         GID = GID %>% str_remove("^N"))
#>   GID ColA ColB
#> 1   1    2    2
#> 2   2    3    4
#> 3   3    5    4
#> 4   4    6    5
#> 5   5    6    5
#> 6  G1    8    N
#> 7 MG2    8    1
#> 8 MG3    8    1
#> 9  G4    8    N

Created on 2021-02-05 by the reprex package (v0.3.0)

edited Feb 05 '21 at 05:02

answered Feb 04 '21 at 11:43

Mohan Govindasamy


Thank you @Mohan Govindasamy. Also, I need to remove the N from GID, and only if GID starts with N should the N be put into the empty column of that row. – kumar Feb 04 '21 at 11:47
I have edited the answer to remove N from the GID column – Mohan Govindasamy Feb 05 '21 at 05:03
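The answer's pipeline fills every blank and strips "N" from every GID regardless of row. A minimal dplyr sketch of the conditional rule kumar describes (fill blanks only in rows whose GID starts with "N"), assuming `df1` from the question; the helper column `starts_n` is hypothetical and dropped at the end:

```r
library(dplyr)
library(stringr)

df1 <- structure(list(GID = c("1", "2", "3", "4", "5", "NG1", "MG2",
"MG3", "NG4"), ColA = c(2L, 3L, 5L, 6L, 6L, 8L, 8L, 8L, 8L),
    ColB = c("2", "4", "4", "5", "5", "", "1", "1", "")), row.names = c(NA,
-9L), class = "data.frame")

result <- df1 %>%
  # Flag rows whose GID starts with "N" before GID is modified
  mutate(starts_n = str_starts(GID, "N")) %>%
  # In flagged rows only, turn "" into "N" in every column except GID
  mutate(across(-c(GID, starts_n),
                ~ if_else(starts_n & .x == "", "N", as.character(.x)))) %>%
  # Remove only the leading "N" from GID
  mutate(GID = str_remove(GID, "^N")) %>%
  select(-starts_n)

result
```

As in the answer, `as.character()` turns ColA into a character column too.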
