replace_na used on a data frame with columns of repeated names

Question

I am trying to replace NA values within the columns of a data frame, but since some columns have identical names, tidyr::replace_na replaces the NAs only in the first occurrence of each column name.

library(dplyr)    # for %>%
library(tidyr)    # for replace_na()
library(stringr)  # for str_extract()

# Raw strings to search in; note the repeated entries
namesvec1 <- c("John Smith", "John Smith Jr df", "Luis Rivera", "Ricardo Feliciano  ADE", "Huber Villa Gomez 12", "Christian Pilares", "Luis Rivera", "Luis Rivera", "Christian Pilares")
# Patterns to extract, sorted in decreasing order
namesvec <- c("John Smith", "Ricardo Feliciano", "Christian Pilares", "Luis Rivera", "John Smith Jr")
namesvec <- sort(namesvec, decreasing = TRUE)

# One column per element of namesvec1, so the column names are duplicated
namesfun <- sapply(namesvec1, function(x) str_extract(x, sapply(namesvec, function(y) y))) %>%
  as.data.frame(stringsAsFactors = FALSE)

# Named list holding one replacement value per column
mylist <- list()
for (i in 1:ncol(namesfun)) {
  mylist[i] <- "zzz"
}
names(mylist) <- names(namesfun)
replace_na(namesfun, mylist)

The result I get is this:

[Image: screenshot of the resulting data frame, not available]

Am I doing something wrong?

Then you should add `library(dplyr)` (same for any other package that the code relies on) to the example, and add the `dplyr` tag to the existing `r` one. — AkselA, Jul 20 '18 at 15:26
You will have to excuse me. I'm new to both R and stackoverflow — El Mexicano, Jul 20 '18 at 15:42
No problem, we all have to start somewhere (usually at the beginning). Give [this thread](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) a read, it contains a lot of useful R tricks, particularly for posting on SO. — AkselA, Jul 20 '18 at 15:48
Why do you have duplicate column names? Generally when I have what might end up being duplicate column names, it's a sign that my data is shaped poorly and should be long instead of wide — camille, Jul 20 '18 at 17:40

Aurèle · Answer 1 · 2018-07-20T16:31:19.753

2

One should never, ever, build data frames with duplicate column names. This is a source of horrendous bugs.

(Apologies for the strong language, but this is an absolute rule that suffers no exception).

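Replace as.data.frame with data.frame, which uses make.names(unique = TRUE) internally to guarantee unique column names, as long as the default check.names = TRUE is kept. Note that this also makes the names syntactically valid, so spaces become dots and repeats get a numeric suffix (for example Luis.Rivera, Luis.Rivera.1).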

The rest of the code will then work as expected.
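A minimal sketch of the difference, using a small made-up matrix rather than the asker's data. The names m, dup, uniq, repl and filled are only illustrative.

```r
library(tidyr)  # for replace_na()

# Illustrative 2 x 3 character matrix with a repeated column name
m <- matrix(
  c("Luis Rivera", NA, NA, "John Smith", NA, "Luis Rivera"),
  nrow = 2,
  dimnames = list(NULL, c("Luis Rivera", "John Smith", "Luis Rivera"))
)

# as.data.frame() keeps the duplicate names untouched
dup <- as.data.frame(m, stringsAsFactors = FALSE)
names(dup)
# [1] "Luis Rivera" "John Smith"  "Luis Rivera"

# data.frame() runs the names through make.names(unique = TRUE)
uniq <- data.frame(m, stringsAsFactors = FALSE)
names(uniq)
# [1] "Luis.Rivera"   "John.Smith"    "Luis.Rivera.1"

# With unique names, a named replacement list covers every column
repl <- setNames(as.list(rep("zzz", ncol(uniq))), names(uniq))
filled <- replace_na(uniq, repl)
anyNA(filled)
# [1] FALSE
```

Since every column now has its own name, the named list built in the question should map one replacement value to each column.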

(Or, possibly, come up with another data frame "shape" or data structure that is better suited to your needs, but this is hard to guess from the question alone).

edited Jul 20 '18 at 16:31

answered Jul 20 '18 at 16:23

Aurèle


So, my columns have repeated names, because I took a column from an existing DF and the rows of it (which had duplicated elements) ended up being my columns (of the new DF). I am relatively new to R so probably this could be avoided. – El Mexicano Jul 23 '18 at 14:27

score 0 · Accepted Answer · answered Jul 20 '18 at 15:21


I use purrr with replace_na to do what you want, when I have a common replacement value for NA.

library(tidyverse) #for tidyr/purrr/dplyr

map_df(namesfun, ~replace_na(.x, "zzz"))

# A tibble: 5 x 6
  `John Smith` `John Smith Jr df` `Luis Rivera` `Ricardo Feliciano  ADE` `Huber Villa Gomez 12` `Christian Pilares`
  <chr>        <chr>              <chr>         <chr>                    <chr>                  <chr>              
1 zzz          zzz                zzz           Ricardo Feliciano        zzz                    zzz                
2 zzz          zzz                Luis Rivera   zzz                      zzz                    zzz                
3 zzz          John Smith Jr      zzz           zzz                      zzz                    zzz                
4 John Smith   John Smith         zzz           zzz                      zzz                    zzz                
5 zzz          zzz                zzz           zzz                      zzz                    Christian Pilares
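The tibble printed above has six columns, while namesvec1 has nine entries, so the repeated names appear to have been merged along the way. If every column needs to survive, one possible alternative (an assumption on my part, not something the answer proposes) is to replace by position in base R, which does not look at column names at all. The sketch below uses a small made-up data frame, and the names m, dup and filled are only illustrative.

```r
# Illustrative data frame with a repeated column name (not the asker's data)
m <- matrix(
  c("Luis Rivera", NA, NA, "John Smith", NA, "Luis Rivera"),
  nrow = 2,
  dimnames = list(NULL, c("Luis Rivera", "John Smith", "Luis Rivera"))
)
dup <- as.data.frame(m, stringsAsFactors = FALSE)

# is.na() returns a logical matrix, so the assignment works by position
filled <- dup
filled[is.na(filled)] <- "zzz"

ncol(filled)   # still 3 columns
anyNA(filled)  # FALSE
```

This only suits the case of one common replacement value. Giving the columns unique names first remains the safer route.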


Jake Kaupp


I am quite new to R so this might be a stupid question, but why do I get this error when I try to load tidyverse? Error: package or namespace load failed for ‘tidyverse’: .onAttach failed in attachNamespace() for 'tidyverse', details: call: namespaceExport(ns, exports) error: undefined exports: %+%, bgBlack, bgBlue, bgCyan, bgGreen, bgMagenta, bgRed, bgWhite, bgYellow, black, blue, blurred, bold, chr, col_align, col_nchar, col_strsplit, col_substr, col_substring, combine_styles, cyan, drop_style, finish, green, has_color, has_style, hidden............................ – El Mexicano Jul 20 '18 at 15:38
Sounds like an installation error. Did `install.packages("tidyverse")` complete without error? If it did error, try restarting your R session and installing it again. – Jake Kaupp Jul 20 '18 at 15:52
So, the tidyverse really won't install. I restarted both R and my computer, but I keep getting the same error. – El Mexicano Jul 23 '18 at 08:20
OK. So I updated R without keeping my existing packages, installed langr and tidyverse from CRAN, and everything works fine. Thanks a lot, by the way; map_df is very useful! – El Mexicano Jul 23 '18 at 09:07
