2

I have a very large dataframe of 63 columns and 1697 rows. The end of the rows fill up with NAs but I want the matching values in rows to be in the same column, and stick the NAs into the gaps

a bit like this (updated):

v1 <- c("v1","v1","v1","v1","v1","v1","v1")
v2 <- c("v3","v2","v2","NA","v2","v2","v2")
v3 <- c("v4","v4","v3","NA","v3","v3", "v3")
v4 <- c("v5","v5","v4","NA","v5","v4","NA")
v5 <- c("NA","NA","v5","NA","v6","v6", "NA")
v6 <- c("NA","NA","v6","NA","v7","v7","NA")
v7 < - c("NA","NA","NA","NA","NA","NA","NA")
df <- data.frame(v1,v2,v3,v4,v5,v6,v7)

df

  v1 v2 v3 v4 v5 v6 v7
1 v1 v3 v4 v5 NA NA NA
2 v1 v2 v4 v5 NA NA NA
3 v1 v2 v3 v4 v5 v6 NA
4 v1 NA NA NA NA NA NA
5 v1 v2 v3 v5 v6 v7 NA
6 v1 v2 v3 v4 v6 v7 NA
7 v1 v2 v3 NA NA NA NA

but I would like everything aligned like this:

  v1 v2 v3 v4 v5 v6 v7
1 v1 NA NA v4 v5 NA NA
2 v1 v2 NA v4 v5 NA NA
3 v1 v2 v3 v4 v5 v6 NA
4 v1 NA NA NA NA NA NA
5 v1 v2 v3 NA v5 v6 v7 
6 v1 v2 v3 v4 NA v6 v7 
7 v1 v2 v3 NA NA NA NA

I have tried map.values() and this didn't come out as expected, as well as a ifelse() but this all requires me to enter specific cell data and change that.

The column names do match the cell names.

I want to use the data to put into a presence absence plot, so I figured after I can just

for (i in 1:63){
gsub("NA", 0, df[,i]}

and then same for anything containing "v" to have a binary 1 or 0 for presence or absence, but they have to be aligned

There are no predefined rules governing the data, the dataframe has been conglomerated together from many other .csv files and this is the best format I can get it into currently.

Any help would be appreciated!

  • Hello @Charity_case, it would help us greatly if you could share your data in a friendlier format such as using `dput(your_data)` or any means documented here : [how-to-make-a-great-r-reproducible-example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – cbo Oct 28 '19 at 14:32
  • 2
    It would be helpful to have reproducible data and a better specification. E.g. Do the column names mirror the desired value, as in your example? Is there a pre-defined rule governig the order of the values? What do you want to do with the data, next? Maybe there is another route you could go... – Phil Oct 28 '19 at 14:33

1 Answers1

1

Updated answer to match new input data

Data

I removed the quotation marks from NA:

v1 <- c("v1","v1","v1","v1","v1","v1","v1")
v2 <- c("v3","v2","v2",NA,"v2","v2","v2")
v3 <- c("v4","v4","v3",NA,"v3","v3", "v3")
v4 <- c("v5","v5","v4",NA,"v5","v4",NA)
v5 <- c(NA,NA,"v5",NA,"v6","v6", NA)
v6 <- c(NA,NA,"v6",NA,"v7","v7",NA)
v7 <- c(NA,NA,NA,NA,NA,NA,NA)
df <- data.frame(v1,v2,v3,v4,v5,v6,v7, stringsAsFactors = F)

Code

l <- list()
u <- c("v1", "v2", "v3", "v4", "v5", "v6", "v7")
h <- NULL
for(k in 1:nrow(df)){
  # create a list for each row of the df
  l[[k]] <- df[k, ]
  for(i in 1:length(l[[k]])){
    #check if number exists in the row
    if(u[i] %in% l[[k]]){
      # find the index of the number given it exists
      a <- which(l[[k]] == u[i])
      #assign to "help" vector in order to not overwrite values 
      h[i] <- l[[k]][a]
    }
    else{
      #numbers that do not exist in the vector are asigned NA
      h[i] <- NA
    }
  }
  #replace row by sorted vector with NA place holders ("help" vector)
  l[[k]] <- h
}

Result

df1 <- as.data.frame(do.call(rbind, l))
df1
  V1 V2 V3 V4 V5 V6 V7
1 v1 NA v3 v4 v5 NA NA
2 v1 v2 NA v4 v5 NA NA
3 v1 v2 v3 v4 v5 v6 NA
4 v1 NA NA NA NA NA NA
5 v1 v2 v3 NA v5 v6 v7
6 v1 v2 v3 v4 NA v6 v7
7 v1 v2 v3 NA NA NA NA
fabla
  • 1,806
  • 1
  • 8
  • 20
  • Thanks for your answer lots of help! However, I'm not sure what happened to the V4s. So 4 V4s went into the dataframe and none came out when binding. – Charity_case Oct 29 '19 at 13:50
  • Ah I got it, it's just because in the list (u) "V4" was entered instead of "v4". So, yes this definitely answers my query, thank you! – Charity_case Oct 29 '19 at 14:03