
I have a data.frame in R whose columns are named L1, L2, L3, etc., but in a given iteration I receive a data.frame that contains only a random subset of those columns, such as the following one.

L1,L3,L5
0.0000000,0.7142857,0.2857143
0.1052632,0.8947368,0.0000000
1.0000000,0.0000000,0.0000000
0.0000000,1.0000000,0.0000000
0.0000000,0.0000000,1.0000000
1.0000000,0.0000000,0.0000000

I need to create a new data.frame with the same number of rows and the full set of columns, with the column names in consecutive order, as shown below. The added columns L2, L4, and L6 must be filled with 0.

L1,L2,L3,L4,L5,L6
0.0000000,0.0,0.7142857,0.0,0.2857143,0.0
0.1052632,0.0,0.8947368,0.0,0.0000000,0.0
1.0000000,0.0,0.0000000,0.0,0.0000000,0.0
0.0000000,0.0,1.0000000,0.0,0.0000000,0.0 
0.0000000,0.0,0.0000000,0.0,1.0000000,0.0
1.0000000,0.0,0.0000000,0.0,0.0000000,0.0
Cristian E. Nuno
LuisMoncayo

2 Answers


With Base R:

# create example data
df <- read.csv(header = TRUE,
        text = "L1,L3,L5
                0.0000000,0.7142857,0.2857143
                0.1052632,0.8947368,0.0000000
                1.0000000,0.0000000,0.0000000
                0.0000000,1.0000000,0.0000000
                0.0000000,0.0000000,1.0000000
                1.0000000,0.0000000,0.0000000")

# create empty dataframe of zeros, with colnames L1:L6
df0 <- as.data.frame(matrix(0, nrow=nrow(df), ncol=6))
names(df0) <- paste0("L", 1:6)

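# cbind df with the zero cols from df0 that df lacks
# (drop = FALSE keeps a data.frame even when only one column is missing)
df_result <- cbind(df, df0[ , setdiff(names(df0), names(df)), drop = FALSE])

# reorder columns L1:L6
# (names(df0) is already in numeric order; sort() would put L10 before L2)
df_result <- df_result[ , names(df0)]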

Note that this is effective but inefficient code, as it creates an object full of zeros. This should work well with small to medium-sized data sets, but I would recommend something more clever for large data sets.
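
For larger data sets, one way to avoid building a full data.frame of zeros is to add only the columns that are missing. The following is a minimal sketch, assuming the full set of expected column names is known in advance; `all_cols` is a name introduced here for illustration, and the input is the first three rows of the example data.

```r
# example input with only some of the expected columns
df <- data.frame(L1 = c(0, 0.1052632, 1),
                 L3 = c(0.7142857, 0.8947368, 0),
                 L5 = c(0.2857143, 0, 0))

# assumption: the full set of expected column names is known in advance
all_cols <- paste0("L", 1:6)

# add a zero-filled column for each name that df lacks
for (col in setdiff(all_cols, names(df))) {
  df[[col]] <- 0
}

# put the columns in the expected order
df <- df[ , all_cols]
```

Because the loop runs over `setdiff(all_cols, names(df))`, it does nothing when no columns are missing, and the scalar 0 is recycled to the number of rows.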

DanY

Overview

After reading dplyr - mutate: use dynamic variable names, I tweaked the results to solve your problem of not knowing the column names ahead of time.

The approach is to store the columns that are not found in your existing df and then add them dynamically by way of a for loop.

Code

# load necessary package --------
library(tidyverse)
library(rlang)

# load necessary data -----------
df <-
  read_csv("L1,L3,L5
0.0000000,0.7142857,0.2857143
             0.1052632,0.8947368,0.0000000
             1.0000000,0.0000000,0.0000000
             0.0000000,1.0000000,0.0000000
             0.0000000,0.0000000,1.0000000
             1.0000000,0.0000000,0.0000000") 

# create function that creates one new column ------
FillNewColumns <- function(df, string) {
  require(dplyr)
  require(rlang)

  df %>%
  mutate(!!string := 0 )
}

# store the integers from the column names --------
integer.values <-
  df %>%
  names() %>%
  str_extract("\\d+") %>%
  as.integer()

# identify max value from existing integer.values and add 1 ----
max.value <-
  integer.values %>%
  max() + 1

# identify the new columns -------
# note: this requires that you know the maximum value ahead of time
new.columns <-
  (1:max.value %in%
  integer.values == FALSE) %>%
  # take the indices of those TRUE values
  # which do not appear in 1:max.value and create
  # our new columns
  which() %>%
  paste0("L", .)

# dynamically add new columns to df ------
for (i in new.columns) {
  df <- FillNewColumns(df, i)
}

# tidy up the results ------
df <-
  df %>%
  # rearrange the columns in numeric order
  # (sorting the names alphabetically would put L10 before L2)
  select(all_of(paste0("L", 1:max.value)))

# view results ----
df
# A tibble: 6 x 6
#      L1    L2    L3    L4    L5    L6
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 0         0 0.714     0 0.286     0
# 2 0.105     0 0.895     0 0         0
# 3 1         0 0         0 0         0
# 4 0         0 1         0 0         0
# 5 0         0 0         0 1         0
# 6 1         0 0         0 0         0

# end of script #
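
One caveat on ordering the columns: sorting the names alphabetically only gives the intended order while the suffixes are single digits. The sketch below, using hypothetical column names that are not in the original data, shows one way to order by the numeric suffix instead, assuming every name has the form "L" followed by an integer.

```r
# hypothetical column names, including two-digit suffixes
cols <- c("L10", "L2", "L1", "L11", "L3")

# alphabetical sort puts L10 and L11 before L2
alphabetical <- sort(cols)

# strip the leading "L", convert to integer, and order by that value
suffix <- as.integer(sub("^L", "", cols))
ordered_cols <- cols[order(suffix)]
```

The resulting `ordered_cols` vector can then be used to index the data.frame, for example `df[ , ordered_cols]`.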
Cristian E. Nuno