How to merge output of lapply iterated function into 1 dataframe?

Question

I have a function, response, that processes data which has been split into several data frames.

I also have a list of these data frames, partylist.

I am trying to iterate through the list of data subsets using lapply and then collect the results in one data frame.

#Function called "response"

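library(dplyr)  # provides select() and all_of()

response <- function(dat, p){
    y = select(dat, all_of(p))  # p holds the column name as a string
    sums = table(y)
    sums_df = as.data.frame(sums)
    # relative frequency in percent, rounded to whole numbers
    sums_df$rfreq = (sums_df$Freq/sum(sums_df$Freq))*100
    sums_df$rfreq = round(sums_df$rfreq, digits = 0)
    # keep the first four response levels and drop the raw counts
    sums_df = sums_df[1:4, c(1,3)]
    return(sums_df)
}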

#Code for iterating through the list of dfs.

lapply(partylist, response, p = "f78a")

#Output:

[[1]]
                    y rfreq
1      Instämmer helt    29
2    Instämmer delvis    40
3  Instämmer knappast     6
4 Instämmer inte alls     2

[[2]]
                    y rfreq
1      Instämmer helt    32
2    Instämmer delvis    38
3  Instämmer knappast     8
4 Instämmer inte alls     2

Can anybody suggest how I would do this?

A similar question was asked here but it never got answered.

If you want to stack the rows, add an id column to each output and use `do.call(rbind, output)`. If you want to add the values together in the second column you could use `Reduce` - Allan Cameron, May 12 '22 at 20:05
Thanks for your fast reply @AllanCameron - I basically want the rfreq values as separate columns of the same data frame, without repeating the y column. I thought maybe `cbind`, but that seems to repeat both columns. - Nick Olczak, May 12 '22 at 20:36
The similar question was never answered partly because the request for reproducible data was ignored. Can you provide some details regarding the structure of the data frames and at least a sample of one using `dput()`? It is not clear that splitting the data set in the first place was the best option. - dcarlson, May 12 '22 at 22:25

Abdur Rohman · Accepted Answer · 2022-05-15T11:36:15.520

Your data:

outputs <- list(structure(list(y = c("Instämmer helt", "Instämmer delvis", 
"Instämmer knappast", "Instämmer inte alls"), 
 rfreq = c(29L, 40L, 6L, 2L)), class = "data.frame", 
 row.names = c(NA, -4L)), 
 structure(list(y = c("Instämmer helt", "Instämmer delvis", 
                         "Instämmer knappast", "Instämmer inte alls"), 
 rfreq = c(32L, 38L, 8L, 2L)), class = "data.frame", 
 row.names = c(NA, -4L)))

As @Allan Cameron said, Reduce can be used to add the columns together. Combined with merge, it can also bind the rfreq columns side by side without repeating the y column.

Reduce(function(df1,df2) merge(df1,df2, by = "y", suffixes = 1:2), outputs)

#                   y rfreq1 rfreq2
#1    Instämmer delvis     40     38
#2      Instämmer helt     29     32
#3 Instämmer inte alls      2      2
#4  Instämmer knappast      6      8
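
For comparison, the two alternatives mentioned in the comments, stacking the rows with an id column and summing the rfreq values, could look roughly like the sketch below. The id column name and the objects with_id, stacked and summed are hypothetical, and the summing step assumes every list element has its rows in the same order.

```r
# Same two result frames as above, rebuilt compactly so the block runs on its own
lev <- c("Instämmer helt", "Instämmer delvis",
         "Instämmer knappast", "Instämmer inte alls")
outputs <- list(
  data.frame(y = lev, rfreq = c(29L, 40L, 6L, 2L)),
  data.frame(y = lev, rfreq = c(32L, 38L, 8L, 2L))
)

# Option 1: stack the rows, tagging each block with an id (hypothetical column name)
with_id <- Map(function(df, i) cbind(id = i, df), outputs, seq_along(outputs))
stacked <- do.call(rbind, with_id)

# Option 2: add the rfreq columns together
# (assumes identical row order in every list element)
summed <- Reduce(
  function(a, b) data.frame(y = a$y, rfreq = a$rfreq + b$rfreq),
  outputs
)
```

Neither option gives one rfreq column per list element, which is why the merge approach is used here.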

This approach also works on a list with more than two elements, but the column names end up duplicated. merge only applies the suffixes to names that clash within a single pairwise merge, so the suffixes 3, 4, ... are not added automatically to the resulting column names.

# Creating two more elements so now `outputs` has four elements
outputs[[3]] <- outputs[[1]]
outputs[[4]] <- outputs[[2]]

# Exactly the same code

Reduce(function(df1,df2) merge(df1,df2, by = "y", suffixes = 1:2), outputs) 

# The result:
#                   y rfreq1 rfreq2 rfreq1 rfreq2
#1    Instämmer delvis     40     38     40     38
#2      Instämmer helt     29     32     29     32
#3 Instämmer inte alls      2      2      2      2
#4  Instämmer knappast      6      8      6      8
#Warning message:
#In merge.data.frame(df1, df2, by = "y", suffixes = 1:2) :
#  column names ‘rfreq1’, ‘rfreq2’ are duplicated in the result
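
One possible way around the duplicated names, offered as a sketch rather than the only fix, is to rename each rfreq column before merging so that no suffixes are needed at all. It assumes every list element has exactly the two columns y and rfreq, in that order; the names renamed and merged are illustrative.

```r
# Four result frames, rebuilt compactly so the block runs on its own
lev <- c("Instämmer helt", "Instämmer delvis",
         "Instämmer knappast", "Instämmer inte alls")
outputs <- list(
  data.frame(y = lev, rfreq = c(29L, 40L, 6L, 2L)),
  data.frame(y = lev, rfreq = c(32L, 38L, 8L, 2L)),
  data.frame(y = lev, rfreq = c(29L, 40L, 6L, 2L)),
  data.frame(y = lev, rfreq = c(32L, 38L, 8L, 2L))
)

# Give each rfreq column its own name up front (rfreq1, rfreq2, ...),
# so merge() never meets clashing column names
renamed <- Map(
  function(df, i) setNames(df, c("y", paste0("rfreq", i))),
  outputs, seq_along(outputs)
)

# Pairwise merge on y; no suffixes argument is needed any more
merged <- Reduce(function(df1, df2) merge(df1, df2, by = "y"), renamed)

names(merged)
# "y" "rfreq1" "rfreq2" "rfreq3" "rfreq4"
```

If partylist is a named list, the element names could be used in place of the numeric index to get more descriptive column names.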

Updates

As for why the row order gets swapped in the resulting data frame: by default, merge sorts the merged rows lexicographically on the common columns, as explained in its documentation:

The rows are by default lexicographically sorted on the common columns, but for sort = FALSE are in an unspecified order.

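To avoid this default behavior, we can set sort = FALSE. Note that the documentation leaves the resulting order unspecified in that case, even though here the rows come back in their original order: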

Reduce(function(df1,df2) merge(df1,df2, by = "y", suffixes = 1:2, sort = FALSE), outputs)

#                    y rfreq1 rfreq2 rfreq1 rfreq2
#1      Instämmer helt     29     32     29     32
#2    Instämmer delvis     40     38     40     38
#3  Instämmer knappast      6      8      6      8
#4 Instämmer inte alls      2      2      2      2
#Warning message:
#In merge.data.frame(df1, df2, by = "y", suffixes = 1:2, sort = FALSE) :
#  column names ‘rfreq1’, ‘rfreq2’ are duplicated in the result
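
Because the documentation leaves the order unspecified when sort = FALSE, a more defensive sketch is to let merge sort as usual and then restore the order of the first list element with match. This assumes the y values are unique within each data frame; merged is a hypothetical name, and the example uses the original two-element list.

```r
# The original two result frames, rebuilt compactly so the block runs on its own
lev <- c("Instämmer helt", "Instämmer delvis",
         "Instämmer knappast", "Instämmer inte alls")
outputs <- list(
  data.frame(y = lev, rfreq = c(29L, 40L, 6L, 2L)),
  data.frame(y = lev, rfreq = c(32L, 38L, 8L, 2L))
)

# Merge with the default sorting
merged <- Reduce(
  function(df1, df2) merge(df1, df2, by = "y", suffixes = c("1", "2")),
  outputs
)

# Reorder the rows to follow the first list element, whatever order merge() produced
merged <- merged[match(outputs[[1]]$y, merged$y), ]
rownames(merged) <- NULL  # renumber the rows 1..n after reordering
```

This makes the row order depend only on the input data, not on how merge happens to arrange unsorted results.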

Thanks - that works. It took me some time to figure out that the 'df1' and 'df2' here didn't refer to any specific data.frames. Why does the row order get swapped in the merged results? - Nick Olczak, May 15 '22 at 09:09
