I have a data frame in which several features describe one word. For further analysis I need all features stacked below each other in a single column, with the described word kept in each row. An example should make this clearer:

df <- data.frame(a = c(1:4), feature1 = c("word1", "word2", "word3", "word4"), feature2 = c("a", "b", "c", "d"), inputword = c("this", "that", "this2", "that2"))
df

  a feature1 feature2 inputword
1 1    word1        a      this
2 2    word2        b      that
3 3    word3        c     this2
4 4    word4        d     that2

Now I need every feature in one column, together with the inputword and a columns. Expected output:

  a feature2 inputword
1 1    word1      this
2 2    word2      that
3 3    word3     this2
4 4    word4     that2
5 1        a      this
6 2        b      that
7 3        c     this2
8 4        d     that2

I found a way to get my expected output by creating a separate data frame per feature and then combining them with rbind.data.frame. However, the original data set has up to 18 features, so creating individual data frames as below seems very inefficient.

The way I made it work:

df1 <- df[ , -3]                  # drop feature2, keep feature1
df2 <- df[ , -2]                  # drop feature1, keep feature2
colnames(df1) <- colnames(df2)    # rbind needs matching column names
df_all <- rbind.data.frame(df1, df2)
df_all

  a feature2 inputword
1 1    word1      this
2 2    word2      that
3 3    word3     this2
4 4    word4     that2
5 1        a      this
6 2        b      that
7 3        c     this2
8 4        d     that2

Maybe someone can help me with a more efficient way to get what I want?

Thank you in advance!

Linda Espey

4 Answers

Here is a data.table option using melt:

library(data.table)
melt(setDT(df), id.vars = c("a", "inputword"), value.name = "feature")[, .(a, feature, inputword)]
   a feature inputword
1: 1   word1      this
2: 2   word2      that
3: 3   word3     this2
4: 4   word4     that2
5: 1       a      this
6: 2       b      that
7: 3       c     this2
8: 4       d     that2
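
With up to 18 feature columns, the measure columns can also be picked by name instead of being left implicit. The following is a minimal sketch under that assumption; `feature_cols`, `dt` and `long` are illustrative names, and the `^feature` prefix is assumed to match every feature column in the real data.

```r
library(data.table)

df <- data.frame(a = 1:4,
                 feature1 = c("word1", "word2", "word3", "word4"),
                 feature2 = c("a", "b", "c", "d"),
                 inputword = c("this", "that", "this2", "that2"))

# Work on a copy so the original data.frame is not converted by reference
dt <- as.data.table(df)

# Assumption: every feature column name starts with "feature"
feature_cols <- grep("^feature", names(dt), value = TRUE)

# Stack the feature columns into one "feature" column, keeping a and inputword
long <- melt(dt,
             id.vars = c("a", "inputword"),
             measure.vars = feature_cols,
             value.name = "feature")

# Drop the helper "variable" column and restore the requested column order
long <- long[, .(a, feature, inputword)]
long
```

The rows come out grouped by feature column, in the same order as the expected output in the question.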
ThomasIsCoding

Using pivot_longer:

library(tidyverse)

df <- data.frame(a = c(1:4), 
                 feature1 = c("word1", "word2", "word3", "word4"), 
                 feature2 = c("a", "b", "c", "d"), 
                 inputword = c("this", "that", "this2", "that2")
)
df %>% 
  pivot_longer(cols = feature1:feature2,
               values_to = "feature2") %>% 
  select(a, feature2, inputword)
#> # A tibble: 8 x 3
#>       a feature2 inputword
#>   <int> <chr>    <chr>    
#> 1     1 word1    this     
#> 2     1 a        this     
#> 3     2 word2    that     
#> 4     2 b        that     
#> 5     3 word3    this2    
#> 6     3 c        this2    
#> 7     4 word4    that2    
#> 8     4 d        that2

Created on 2021-05-03 by the reprex package (v2.0.0)
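
The output above interleaves the features row by row, whereas the expected output in the question lists all feature1 values first. A minimal sketch of one way to get that order, assuming the feature columns share the `feature` prefix; the helper column name `source` is illustrative and not part of the original data:

```r
library(tidyr)
library(dplyr)

df <- data.frame(a = 1:4,
                 feature1 = c("word1", "word2", "word3", "word4"),
                 feature2 = c("a", "b", "c", "d"),
                 inputword = c("this", "that", "this2", "that2"))

long <- df %>%
  # Select every column whose name starts with "feature"
  pivot_longer(cols = starts_with("feature"),
               names_to = "source",
               values_to = "feature") %>%
  # Group the rows by originating column, then by a
  arrange(source, a) %>%
  select(a, feature, inputword)

long
```

Note that `arrange(source, a)` sorts the column names alphabetically, so with ten or more features `feature10` would come before `feature2`; a numeric sort key would be needed if that order matters.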

Roman Luštrik
Desmond
  • Thank you! However, when I run the code, it says that "trace data is not square". Further, I am not able to load tidyverse, saying "there is no package called tidyverse". – Linda Espey May 03 '21 at 09:28
  • Sounds like you don't have it installed. Try `install.packages("tidyverse")`, then run the above code again. – Desmond May 03 '21 at 09:43

You can also try gather from tidyr:

library(tidyr)
data_long <- gather(df, condition, measurement, feature1:feature2, factor_key=TRUE)
data_long

which gives

  a inputword condition measurement
1 1      this  feature1       word1
2 2      that  feature1       word2
3 3     this2  feature1       word3
4 4     that2  feature1       word4
5 1      this  feature2           a
6 2      that  feature2           b
7 3     this2  feature2           c
8 4     that2  feature2           d

To keep only the feature2 values:

my.new.df <- data_long[data_long$condition == "feature2", ]

which returns

  a inputword condition measurement
5 1      this  feature2           a
6 2      that  feature2           b
7 3     this2  feature2           c
8 4     that2  feature2           d
Seyma Kalay

In base R you can use data.frame, which recycles the shorter columns, to create the desired output:

with(df, data.frame(a, feature = c(feature1, feature2), inputword))
#  a feature inputword
#1 1   word1      this
#2 2   word2      that
#3 3   word3     this2
#4 4   word4     that2
#5 1       a      this
#6 2       b      that
#7 3       c     this2
#8 4       d     that2

Or select the feature columns by name, which works for any number of feature columns:

data.frame(df["a"], feature = unlist(df[, startsWith(names(df), "feature")]),
 df["inputword"])
#          a feature inputword
#feature11 1   word1      this
#feature12 2   word2      that
#feature13 3   word3     this2
#feature14 4   word4     that2
#feature21 1       a      this
#feature22 2       b      that
#feature23 3       c     this2
#feature24 4       d     that2
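
The row names in the last result come from the names that unlist attaches to its output. If plain 1 to 8 row names are preferred, as in the expected output of the question, one possible variant is sketched below; `feature_cols` and `long` are illustrative names, and character (not factor) feature columns are assumed, as with the data.frame defaults of R 4.0 and later.

```r
df <- data.frame(a = 1:4,
                 feature1 = c("word1", "word2", "word3", "word4"),
                 feature2 = c("a", "b", "c", "d"),
                 inputword = c("this", "that", "this2", "that2"))

# Logical index of the columns whose name starts with "feature"
feature_cols <- startsWith(names(df), "feature")

# use.names = FALSE drops the "feature11", "feature12", ... names,
# so the resulting data frame gets default row names 1..8;
# the shorter a and inputword columns are recycled to length 8
long <- data.frame(df["a"],
                   feature = unlist(df[feature_cols], use.names = FALSE),
                   df["inputword"])
long
```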
GKi