-1

Im trying to convert a csv file that is currently in the format,

Name_A,Name_B,Value_A,Value_B,Name_A,Name_B,Value_A,Value_B..... which repeats around 50 times.

When reading the data in R converts the duplicated column names like so

Name_A,Name_B,Value_A,Value_B,Name_A_1,Name_B_1,Value_A_1,Value_B_1

I want it to be in a neat long format instead but can't figure out how. I have tried reshape and tidyr to no avail.

MC101
  • 87
  • 5
  • Possible duplicate of [Reshaping data.frame from wide to long format](http://stackoverflow.com/questions/2185252/reshaping-data-frame-from-wide-to-long-format) – acylam Feb 28 '17 at 18:17
  • 1
    Some inspiration: http://stackoverflow.com/questions/41728439/tidy-data-frame-with-repeated-column-names – Jaap Feb 28 '17 at 18:19
  • What is the desired output? Long or pretty (duplicated) column names or both? – Roman Luštrik Feb 28 '17 at 19:20

1 Answers1

0

The OP didn't specify the expected result, so we have to guess wildly what the expected result should look like. Two possibilities are considered:

Case 1

Name_A, Value_A,
Name_B, Value_B,
Name_A, Value_A,
Name_B, Value_B,
...

Case 2

Name_A, Name_B, Value_A, Value_B
Name_A, Name_B, Value_A, Value_B
Name_A, Name_B, Value_A, Value_B
...

Both cases can be handled by melt() from the data.table package.

Case 1

library(data.table)
# read csv file
wide <- fread("wide.csv")
wide
#   Name_A Name_B Value_A Value_B Name_A Name_B Value_A Value_B
#1:     A1     B1     1.1     1.2     A2     B2     2.1     2.2
#2:     C1     D1     3.1     3.2     C2     D2     4.1     4.2

# reshape
long <- melt(wide, measure.vars = patterns("^Name_", "^Value_"), 
             value.name = c("Name", "Value"), variable.name = "col_group")
long
#   col_group Name Value
#1:         1   A1   1.1
#2:         1   C1   3.1
#3:         2   B1   1.2
#4:         2   D1   3.2
#5:         3   A2   2.1
#6:         3   C2   4.1
#7:         4   B2   2.2
#8:         4   D2   4.2

Note that the repetitions of the name-value-pairs in each row are counted in col_group.

Input data

Since the OP didn't provide a sample data file, a data file named "wide.csv" was created with the following contents:

Name_A, Name_B, Value_A, Value_B, Name_A, Name_B, Value_A, Value_B
A1, B1, 1.1, 1.2, A2, B2, 2.1, 2.2
C1, D1, 3.1, 3.2, C2, D2, 4.1, 4.2

Adding row numbers

Row numbers can be added in two different ways.

The first variant adds row numbers to wide before melting:

long <- melt(wide[, rn := seq_len(.N)], measure.vars = patterns("^Name_", "^Value_"), 
             value.name = c("Name", "Value"), variable.name = "col_group")
long[order(rn)]
#   rn col_group Name Value
#1:  1         1   A1   1.1
#2:  1         2   B1   1.2
#3:  1         3   A2   2.1
#4:  1         4   B2   2.2
#5:  2         1   C1   3.1
#6:  2         2   D1   3.2
#7:  2         3   C2   4.1
#8:  2         4   D2   4.2

The second variant adds row numbers to long after melting using the rowid() function:

long <- melt(wide, measure.vars = patterns("^Name_", "^Value_"), 
             value.name = c("Name", "Value"), variable.name = "col_group")
long[, rn := rowid(col_group)]
long[order(rn)]
#   col_group Name Value rn
#1:         1   A1   1.1  1
#2:         2   B1   1.2  1
#3:         3   A2   2.1  1
#4:         4   B2   2.2  1
#5:         1   C1   3.1  2
#6:         2   D1   3.2  2
#7:         3   C2   4.1  2
#8:         4   D2   4.2  2

Case 2

cols <- c("Name_A", "Name_B", "Value_A", "Value_B")
long <- melt(wide[, rn := seq_len(.N)], measure.vars = patterns(cols), 
             value.name = cols, variable.name = "col_group")
long[order(rn)]
#   rn col_group Name_A Name_B Value_A Value_B
#1:  1         1     A1     B1     1.1     1.2
#2:  1         2     A2     B2     2.1     2.2
#3:  2         1     C1     D1     3.1     3.2
#4:  2         2     C2     D2     4.1     4.2

Here, col_group counts the repetitions of quadruples in each row.

Uwe
  • 41,420
  • 11
  • 90
  • 134