Normalizing(Reshaping) data frame based on split and columns

Question

I have a data frame containing

df 
Date        name             score
12/09/2012  Mahesh\nRahul    120
13/09/2012  abc\nxyz\nrep         110
...........................

i have tried this in order to get atomic

name1=str_split(df[,2],"\n")

but dont know how to associate again ,what is the best way to make data frame normalize so that i can get

 df 
Date        name     score
12/09/2012  Mahesh   120
12/09/2012  Rahul    120
13/09/2012  abc      110
13/09/2012  xyz      110
13/09/2012  rep      110
...........................

any help to make normalized a long data frame in R.

Edit

please note that it is just a reproducible example ,i have multiple names in my name column and number of names varies from one row to other row .thanks.

dput(df) structure(list(Date = structure(1:2, .Label = c("12/09/2012", "13/09/2012 "), class = "factor"), name = structure(c(2L, 1L), .Label = c("abc\nxyz", "Mahesh\nRahul"), class = "factor"), score = structure(c(2L, 1L), .Label = c("110", "120"), class = "factor")), .Names = c("Date", "name", "score"), row.names = c(NA, -2L), class = "data.frame")

Please check this [link](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). A good reproducible example will help others to tackle your question lot more easily. — CHP, Oct 29 '13 at 11:37
Please give output of `dput(df)` so that we can reproduce `df` on our side — CHP, Oct 29 '13 at 11:39
You are confusing the meaning of atomicity. To really get a `data.frame` to be an atomic vector just use `unlist( data.frame )`, but this will not get you the output you desire. This is a question about reshaping your data. — Simon O'Hanlon, Oct 29 '13 at 11:43
@geektrader dput(df) structure(list(Date = structure(1:2, .Label = c("12/09/2012", "13/09/2012 "), class = "factor"), name = structure(c(2L, 1L), .Label = c("abc\nxyz", "Mahesh\nRahul"), class = "factor"), score = structure(c(2L, 1L), .Label = c("110", "120"), class = "factor")), .Names = c("Date", "name", "score"), row.names = c(NA, -2L), class = "data.frame") — Aashu, Oct 29 '13 at 11:52
next time i will definalty keep in mind how to properly write a reproducible exam,thanks. — Aashu, Oct 29 '13 at 11:56

score 5 · Answer 1 · edited Jun 20 '20 at 09:12

5

Here's an R base solution

Update

> Names <- strsplit(df$name, "\n")
> n <- sapply(Names, length)
> data.frame(cbind(apply(df[,-2], 2, function(x) rep(x, n)), 
                   name=unlist(Names)), row.names = NULL)[,c(1,3,2)]
        Date   name score
1 12/09/2012 Mahesh   120
2 12/09/2012  Rahul   120
3 13/09/2012    abc   110
4 13/09/2012    xyz   110
5 13/09/2012    rep   110

where df is:

> dput(df)
structure(list(Date = c("12/09/2012", "13/09/2012"), name = c("Mahesh\nRahul", 
"abc\nxyz\nrep"), score = c(120, 110)), .Names = c("Date", "name", 
"score"), row.names = c(NA, -2L), class = "data.frame")

edited Jun 20 '20 at 09:12

Community

1
1

answered Oct 29 '13 at 11:49

Jilber Urbina

58,147
10
114
138

+1 I agree with Matthew. Nothing ugly about this. Quite a practical solution. I think it does assume that the splits for all names are equal though right? Or am I misinterpreting? – Simon O'Hanlon Oct 29 '13 at 11:56
thanks @MatthewPlourde and SimonO101, will delete the ugly adjective. – Jilber Urbina Oct 29 '13 at 11:57
note, both of these ways assume there are only two names per row. – Matthew Plourde Oct 29 '13 at 12:43
1

Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 1302, 47 after R base sol dat1 <- cbind(dat1[,-6], To=c(do.call(rbind, strsplit(dat1$To, "\n")))) – Aashu Oct 29 '13 at 13:08
@Jilber, you'll need to work some more on this to make it work with "unbalanced" data in the "name" column. – A5C1D2H2I1M1N2O1R2T1 Oct 29 '13 at 14:25
@Aashu, rather than just posting the error messages like you're doing, try to read them and understand what they are saying, then post a more meaningful comment. – A5C1D2H2I1M1N2O1R2T1 Oct 29 '13 at 14:26
@Aashu see my update, it works for any number of names now. I used your new data.frame ;). Ananda thanks for the comment – Jilber Urbina Oct 29 '13 at 14:47

score 3 · Accepted Answer · answered Oct 29 '13 at 11:55

3

This is relatively easy using data.table (and fast obviously).

require( data.table )
dt <- data.table( df )
dt[ , list( name = unlist( strsplit( name , "\n" ) ) ) , by = list( Date , score ) ]
#         Date score   name
#1: 12/09/2012   120 Mahesh
#2: 12/09/2012   120  Rahul
#3: 13/09/2012   110    abc
#4: 13/09/2012   110    xyz

As a note I took df to be the following data (note character classes over factor classes that appear in your actual data...

df <- read.delim( text = "Date    name    score
12/09/2012  'Mahesh\nRahul'   120
13/09/2012  'abc\nxyz'       110" ,
sep = "" , h = TRUE , quote = "\'" , stringsAsFactors = FALSE )

answered Oct 29 '13 at 11:55

Simon O'Hanlon

58,647
14
142
184

@Aashu did you read my note about how my data are different to yours? Specifically the fact you have `factor` in your data. Your comment is not very helpful without what you are getting. The code works on the input data as I show it. – Simon O'Hanlon Oct 29 '13 at 13:28
df=read.csv("abc.csv", header = TRUE,quote = "\'" , check.names = TRUE,stringsAsFactors = FALSE ) ,hope it satisfy your condition. – Aashu Oct 29 '13 at 13:30
@Aashu I don't think you need `quote` (I am not sure - I added it to make the data read correctly when I copy/pasted it in from your OP). I recommend you read your data in your normal way and *just add* the `stringsAsFactors = FALSE`. – Simon O'Hanlon Oct 29 '13 at 13:32

score 2 · Answer 3 · answered Oct 29 '13 at 13:20

To add to the alternatives, you can use scan to separate the string quite easily and rep and cbind to get the final data.frame

df
#         Date          name score
# 1 12/09/2012 Mahesh\nRahul   120
# 2 13/09/2012 abc\nxyz\nrep   110

scan(text=as.character(df$name), what = "")
# Read 5 items
# [1] "Mahesh" "Rahul"  "abc"    "xyz"    "rep"  

cbind(df[rep(rownames(df), 
             sapply(gregexpr("\n", df$name), length)+1), 
         c("Date", "score")], 
      name = scan(text=as.character(df$name), what = ""))
#           Date score   name
# 1   12/09/2012   120 Mahesh
# 1.1 12/09/2012   120  Rahul
# 2   13/09/2012   110    abc
# 2.1 13/09/2012   110    xyz
# 2.2 13/09/2012   110    rep

read.table also works to split the concatenated column:

read.table(text = as.character(df$name), sep = "\n", header = FALSE)
#       V1
# 1 Mahesh
# 2  Rahul
# 3    abc
# 4    xyz
# 5    rep

Matthew Plourde · Answer 4 · 2013-10-29T12:51:56.813

1

There are already good answers here, but this is another way with base R using the rle and inverse.rle functions. @Jilber's base R solution is more elegant, but this way will work if your strings have more than two names.

df <- read.table(text='Date        name             score
12/09/2012  "Mahesh\nRahul"    120
13/09/2012  "abc\nxyz\nrep"         110', header=TRUE, stringsAsFactors=FALSE)

ns <- strsplit(df$name, '\n')
result <- lapply(lapply(lapply(df, rle), `[[<-`, 'lengths', sapply(ns, length)), inverse.rle)
transform(data.frame(result), name=unlist(ns))
#         Date   name score
# 1 12/09/2012 Mahesh   120
# 2 12/09/2012  Rahul   120
# 3 13/09/2012    abc   110
# 4 13/09/2012    xyz   110
# 5 13/09/2012    rep   110

edited Oct 29 '13 at 12:51

answered Oct 29 '13 at 12:15

Matthew Plourde

43,932
7
96
113

+1 just for `readClipboard()` alone! What a handy function I never knew about. – Simon O'Hanlon Oct 29 '13 at 12:18
@SimonO101 thanks, I removed it though. That way fails to read the newlines correctly. – Matthew Plourde Oct 29 '13 at 12:38
Can you post the output of `str(df)` in your posting, or better yet, the ouput of `dput(df[1:3, ])`? – Matthew Plourde Oct 29 '13 at 13:35

Normalizing(Reshaping) data frame based on split and columns

4 Answers4

Update