Convert input data set

Question

I am working on a requirement where the input data is in the format below.

Name XYZ AGE 30 Country India Mobile 1234567890
Name ABC AGE 35 Country Russia Mobile 2345678901

I want to import this data into R and reshape it, i.e. "Name", "AGE", "Country" and "Mobile" should be the column headers.

How is the data stored? Is it in a text file? How are the fields delimited? — Robin Gertenbach, May 06 '16 at 08:19
Welcome to Stack Overflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. — zx8754, May 06 '16 at 08:35

Sotos · Answer 1 · 2016-05-06T08:57:27.630

How about creating a data frame from the values first and then adding the names, as follows:

x <- c('Name XYZ AGE 30 Country India Mobile 1234567890',
           'Name ABC AGE 35 Country Russia Mobile 2345678901')

df <- as.data.frame(do.call(rbind, lapply(strsplit(x, ' '), function(i) i[c(FALSE, TRUE)])))
names(df) <- unlist(strsplit(x[1], ' '))[c(TRUE, FALSE)]
df
#  Name AGE Country     Mobile
#1  XYZ  30   India 1234567890
#2  ABC  35  Russia 2345678901
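
The `c(TRUE, FALSE)` and `c(FALSE, TRUE)` indices above work because R recycles a short logical index along the whole vector. A minimal sketch of that step on a single line of the input (the names `parts`, `keys` and `values` are only illustrative and do not appear in the answer):

```r
# Split one input line into its space-separated tokens.
parts <- strsplit("Name XYZ AGE 30 Country India Mobile 1234567890", " ")[[1]]

# A length-2 logical index is recycled over all 8 tokens.
keys   <- parts[c(TRUE, FALSE)]  # keeps positions 1, 3, 5, 7 (the labels)
values <- parts[c(FALSE, TRUE)]  # keeps positions 2, 4, 6, 8 (the values)

keys
# [1] "Name"    "AGE"     "Country" "Mobile"
values
# [1] "XYZ"        "30"         "India"      "1234567890"
```

This only holds if every line strictly alternates label and value and no value contains a space.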

RHertel · Answer 2 · 2016-05-06T08:58:03.160

Assuming that the data is stored in a data.frame df1:

df1 <- read.table(text="Name XYZ AGE 30 Country India Mobile 1234567890
                        Name ABC AGE 35 Country Russia Mobile 2345678901")

You could create a new data.frame df2 by selecting every second (even-numbered) column:

df2 <- df1[c(FALSE,TRUE)]

and assign the column names by using every second (odd-numbered) entry in the first row of df1:

colnames(df2) <- unlist(df1[1, c(TRUE, FALSE)])

The data.frame df1 can then be deleted with rm(df1). This is the result for df2:

#> df2
#  Name AGE Country     Mobile
#1  XYZ  30   India 1234567890
#2  ABC  35  Russia 2345678901

The same procedure could be written as a one-liner. Arguably less clear, but certainly more compact:

df1 <- `colnames<-`(df1[c(FALSE,TRUE)], unlist(df1[1,c(TRUE,FALSE)]))

In that case the second data.frame df2 is not needed.
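
If the data sits in a space-delimited text file rather than in a string (the question does not say), the same steps might look like the sketch below. The temporary file stands in for the real input, and the names `path`, `raw` and `out` are hypothetical.

```r
# Assumption: the input is a plain text file with space-separated fields.
# A temporary file is written here only to make the example self-contained.
path <- tempfile(fileext = ".txt")
writeLines(c("Name XYZ AGE 30 Country India Mobile 1234567890",
             "Name ABC AGE 35 Country Russia Mobile 2345678901"), path)

# read.table splits on whitespace by default and names the columns V1..V8.
raw <- read.table(path, stringsAsFactors = FALSE)

# Keep the even-numbered (value) columns ...
out <- raw[c(FALSE, TRUE)]

# ... and label them with the odd-numbered entries of the first row.
colnames(out) <- unlist(raw[1, c(TRUE, FALSE)], use.names = FALSE)

out
```

Because `read.table` guesses column types, `AGE` and `Mobile` come back as numbers here, whereas approaches based on `strsplit` leave every column as text.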

Worked for me, thanks. I am trying the other suggestions as well and will update here with the results. — puneet, May 06 '16 at 09:37

score 0 · Answer 3 · answered May 06 '16 at 08:25

A combination of matrix and unlist should do the trick, like:

tidyData <- data.frame(matrix(unlist(dataByLine), nrow=length(fileByLines), byrow=TRUE), stringsAsFactors=FALSE)

If you had a minimal reproducible example, this would be easier to answer.
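
The answer does not define `fileByLines` or `dataByLine`. One possible reading, offered only as an assumption, is that `fileByLines` holds the raw lines (e.g. from `readLines`) and `dataByLine` holds the values taken from each split line. Under that assumption a runnable sketch could be:

```r
# Assumption: fileByLines holds the raw input lines; in practice it might come
# from readLines() on the real file.
fileByLines <- c("Name XYZ AGE 30 Country India Mobile 1234567890",
                 "Name ABC AGE 35 Country Russia Mobile 2345678901")

# Assumption: dataByLine holds only the values (every second token) per line.
dataByLine <- lapply(strsplit(fileByLines, " ", fixed = TRUE),
                     function(p) p[c(FALSE, TRUE)])

# Flatten the list and refill it row by row, one row per input line.
tidyData <- data.frame(matrix(unlist(dataByLine),
                              nrow = length(fileByLines), byrow = TRUE),
                       stringsAsFactors = FALSE)

# Take the column names from the labels on the first line.
names(tidyData) <- strsplit(fileByLines[1], " ", fixed = TRUE)[[1]][c(TRUE, FALSE)]

# Optional: turn numeric-looking text columns into numbers.
tidyData[] <- lapply(tidyData, type.convert, as.is = TRUE)

tidyData
```

The final `type.convert` step is an addition for convenience and not part of the answer; without it every column stays character.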

mondano
