
I have a data table like the one below. All columns are of type character.

Table:

V29  V30  V31  V32  V33  V34 V35 V36 V37 V38 .... V69
044  N    005  E    026  044 N   006 E   011 

I want to paste them together in groups of five columns, starting from V29, with the groups separated by a dash. For example, I want to obtain an Output column in the table as shown below.

Table:
V29  V30  V31  V32  V33  V34 V35 V36 V37 V38 .... V69   Output
044  N    005  E    026  044 N   006 E   011            044N005E026-044N006E011-

How can I achieve this in R? Any help is appreciated.

Thanks.

NUdu
  • Do you want your output to be a table or a character string, separating the groups of five with "-"? – OTStats Dec 07 '18 at 15:57
  • I want to add a column "Output" as a character string to the same "Table". I prefer a data.table method. – NUdu Dec 07 '18 at 16:03
  • It's a minor point, NUdu, but especially when you have non-standard data presentation (numbers here are really `character`), it really helps to have an unambiguous form of the input data. For this, it would suffice to do something like `dput(x[1:2, 1:10])` and paste the output into the question. That way, (1) we know exactly what everything is (even `factor`s, typically hidden as currently copied), and (2) we can use that *verbatim* with no risk of what happened in my first version (dropping leading zeroes). One of many refs for reproducibility: https://stackoverflow.com/questions/5963269 – r2evans Dec 07 '18 at 16:14

2 Answers


Expanding your data a little bit:

x <- read.table(stringsAsFactors=FALSE, header=TRUE, as.is=TRUE, colClasses="character", text="
V29  V30  V31  V32  V33  V34 V35 V36 V37 V38    V29a V30a V31a V32a V33a V34a V35a V36a V37a V38a
044  N    005  E    026  044 N   006 E   011    044  N    005  E    026  044  N    006  E    011 
044  N    005  E    026  044 N   006 E   011    044  N    005  E    026  044  N    006  E    011 ")

The answer:

sapply(split.default(x, (seq_len(ncol(x))-1) %/% 5),
       function(s) paste(apply(s, 1, paste0, collapse = ""), collapse = "-"))
#                         0                         1                         2 
# "044N005E026-044N005E026" "044N006E011-044N006E011" "044N005E026-044N005E026" 
#                         3 
# "044N006E011-044N006E011" 

This can easily be assigned to a column of the same frame.
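
If the goal is one string per row, as in the question's Output column, the five-column groups can instead be pasted within each row and then joined across groups. A minimal sketch, assuming a made-up two-row frame x2 (its second row borrows values from the other answer's DF2) and the hypothetical helper names grp and parts:

```r
# Made-up two-row example; all columns are read as character
x2 <- read.table(header = TRUE, colClasses = "character", text = "
V29  V30  V31  V32  V33  V34 V35 V36 V37 V38
044  N    005  E    026  044 N   006 E   011
045  S    006  F    027  045 S   007 F   012")

# Group index per column: 0 for the first five columns, 1 for the next five
grp <- (seq_len(ncol(x2)) - 1) %/% 5

# For each group, paste its five columns together element-wise (one string per row)
parts <- lapply(split.default(x2, grp),
                function(s) do.call(paste0, unname(as.list(s))))

# Join the per-group strings across groups with "-" and store them as a new column
x2$Output <- do.call(paste, c(unname(parts), sep = "-"))
x2$Output
```

Because each element of parts has one entry per row, the result has the same length as the frame and can be stored as a column directly.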

Explanation:

  • to break a frame up by 5 columns, split comes to mind, but the default use of split(...) will use split.data.frame which splits by row, not column, so we use split.default (which works by column). From there, you can see how we're grouping things:

    (seq_len(ncol(x))-1) %/% 5
    #  [1] 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
    
  • For each of these groups, we get a 5-column frame:

    split.default(x, (seq_len(ncol(x))-1) %/% 5)
    # $`0`
    #   V29 V30 V31 V32 V33
    # 1 044   N 005   E 026
    # 2 044   N 005   E 026
    # $`1`
    #   V34 V35 V36 V37 V38
    # 1 044   N 006   E 011
    # 2 044   N 006   E 011
    ### truncated for brevity
    

    So we use sapply to do something to each of these frames, returning it (in this case) simplified. (If we specify simplify=FALSE or if not all of them are the same length, then it will be returned unsimplified, as a list instead of a vector).

  • The function we apply to each frame is apply(., 1, paste0, collapse = "") which will return a vector of the 5-column pastes, one per row, something like:

    apply(s, 1, paste0, collapse = "")
    # [1] "044N005E026" "044N005E026"
    

    Because we want them combined, we surround it as paste(apply(...), collapse = "-").
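
The steps above can be checked on a small frame. A minimal sketch, assuming a hypothetical two-row, ten-column character frame d (not the original data) and the illustrative names grp, by_col, by_row, and join_group:

```r
# Hypothetical 2 x 10 character frame: columns V1..V10, filled column-wise
d <- as.data.frame(matrix(as.character(1:20), nrow = 2),
                   stringsAsFactors = FALSE)

# Integer division assigns each column to a group of five
grp <- (seq_len(ncol(d)) - 1) %/% 5   # 0 0 0 0 0 1 1 1 1 1

# split.default treats the frame as a list of columns and splits by column
by_col <- split.default(d, grp)
# split() on a data frame dispatches to split.data.frame and splits by row
by_row <- split(d, c("a", "b"))

vapply(by_col, ncol, integer(1))      # each piece has 5 columns
vapply(by_row, nrow, integer(1))      # each piece has 1 row

# Paste each row of a group, then join the row strings with "-"
join_group <- function(s) paste(apply(s, 1, paste0, collapse = ""), collapse = "-")

simplified   <- sapply(by_col, join_group)                    # named character vector
unsimplified <- sapply(by_col, join_group, simplify = FALSE)  # named list
```

Here simplified is a character vector with one element per column group, while unsimplified keeps the same values in a list.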

r2evans
  • However this method doesn't preserve the leading zeros. – OTStats Dec 07 '18 at 16:02
  • @OTStats that's all on the data import - `read.table` is dropping the 0s, not any of the processing. If OP already has data imported as strings with leading 0s, this method will work just fine. If OP shared that data in a friendlier format, we wouldn't see the issue at all. – Gregor Thomas Dec 07 '18 at 16:04
  • It will, that's an artifact of me having to make up my data. – r2evans Dec 07 '18 at 16:04
  • (Though if you add `colClasses = "character"`, I think it will solve it...) – Gregor Thomas Dec 07 '18 at 16:05
  • @r2evans: Thanks for your reply. But, I want to paste 5 columns as a group and separate by dash (-). Please see my Output column above. – NUdu Dec 07 '18 at 16:06
  • NUdu, see my edit (and the previous comments explaining the missing leading-zeroes as a moot point), the strings now match. – r2evans Dec 07 '18 at 16:08

Using DF defined in the Note at the end, create a sprintf format string fmt and then pass it, together with the columns of DF, to sprintf.

If there are NA's in DF then they will appear in the output as the string "NA". If you prefer to omit them completely then replace them with the empty string in DF before running the code below, i.e. run DF[is.na(DF)] <- "" first.

fmt <- paste(rep(strrep("%s", 5), ncol(DF)/5), collapse = "-") # %s%s%s%s%s-%s%s%s%s%s
Output <- do.call("sprintf", c(fmt, DF))
data.frame(DF, Output, stringsAsFactors = FALSE)

giving:

  V29 V30 V31 V32 V33 V34 V35 V36 V37 V38                  Output
1 044   N 005   E 026 044   N 006   E 011 044N005E026-044N006E011

or using DF2 from Note in place of DF we get:

  V29 V30 V31 V32 V33 V34 V35 V36 V37 V38                  Output
1 044   N 005   E 026 044   N 006   E 011 044N005E026-044N006E011
2 045   S 006   F 027 045   S 007   F 012 045S006F027-045S007F012
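
The NA handling described above can be sketched as follows, assuming a frame DF3 (a hypothetical name) built from the values in the follow-up comment, where the second row has NA in V34 through V38. The final step, which strips the trailing dash left by an empty group, is an optional addition and not part of the original answer:

```r
# Frame from the follow-up comment: the second row is NA in the last five columns
DF3 <- data.frame(V29 = c("044", "045"), V30 = c("N", "S"),
                  V31 = c("005", "006"), V32 = c("E", "F"),
                  V33 = c("026", "027"), V34 = c("044", NA),
                  V35 = c("N", NA),      V36 = c("006", NA),
                  V37 = c("E", NA),      V38 = c("011", NA),
                  stringsAsFactors = FALSE)

# Same format string as in the answer: %s%s%s%s%s-%s%s%s%s%s
fmt <- paste(rep(strrep("%s", 5), ncol(DF3) / 5), collapse = "-")

# With NAs left in place, sprintf writes each of them as the string "NA"
with_na <- do.call("sprintf", c(fmt, DF3))

# Replace NAs with empty strings first to omit them from the output
DF3[is.na(DF3)] <- ""
without_na <- do.call("sprintf", c(fmt, DF3))

# Optional (assumption): drop any trailing dashes left by empty groups
trimmed <- sub("-+$", "", without_na)
```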

data.table

If, as per comment, you want to use data.table then use this (with fmt from above):

library(data.table)

DT <- data.table(DF)
DT[, Output:=do.call("sprintf", c(fmt, .SD))]

Note

Lines <- "
  V29  V30  V31  V32  V33  V34 V35 V36 V37 V38 
  044  N    005  E    026  044 N   006 E   011 "
DF <- read.table(text = Lines, header = TRUE, colClasses = "character")

Lines2 <- "
  V29 V30 V31 V32 V33 V34 V35 V36 V37 V38
1 044   N 005   E 026 044   N 006   E 011
2 045   S 006   F 027 045   S 007   F 012"
DF2 <- read.table(text = Lines2, header = TRUE, colClasses = "character")
G. Grothendieck
  • You should probably include the definition of `Paste0` from the linked question for completeness. – r2evans Dec 07 '18 at 16:09
  • @Grothendieck: Thanks for your reply. I tried your method but I am getting the paste column as: 044-N-005-E-026-044-N-006-......etc. Which is not what I want. – NUdu Dec 07 '18 at 16:15
  • I added the Paste0 function and it worked. Thanks a bunch. – NUdu Dec 07 '18 at 16:19
  • Is there a way I can modify this if I have multiple rows similar to the above table? – NUdu Dec 07 '18 at 17:02
  • @G.Grothendieck: What if DF2 looks like the one below? Lines2 <- " V29 V30 V31 V32 V33 V34 V35 V36 V37 V38 1 044 N 005 E 026 044 N 006 E 011 2 045 S 006 F 027 NA NA NA NA NA" DF2 <- read.table(text = Lines2, header = TRUE, colClasses = "character") – NUdu Dec 07 '18 at 20:36
  • @G.Grothendieck: I don't like to see the "NA" in my DF2. dput(DF2) shows: structure(list(V29 = c("044", "045"), V30 = c("N", "S"), V31 = c("005", "006"), V32 = c("E", "F"), V33 = c("026", "027"), V34 = c("044", NA), V35 = c("N", NA), V36 = c("006", NA), V37 = c("E", NA), V38 = c("011", NA)), class = "data.frame", row.names = c("1", "2")) – NUdu Dec 07 '18 at 22:06
  • Have transferred all my comments to the answer. – G. Grothendieck Dec 07 '18 at 23:06