0

Say we have a simple Excel file book.xlsx:

  |  A     |  B    |  C   |  D     |  E
1 | Name   | Words |      |        |
2 | Jon    | fish  | cat  | horse  | goose
3 | Alex   | dog   | bird | duck   | koala
4 | Bill   |  dog  | cat  | goose  | deer
5 | George |  cat  | fish | lizard | duck

Below is my code to convert this file in tsv format (adapted from this answer):

install.packages("rio")
install_formats()
library("rio")

xlsx <- "book.xlsx"
created <- mapply(convert, xlsx, gsub("xlsx", "tsv", xlsx))

This creates the following tsv file:

Names   Words   ..3 ..4 ..5
Jon     fish    cat     horse   goose
Alex    dog     bird    duck    koala
Bill    dog     cat     goose   deer
George  cat     fish    lizard  duck

But what I would like to understand how to do is further file manipulation, either as part of the conversion command or on the resulting tsv file: have the Words column values be separated by commas and have the ..3 ..4 ..5 removed like so:

Names   Words
Jon     fish, cat, horse, goose
Alex    dog, bird, duck, koala
Bill    dog, cat, goose, deer
George  cat, fish, lizard, duck

Any suggestion on packages/links or code examples (preferred) to illustrate how to perform such file manipulations as in the above example would be great.

Yannis
  • 1,682
  • 7
  • 27
  • 45
  • I think this question isn't actually about conversion as much as just about collapsing several columns of strings into one column of strings. `tidyr::unite` is one function that will do something like that – camille Feb 24 '19 at 17:58

1 Answers1

1

A couple of things are confusing in your question. It is tagged csv but you state that you want a tab separated file. Also, it wasn't clear that you wanted all the word columns combined into a single column. I have assumed this is what you wanted.

# read file in with readxl package
xlsx <- readxl::read_xlsx("book.xlsx")

# Combine all the word columns into a single column, separated by a comma and a space
xlsx$words2 <- paste(xlsx$Words, xlsx$X__1, xlsx$X__2, xlsx$X__3, sep = ", ")
# select just the name and words column
xlsx <- xlsx[, c("Name", "words2")]
# rename the new column as "Words"
names(xlsx)[2] <- "Words"
# write out, specifying the tab separator
write.table(x = xlsx, file = "out.tsv", row.names = FALSE, sep = "\t")
r.bot
  • 5,309
  • 1
  • 34
  • 45