Say we have a simple Excel file book.xlsx
:
| A | B | C | D | E
1 | Name | Words | | |
2 | Jon | fish | cat | horse | goose
3 | Alex | dog | bird | duck | koala
4 | Bill | dog | cat | goose | deer
5 | George | cat | fish | lizard | duck
Below is my code to convert this file in tsv format (adapted from this answer):
install.packages("rio")
install_formats()
library("rio")
xlsx <- "book.xlsx"
created <- mapply(convert, xlsx, gsub("xlsx", "tsv", xlsx))
This creates the following tsv file:
Names Words ..3 ..4 ..5
Jon fish cat horse goose
Alex dog bird duck koala
Bill dog cat goose deer
George cat fish lizard duck
But what I would like to understand how to do is further file manipulation, either as part of the conversion command or on the resulting tsv file: have the Words
column values be separated by commas and have the ..3 ..4 ..5
removed like so:
Names Words
Jon fish, cat, horse, goose
Alex dog, bird, duck, koala
Bill dog, cat, goose, deer
George cat, fish, lizard, duck
Any suggestion on packages/links or code examples (preferred) to illustrate how to perform such file manipulations as in the above example would be great.