
I have a text file containing the author name, book title, and country of birth for several books, each on a separate line, as shown below:

Oscar Wilde
De Profundis    
Ireland 
Nathaniel Hawthorn  
Birthmark   
USA 
James Joyce
Ulysses
Ireland
Walt Whitman
Leaves of Grass 
USA

Is there any way to convert the text to a data frame with these three items as separate columns, like this:

ID  Author                Book               Country
1  "Oscar Wilde"          "De Profundis"     "Ireland"
2  "Nathaniel Hawthorn"   "Birthmark"        "USA" 
MrFlick
Ghose Bishwajit
  • Thank you for the suggestion. I tried pasting the text as a column, but it gets pasted as a single line: Oscar Wilde De Profundis Ireland Nathaniel Hawthorn Birthmark USA James Joyce Ulysses Ireland Walt Whitman Leaves of Grass USA – Ghose Bishwajit Jul 14 '21 at 02:06
  • Please edit your question so the data can be formatted. Just replace the images. – MrFlick Jul 14 '21 at 02:07
  • Thank you so much for editing the question. Could you please give me a hint on how to paste the text the way you did? – Ghose Bishwajit Jul 14 '21 at 12:33

3 Answers


There are built-in functions for dealing with this kind of data:

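# xx holds the raw text, e.g. xx <- paste(readLines("test.txt"), collapse="\n");
# alternatively pass file="test.txt" instead of text=xx
data.frame(scan(text=xx, multi.line=TRUE, strip.white=TRUE,
  what=list(Author="", Book="", Country=""), sep="\n"))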

#              Author            Book Country
#1        Oscar Wilde    De Profundis Ireland
#2 Nathaniel Hawthorn       Birthmark     USA
#3        James Joyce         Ulysses Ireland
#4       Walt Whitman Leaves of Grass     USA
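A minimal sketch of the same `scan()` call reading directly from a file and adding the `ID` column from the question. The temporary file and the object names `path` and `books` are illustrative assumptions. `strip.white = TRUE` removes trailing spaces such as those after "De Profundis" in the question's data, and `quiet = TRUE` suppresses the "Read n records" message.

```r
# Write sample data (with trailing spaces, as in the question) to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c("Oscar Wilde", "De Profundis    ", "Ireland ",
             "Nathaniel Hawthorn  ", "Birthmark   ", "USA "), path)

# Each record is three consecutive lines: Author, Book, Country
books <- data.frame(scan(file = path, multi.line = TRUE, sep = "\n",
                         what = list(Author = "", Book = "", Country = ""),
                         strip.white = TRUE, quiet = TRUE))

# Add a running ID column in front, matching the requested layout
books <- cbind(ID = seq_len(nrow(books)), books)
books
```

The names in the `what` list become the column names, so no renaming step is needed afterwards.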
thelatemail

Rather than reading the three columns directly, you can import the lines as a single column and reshape your data afterwards.

#Test data
xx <- "Oscar Wilde
De Profundis
Ireland
Nathaniel Hawthorn
Birthmark
USA
James Joyce
Ulysses
Ireland
Walt Whitman
Leaves of Grass
USA"
writeLines(xx, "test.txt")

And then reshape it with

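library(dplyr)
library(tidyr)
# One field per line; strip.white drops trailing spaces
lines <- read.csv("test.txt", header=FALSE, strip.white=TRUE)
lines %>% 
  mutate(
    rid = ((row_number()-1) %% 3)+1,      # field position within a record (1-3)
    pid = (row_number()-1) %/%3 + 1) %>%  # record number
  mutate(col=case_when(rid==1~"Author",rid==2~"Book", rid==3~"Country")) %>% 
  select(-rid) %>% 
  # one row per pid, one column per field name
  pivot_wider(names_from=col, values_from=V1)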

Which returns

# A tibble: 4 x 4
    pid Author             Book            Country
  <dbl> <chr>              <chr>           <chr>  
1     1 Oscar Wilde        De Profundis    Ireland
2     2 Nathaniel Hawthorn Birthmark       USA    
3     3 James Joyce        Ulysses         Ireland
4     4 Walt Whitman       Leaves of Grass USA 
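The `rid`/`pid` arithmetic above does the real work: for line i (counting from 1), `(i - 1) %% 3 + 1` is the field position within a record and `(i - 1) %/% 3 + 1` is the record number. Below is a base-R sketch of the same idea without dplyr/tidyr, written as a hypothetical helper `records_to_df()` (not part of any package) that accepts any number of field names and stops if the line count is not a multiple of that number.

```r
# Hypothetical helper: turn a vector of lines, k lines per record, into a data frame
records_to_df <- function(lines, fields) {
  k <- length(fields)
  if (length(lines) %% k != 0) {
    stop("number of lines (", length(lines), ") is not a multiple of ", k)
  }
  i <- seq_along(lines)
  rid <- (i - 1) %% k + 1    # position of the line within its record (1..k)
  pid <- (i - 1) %/% k + 1   # record number
  # split() groups the lines by field position; each group becomes one column
  cols <- split(trimws(lines), rid)
  names(cols) <- fields
  data.frame(pid = unique(pid), cols, stringsAsFactors = FALSE)
}

xx <- c("Oscar Wilde", "De Profundis", "Ireland",
        "Nathaniel Hawthorn", "Birthmark", "USA")
records_to_df(xx, c("Author", "Book", "Country"))
```

In practice `lines` would come from `readLines("test.txt")`.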
MrFlick
  • Hi, thanks for the code! It shows "there is no package called 'mutate'" and also: Error in pivot_wider(., names_from = col, values_from = V1) : could not find function "pivot_wider" – Ghose Bishwajit Jul 14 '21 at 12:36
  • Sorry. That was a mistake on my part. It should have been library(tidyr). I’ve updated the post. – MrFlick Jul 14 '21 at 16:04

You can read the file as a single column and fill a 3-column matrix from it, row by row.

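# Read one value per line; quote = '' keeps apostrophes in titles intact
# and strip.white = TRUE drops trailing spaces
dat <- read.table('data.txt', sep = ',', quote = '', strip.white = TRUE)

# Every 3 consecutive values form one record, so fill the matrix row by row
result <- matrix(dat$V1, ncol = 3, byrow = TRUE) |>
  data.frame() |>
  setNames(c('Author', 'Book', 'Country'))

# Add a running ID column in front
result <- cbind(ID = seq_len(nrow(result)), result)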

result
#  ID             Author            Book Country
#1  1        Oscar Wilde    De Profundis Ireland
#2  2 Nathaniel Hawthorn       Birthmark     USA
#3  3        James Joyce         Ulysses Ireland
#4  4       Walt Whitman Leaves of Grass     USA
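`byrow = TRUE` is essential here: `matrix()` fills column by column by default, which would scatter each record's fields across different rows. A small demonstration with six placeholder values (the vector `v` is purely illustrative):

```r
v <- c("A1", "B1", "C1", "A2", "B2", "C2")  # two records of three fields each

# Default fills column-wise: the first column gets A1, B1 (wrong grouping)
matrix(v, ncol = 3)

# byrow = TRUE fills row-wise: each row is one complete record
matrix(v, ncol = 3, byrow = TRUE)
```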
Ronak Shah
  • Thank you, but it shows an error. Can you please let me know how to solve this? Error: unexpected '>' in "result <- matrix(dat$V1, ncol = 3, byrow = TRUE) |>" – Ghose Bishwajit Jul 14 '21 at 12:31
  • `|>` is the native pipe operator introduced in R 4.1.0. If you have an older version, you can use `result <- setNames(data.frame(matrix(dat$V1, ncol = 3, byrow = TRUE)), c('Author', 'Book', 'Country'))` – Ronak Shah Jul 14 '21 at 12:34
  • It worked after updating. Thank you so much! I have an additional question: will this code also work for entries, such as sentences, that span more than one line? – Ghose Bishwajit Jul 14 '21 at 13:20
  • I am not sure I understand. This code splits the data from one column into three: the first 3 values become the first row, the next 3 the second row, and so on. – Ronak Shah Jul 14 '21 at 14:02