Is it possible to convert lines from a text file into columns to get a dataframe?

Question

I have a text file containing the author name, book title, and country of birth, each on a separate line, as shown below:

Oscar Wilde
De Profundis    
Ireland 
Nathaniel Hawthorn  
Birthmark   
USA 
James Joyce
Ulysses
Ireland
Walt Whitman
Leaves of Grass 
USA

Is there any way to convert the text to a dataframe with these three items appearing as different columns:

ID  Author                Book               Country
1  "Oscar Wilde"          "De Profundis"     "Ireland"
2  "Nathaniel Hawthorn"   "Birthmark"        "USA"

Thank you for the suggestion. I tried pasting the text as column but it gets pasted as a single line: Oscar Wilde De Profundis Ireland Nathaniel Hawthorn Birthmark USA James Joyce Ulysses Ireland Walt Whitman Leaves of Grass USA — Ghose Bishwajit, Jul 14 '21 at 02:06
Please edit your question so the data can be formatted. Just replace the images. — MrFlick, Jul 14 '21 at 02:07
Thank you so much for editing the question. Could you please give me a hint on how to paste the text the way you did? — Ghose Bishwajit, Jul 14 '21 at 12:33

thelatemail · Answer 1 · 2021-07-15T21:02:01.707

3

There are built-in functions for dealing with this kind of data:

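# xx is the newline-separated test string defined in MrFlick's answer;
# to read straight from a file, pass file="test.txt" instead of text=xx
data.frame(scan(text=xx, multi.line=TRUE,
  what=list(Author="", Book="", Country=""), sep="\n"))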

#              Author            Book Country
#1        Oscar Wilde    De Profundis Ireland
#2 Nathaniel Hawthorn       Birthmark     USA
#3        James Joyce         Ulysses Ireland
#4       Walt Whitman Leaves of Grass     USA
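
If the data lives in a file rather than in a string, the same call can probably be pointed at it through the `file` argument. The following is only a sketch under a few assumptions that are not part of the answer above: the temporary file and the name `books` are made up for illustration, `strip.white = TRUE` is added to drop trailing spaces like those in the question's sample, and the ID column the question asked for is bound on at the end.

```r
# Hypothetical input file, written here so the sketch is self-contained.
# Some lines carry trailing spaces, as in the question's sample.
path <- tempfile(fileext = ".txt")
writeLines(c("Oscar Wilde", "De Profundis    ", "Ireland ",
             "Nathaniel Hawthorn  ", "Birthmark", "USA"), path)

# Each line is one field; every three fields form one record.
books <- data.frame(scan(file = path, multi.line = TRUE,
  what = list(Author = "", Book = "", Country = ""),
  sep = "\n", strip.white = TRUE, quiet = TRUE))

# Add the running ID column requested in the question.
books <- cbind(ID = seq_len(nrow(books)), books)
books
```

Replacing `path` with the real file name should be the only change needed for actual data.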

edited Jul 15 '21 at 21:02

answered Jul 14 '21 at 03:28

thelatemail


Marvelous! Thank you Latemail, the code is working perfectly. – Ghose Bishwajit Jul 14 '21 at 12:30

MrFlick · Answer 2 · 2021-07-14T16:03:53.347

There aren't any built-in functions that handle data like this, but you can reshape your data after importing.

#Test data
xx <- "Oscar Wilde
De Profundis
Ireland
Nathaniel Hawthorn
Birthmark
USA
James Joyce
Ulysses
Ireland
Walt Whitman
Leaves of Grass
USA"
writeLines(xx, "test.txt")

And then the code

library(dplyr)
library(tidyr)
lines <- read.csv("test.txt", header=FALSE)
lines %>% 
  mutate(
    rid = ((row_number()-1) %% 3)+1,
    pid = (row_number()-1) %/%3 + 1) %>% 
  mutate(col=case_when(rid==1~"Author",rid==2~"Book", rid==3~"Country")) %>% 
  select(-rid) %>% 
  pivot_wider(names_from=col, values_from=V1)

Which returns

# A tibble: 4 x 4
    pid Author             Book            Country
  <dbl> <chr>              <chr>           <chr>  
1     1 Oscar Wilde        De Profundis    Ireland
2     2 Nathaniel Hawthorn Birthmark       USA    
3     3 James Joyce        Ulysses         Ireland
4     4 Walt Whitman       Leaves of Grass USA
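
To see what the two helper columns in the `mutate()` call do, it may help to run the same arithmetic on plain row numbers. This is a small base R illustration, not part of the answer; the vector `i` stands in for `row_number()` and covers six rows (two records).

```r
i <- 1:6                      # stand-in for row_number()

rid <- ((i - 1) %% 3) + 1     # position within a record: 1 2 3 1 2 3
pid <- (i - 1) %/% 3 + 1      # record number:            1 1 1 2 2 2

# rid picks the column name, pid identifies the output row
data.frame(i, rid, pid, col = c("Author", "Book", "Country")[rid])
```

`pivot_wider()` then spreads the values into the columns named by `col`, one row per `pid`.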

Hi, thanks for the code! It shows "there is no package called 'mutate'", and also: Error in pivot_wider(., names_from = col, values_from = V1) : could not find function "pivot_wider" — Ghose Bishwajit, Jul 14 '21 at 12:36
Sorry. That was a mistake on my part. It should have been library(tidyr). I’ve updated the post. — MrFlick, Jul 14 '21 at 16:04

score 2 · Answer 3 · answered Jul 14 '21 at 02:52

2

You can create a 3-column matrix from one column of data.

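# The file has no commas, so each line is read as a single value in column V1
dat <- read.table('data.txt', sep = ',')

# Fill a 3-column matrix row by row, then name the columns
# (the native pipe |> requires R 4.1 or later)
result <- matrix(dat$V1, ncol = 3, byrow = TRUE) |>
  data.frame() |>
  setNames(c('Author', 'Book', 'Country'))

# Add a running ID column
result <- cbind(ID = seq_len(nrow(result)), result)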

result
#  ID             Author            Book Country
#1  1        Oscar Wilde    De Profundis Ireland
#2  2 Nathaniel Hawthorn       Birthmark     USA
#3  3        James Joyce         Ulysses Ireland
#4  4       Walt Whitman Leaves of Grass     USA
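
The matrix approach relies on every record occupying exactly three lines. A possible safeguard, sketched here as an assumption rather than as part of the answer, is to check the line count before reshaping. The function name `reshape_records` is hypothetical, and it takes a character vector such as the result of `readLines()` instead of the `read.table()` output used above.

```r
# Hypothetical helper: reshape a character vector of lines into a data frame,
# one row per group of length(fields) consecutive lines.
reshape_records <- function(x, fields = c("Author", "Book", "Country")) {
  x <- trimws(x)        # drop leading/trailing whitespace
  x <- x[nzchar(x)]     # drop empty lines
  if (length(x) %% length(fields) != 0) {
    stop("number of lines (", length(x), ") is not a multiple of ",
         length(fields))
  }
  out <- as.data.frame(matrix(x, ncol = length(fields), byrow = TRUE),
                       stringsAsFactors = FALSE)
  names(out) <- fields
  cbind(ID = seq_len(nrow(out)), out)
}

result <- reshape_records(c("Oscar Wilde", "De Profundis  ", "Ireland ",
                            "James Joyce", "Ulysses", "Ireland"))
result
```

With a real file, `reshape_records(readLines("data.txt"))` would be the equivalent call. A value that wraps onto a second line would trip the length check, or shift later records into the wrong columns if the total still happens to be a multiple of three, so such input needs cleaning first.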

answered Jul 14 '21 at 02:52

Ronak Shah


Thank you. But it is showing an error, can you please let me know how to solve this: Error: unexpected '>' in "result <- matrix(dat$V1, ncol = 3, byrow = TRUE) |>" – Ghose Bishwajit Jul 14 '21 at 12:31
`|>` is the native pipe operator, introduced in R 4.1. If you have an older version you can use `result <- setNames(data.frame(matrix(dat$V1, ncol = 3, byrow = TRUE)), c('Author', 'Book', 'Country'))` – Ronak Shah Jul 14 '21 at 12:34
It worked after updating. Thank you so much! I have an additional question. Will this code be good for sentences also, that run for >1 line? – Ghose Bishwajit Jul 14 '21 at 13:20
I am not sure if I understand. This code divides the data from 1 column into 3, 1st 3 values become 1st row, next 3 2nd and so on. – Ronak Shah Jul 14 '21 at 14:02
