0

I have a tibble in R with only one column and ~2Million rows. Each row contains multiple values separated by semicolon. The first row contains header values. I need help to split this tibble into multiple columns.

Below is the present format:

Value
Group Name;Group Level;Member;Member_Type
A;0;XYZ;User
A;0;XX;User
B;1;XXY;Group
B;0;XY;User

Need to convert it into:

Group Name Group Level Member Member_Type
A 0 XYZ User
A 0 XX User
B 1 XXY Group
B 0 XY Group
r2evans
  • 141,215
  • 6
  • 77
  • 149
CT_369
  • 69
  • 6

1 Answers1

1

If you already have it read into R, then you can do

read.table(text = paste(dat$Value, collapse = "\n"), sep = ";", header = TRUE)
#   Group.Name Group.Level Member Member_Type
# 1          A           0    XYZ        User
# 2          A           0     XX        User
# 3          B           1    XXY       Group
# 4          B           0     XY        User

Data:

dat <- structure(list(Value = c("Group Name;Group Level;Member;Member_Type", "A;0;XYZ;User", "A;0;XX;User", "B;1;XXY;Group", "B;0;XY;User")), class = "data.frame", row.names = c(NA, -5L))
r2evans
  • 141,215
  • 6
  • 77
  • 149
  • In the actual data I have 12 headers and when I executed the above code, I got the following error: Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 124 did not have 12 elements – CT_369 Dec 01 '21 at 02:23
  • I think this is an issue of under-representative sample data. Perhaps look at rows 123-135 and see what is different about 124 to trigger that error; it sounds like either (a) it has embedded semicolons; (b) it is a malformed row that truly is broken; or (c) your file has multiple tables, and row 124 is the start of a new table with new dimensions (i.e., a multi-table file). Or (d) something else, I don't know. – r2evans Dec 01 '21 at 02:30
  • Thanks, I checked and line 124 seems perfectly normal with 12 values separated by semicolon. Is there a way to check broken rows? – CT_369 Dec 01 '21 at 03:16
  • Did you check line 125 as well? I think it reports 125 as "row 124" when `header=TRUE` (since the 125th line is row 124 after the header line is considered). Otherwise, I don't know for certain. – r2evans Dec 01 '21 at 03:28
  • yeah checked ~10 line above and below line 124 but nothing seems abnormal :( – CT_369 Dec 01 '21 at 03:35