Separating list in columns into rows

Question

I am trying to separate a data frame whose columns contain a list in each row. There are three columns: jobId (which is unique), skills, and skillTypeId.

I am hoping to split the vectors in "skills" and "skillTypeId" into rows while keeping them matched element by element. For example:

df <- structure(list(job.Id = "A", skill = list(c("microsoft excel", 
"product development")), skillTypeld = list(c(2, 2))), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -1L))

Currently, I separate them by creating one data frame for "skills" and another for "skillTypeId". The "skills" data frame contains just jobId and skills; the "skillTypeId" data frame contains just jobId and skillTypeId. I then use separate_rows on each, and finally use cbind to merge the two data frames together.

However, one problem arose: the two data frames ended up with different numbers of rows (they differ by 100+ rows out of the million rows), and I have too much data to troubleshoot which rows went wrong.

I understand that my approach is rather manual, so I am hoping for help making it less manual and, most importantly, making sure no rows go missing.

Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. — Sotos, Jul 22 '20 at 09:31
What do you mean by different number of entries ? Is it that the value of `skill` for a certain `jobID` is of varying length ? — Romain, Jul 22 '20 at 09:40
See https://stackoverflow.com/questions/15347282/split-delimited-strings-in-a-column-and-insert-as-new-rows and https://stackoverflow.com/questions/26194298/unlist-data-frame-column-preserving-information-from-other-column . — Ronak Shah, Jul 22 '20 at 09:55
@Romain yup! After splitting them into 2 data frames, they should still have the same number of entries after separate_rows(). First dataframe with jobId & skill, the second with jobId & skillTypeId. In my example above, "microsoft excel" is of skillTypeId 2. And "product development" is of skillTypeId 2 as well. Each skill belongs to a skillTypeId. In my data, after unlisting each row, length of skill should thus = length of skillTypeId. But somehow it wasn't, so I suspect separate_rows() removed some entries that I wanted to keep. — Koh, Jul 24 '20 at 01:44

score 0 · Answer 1 · answered Jul 22 '23 at 06:49

library(tidyr)
unnest_longer(df, c(skill, skillTypeld))

Read the documentation for more information on usage: https://tidyr.tidyverse.org/reference/unnest_longer.html
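As a minimal sketch of how this could be checked on the real data: the two-row `df` below is made up for illustration (row "B" is not from the question), the column names follow the question's `dput()` output, and `mismatched` and `long` are hypothetical names. It assumes a reasonably recent tidyr in which `unnest_longer()` accepts several columns at once. The idea is to compare `lengths()` of the two list-columns first, so any job whose skills and skill types do not line up can be inspected before unnesting.

```r
library(tidyr)

# Illustrative data only: row "A" mirrors the question, row "B" is made up
df <- tibble::tibble(
  job.Id      = c("A", "B"),
  skill       = list(c("microsoft excel", "product development"), "sql"),
  skillTypeld = list(c(2, 2), 1)
)

# Rows whose two list-columns have different lengths; on the real data,
# these are the jobs worth inspecting by hand
mismatched <- df[lengths(df$skill) != lengths(df$skillTypeld), ]

# Unnest both list-columns together so each skill stays paired with its type
long <- unnest_longer(df, c(skill, skillTypeld))

# Sanity check: one output row per skill entry, so nothing went missing
stopifnot(nrow(long) == sum(lengths(df$skill)))
```

Because both columns are unnested in a single call, there is no need to split the data into two data frames and recombine them with `cbind`.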

Mark
