I have a dataframe in R that I pulled from a database of bacterial growth conditions. The dataframe is quite large (~90k rows), and each row corresponds to a bacterial species from said database.

The issue here is that for each row I have a nested list of items. For example:

```
test_data[[1]][["Safety information"]]
  .. .. ..$ DSM-Number    : int 4491
  .. .. ..$ keywords      :List of 4
  .. .. .. ..$ : chr "Bacteria"
  .. .. .. ..$ : chr "16S sequence"
  .. .. .. ..$ : chr "genome sequence"
  .. .. .. ..$ : chr "mesophilic"
  .. .. ..$ description   : chr "Acetobacter lovaniensis DSM 4491 is a mesophilic bacterium that was isolated from soil."
  .. .. ..$ NCBI tax id   :List of 2
  .. .. .. ..$ NCBI tax id   : int 104100
  .. .. .. ..$ Matching level: chr "species"
  .. .. ..$ strain history:List of 2
  .. .. .. ..$ : chr "<- NCIMB <- W. Verhoeven <- J. Frateur"
  .. .. .. ..$ : chr "DSM 4491 <-- NCIMB 8620 <-- W. Verhoeven L 1024 <-- J. Frateur."
  .. .. ..$ doi           : chr "10.13145/bacdive9.20220920.7"
  .. ..$ Name and taxonomic classification                :List of 11
  .. .. ..$ LPSN                :List of 12
  .. .. .. ..$ @ref                : int 20215
  .. .. .. ..$ description         : chr "domain/bacteria"
  .. .. .. ..$ keyword             : chr "phylum/pseudomonadota"
  .. .. .. ..$ domain              : chr "Bacteria"
  .. .. .. ..$ phylum              : chr "Pseudomonadota"
  .. .. .. ..$ class               : chr "Alphaproteobacteria"
  .. .. .. ..$ order               : chr "Rhodospirillales"
  .. .. .. ..$ family              : chr "Acetobacteraceae"
  .. .. .. ..$ genus               : chr "Acetobacter"
  .. .. .. ..$ species             : chr "Acetobacter lovaniensis"
  .. .. .. ..$ full scientific name: chr "<I>Acetobacter</I> <I>lovaniensis</I> (Frateur 1950) Lisdiyanti et al. 2001"
  .. .. .. ..$ synonyms            :List of 2
  .. .. .. .. ..$ :List of 2
  .. .. .. .. .. ..$ @ref   : int 20215
  .. .. .. .. .. ..$ synonym: chr "Acetobacter pasteurianus subsp. lovaniensis"
  .. .. .. .. ..$ :List of 2
  .. .. .. .. .. ..$ @ref   : int 20215
  .. .. .. .. .. ..$ synonym: chr "Acetobacter lovaniense"
  .. .. ..$ @ref                : int 1703
  .. .. ..$ domain              : chr "Bacteria"
  .. .. ..$ phylum              : chr "Proteobacteria"
  .. .. ..$ class               : chr "Alphaproteobacteria"
  .. .. ..$ order               : chr "Rhizobiales"
  .. .. ..$ family              : chr "Acetobacteraceae"
  .. .. ..$ genus               : chr "Acetobacter"
  .. .. ..$ species             : chr "Acetobacter lovaniensis"
  .. .. ..$ full scientific name: chr "Acetobacter lovaniensis (Frateur 1950) Lisdiyanti et al. 2001"
  .. .. ..$ type strain         : chr "yes"
  .. ..$ Morphology                                       : Named list()
  .. ..$ Culture and growth conditions                    :List of 2
  .. .. ..$ culture medium:List of 2
  .. .. .. ..$ :List of 5
  .. .. .. .. ..$ @ref       : int 1703
  .. .. .. .. ..$ name       : chr "YPM MEDIUM (DSMZ Medium 360)"
  .. .. .. .. ..$ growth     : chr "yes"
  .. .. .. .. ..$ link       : chr "https://bacmedia.dsmz.de/medium/360"
  .. .. .. .. ..$ composition: chr "Name: YPM MEDIUM (DSMZ Medium 360)\nComposition:\nMannitol 25.0 g/l\nAgar 12.0 g/l\nYeast extract 5.0 g/l\nPept"| __truncated__
  .. .. .. ..$ :List of 5
  .. .. .. .. ..$ @ref       : int 1703
  .. .. .. .. ..$ name       : chr "GLUCONOBACTER OXYDANS MEDIUM (DSMZ Medium 105)"
  .. .. .. .. ..$ growth     : chr "yes"
  .. .. .. .. ..$ link       : chr "https://bacmedia.dsmz.de/medium/105"
  .. .. .. .. ..$ composition: chr "Name: GLUCONOBACTER OXYDANS MEDIUM (DSMZ Medium 105)\nComposition:\nGlucose 100.0 g/l\nCaCO3 20.0 g/l\nAgar 15."| __truncated__
  .. .. ..$ culture temp  :List of 2
  .. .. .. ..$ :List of 5
  .. .. .. .. ..$ @ref       : int 1703
  .. .. .. .. ..$ growth     : chr "positive"
  .. .. .. .. ..$ type       : chr "growth"
  .. .. .. .. ..$ temperature: chr "28"
  .. .. .. .. ..$ range      : chr "mesophilic"
  .. .. .. ..$ :List of 5
  .. .. .. .. ..$ @ref       : int 67770
  .. .. .. .. ..$ growth     : chr "positive"
  .. .. .. .. ..$ type       : chr "growth"
  .. .. .. .. ..$ temperature: chr "28"
  .. .. .. .. ..$ range      : chr "mesophilic"
```

I would like to essentially 'inflate' the lists into columns, but I'm unsure how to do this when the lists are themselves nested inside other lists.

I am unsure where to go from here. I've tried functions from the tidyverse, but they don't seem to work. Here is a test sample of the data:

https://github.com/pattyjk/pullIng_bacdive_data/blob/main/test_data.rds

  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. With nested data it's not clear how you want to translate them to columns. – MrFlick Dec 13 '22 at 16:22
  • Here's sample data: https://github.com/pattyjk/pullIng_bacdive_data/blob/main/test_data.rds – Patrick Kearns Dec 13 '22 at 17:02
  • That may be the input, but what exactly do you expect the output to look like? Again, with nested data like this there is normally no obvious way to translate it to rows/columns. You need to tell us what your desired output is. Also, avoid linking to data on external sites. Questions work best if they are self-contained, as described in the "reproducible example" link I provided. – MrFlick Dec 13 '22 at 18:03
  • I tried uploading the data directly to the post, but I kept getting an error. I am looking for each '$' in the list to be its own column. For example, in the provided peek I'd have a column for each of: `$ DSM-Number : int 4491`, `$ keywords :List of 4`, `$ : chr "Bacteria"`, `$ : chr "16S sequence"`, `$ : chr "genome sequence"`, `$ : chr "mesophilic"` – Patrick Kearns Dec 13 '22 at 18:51

1 Answer

One approach:

```r
library(tidyr)

## load your example unless already present in workspace:
the_list <- readRDS('path/to/test_data.rds')

## convert the nested list to a dataframe of list columns (i.e. columns
## that themselves contain lists instead of single values):
df <- as.data.frame(do.call(rbind, the_list))

## further spread list columns as desired. Here, we spread cols 1:3:
df |> unnest_wider(1:3, names_repair = 'universal')
```
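If the goal is instead one column per leaf (each `$` in the `str()` output), a base-R alternative to the above may be worth trying. The following is only a minimal sketch on made-up toy records, not the real BacDive data: `records`, `flatten_record`, and the field values are hypothetical, and it assumes every record contains at least one leaf value.

```r
## Toy records imitating the nested structure (hypothetical values).
records <- list(
  list(General  = list(`DSM-Number` = 4491L,
                       keywords     = list("Bacteria", "mesophilic")),
       Taxonomy = list(genus = "Acetobacter")),
  list(General  = list(`DSM-Number` = 1L,
                       keywords     = list("Bacteria", "thermophilic")),
       Taxonomy = list(genus = "Escherichia", species = "Escherichia coli"))
)

## Flatten one record into a named character vector. unlist() joins the
## nested names with "." and numbers repeated unnamed elements
## (keywords1, keywords2); make.unique() guards against duplicate names.
flatten_record <- function(rec) {
  leaves <- unlist(rec)
  names(leaves) <- make.unique(names(leaves))
  leaves
}

leaves   <- lapply(records, flatten_record)
all_cols <- unique(unlist(lapply(leaves, names)))

## Align every record to the full set of columns; leaves that a record
## lacks become NA.
wide <- as.data.frame(
  do.call(rbind, lapply(leaves, function(v) unname(v[all_cols]))),
  stringsAsFactors = FALSE
)
names(wide) <- all_cols

## unlist() coerced everything to character; guess column types back.
wide <- type.convert(wide, as.is = TRUE)
```

Because the column names come from `unlist()`, records whose unnamed sub-lists differ in length (for example four keywords versus two) contribute different numbers of columns, so on the full ~90k rows the result could become very wide and sparse. Keeping list columns and spreading only the ones of interest, as in the answer above, may be the more manageable route.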

I_O