How to normalize this data in R?

Question

I have a text file that contains multiple iterations (blocks) of data, as shown below. I need to extract selected fields from each block and put them into a tibble/data frame. I think I'll have to write some sort of function that scans the data and places each value into the relevant column. I have no idea how to do this, and I have never dealt with this kind of data format. How can I do it?

Source file format
==============
Col1: X
objectClass: top
objectClass: Role
Col2: X1
Col2: X2
User: UserX
User: userY
User: UserZ
cn: P1
description: Permissions
Host: HRef456

Col1: Y
objectClass: top
objectClass: Role
Col2: Y3
Col2: Y4
Col2: Y5
User: U1
User: U2
cn: P2
description: Permissions
Host: HRef123

What I need:
===========

I *think* `read.dcf` should do it - try `read.dcf("filename.txt", all=TRUE)` — thelatemail, Aug 11 '20 at 01:50
Thanks, but I think I would need some sort of a function for this. — CT_369, Aug 11 '20 at 02:17
`read.dcf` is a function. It imports the data into a data.frame. Your data is also uneven, which is going to cause issues here I think. E.g. the first block has 2 `Col2` entries and 3 `User` entries while the reverse is true of the second block. — thelatemail, Aug 11 '20 at 02:17
Yeah, wondering how to handle this dynamically. I'm not an expert with R functions. :( — CT_369, Aug 11 '20 at 02:24
It's not really possible to represent uneven data in a rectangular dataset. I.e. - if there are 3 `Col2`'s in a single block and 2 `User`s, which one goes with which? The `read.dcf` function I mentioned will import the data into a dataset, but you'll have embedded lists, and not a clean format like your requested output. — thelatemail, Aug 11 '20 at 03:20
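One way to sidestep the "which one goes with which?" problem is to keep the data in a long format, with one row per record, field, and value, instead of forcing it into a wide rectangle. The following is only a minimal base R sketch under stated assumptions: the sample lines are copied from the question, the names `record`, `field`, `value`, and `long` are made up for illustration, and records are assumed to be separated by exactly one blank line.

```r
# Sample data copied from the question; in practice this would come from
# readLines() on the real file.
lines <- c(
  "Col1: X", "objectClass: top", "objectClass: Role",
  "Col2: X1", "Col2: X2",
  "User: UserX", "User: userY", "User: UserZ",
  "cn: P1", "description: Permissions", "Host: HRef456",
  "",
  "Col1: Y", "objectClass: top", "objectClass: Role",
  "Col2: Y3", "Col2: Y4", "Col2: Y5",
  "User: U1", "User: U2",
  "cn: P2", "description: Permissions", "Host: HRef123"
)

# A blank line starts a new record (assumption: exactly one blank line
# between records; extra blank lines would leave gaps in the record ids).
blank  <- !nzchar(trimws(lines))
record <- cumsum(blank) + 1L
keep   <- !blank

# Split each "field: value" line at the first colon.
long <- data.frame(
  record = record[keep],
  field  = sub(":.*$", "", lines[keep]),
  value  = trimws(sub("^[^:]*:", "", lines[keep])),
  stringsAsFactors = FALSE
)

# Count how many values each field has in each record.
with(long, table(record, field))
```

In this shape the uneven counts are not a problem, because no `Col2` value has to be paired with any particular `User` value.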
a combination of read.dcf and [this](https://stackoverflow.com/questions/13773770/split-comma-separated-strings-in-a-column-into-separate-rows) — rawr, Aug 11 '20 at 03:34
CT_369, I think you need to update your expected output to account for the uneven data that @thelatemail mentioned. For example, `as.data.frame(read.dcf("63350291.dcf", all=TRUE))` "works", but it produces `structure(list(Col1="X",objectClass=list(c("top","Role")),Col2=list(c("X1","X2")),User=list(c("UserX","userY","UserZ")),cn="P1",description="Permissions",Host="HRef456"),row.names=1L,class="data.frame")` (1 row), which cannot "recycle" cleanly. — r2evans, Aug 11 '20 at 03:36
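The `read.dcf` suggestion can be tried without touching the real file. This is a hedged sketch: the sample lines are copied from the question and written to a temporary file, which only stands in for the actual input, and `dat` is an arbitrary name.

```r
# Sample data copied from the question, written to a temporary file
# (the temp file is a stand-in for the real input file).
lines <- c(
  "Col1: X", "objectClass: top", "objectClass: Role",
  "Col2: X1", "Col2: X2",
  "User: UserX", "User: userY", "User: UserZ",
  "cn: P1", "description: Permissions", "Host: HRef456",
  "",
  "Col1: Y", "objectClass: top", "objectClass: Role",
  "Col2: Y3", "Col2: Y4", "Col2: Y5",
  "User: U1", "User: U2",
  "cn: P2", "description: Permissions", "Host: HRef123"
)
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# all = TRUE gathers every occurrence of a repeated field instead of
# keeping only the last one; the result is a data frame.
dat <- read.dcf(path, all = TRUE)

str(dat)           # repeated fields show up as list columns
lengths(dat$Col2)  # number of Col2 values per record
lengths(dat$User)  # number of User values per record
```

The differing lengths of the `Col2` and `User` entries within each record are the unevenness discussed in the comments above.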
@rawr - they're not comma-separated strings, they're embedded `list()` objects. — thelatemail, Aug 11 '20 at 03:44
@thelatemail failing to see why that is a problem `x[] <- lapply(x, function(y) toString(unlist(y)))` — rawr, Aug 11 '20 at 03:59
@rawr - not a problem, just that there'd be another step: `read.dcf`, conversion to strings, then split comma-separated. `tidyr::unnest` or something similar might be more direct. — thelatemail, Aug 11 '20 at 04:02
@rawr When I run the following code to load the data, it errors out: `as.data.frame(read.dcf("sample.txt", all=TRUE))`
`Error in readLines(file, skipNul = TRUE) : cannot open the connection` In addition: `Warning message: In readLines(file, skipNul = TRUE) : cannot open compressed file '63350291.txt', probable reason 'No such file or directory'` — CT_369, Aug 11 '20 at 07:45
@CT_369 - `'No such file or directory'` - your text file doesn't exist or you're working in a different directory to where the file is saved. — thelatemail, Aug 12 '20 at 01:51
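A quick way to check the two causes mentioned above is to ask R where it is looking. This is a small sketch using base R only; `"sample.txt"` is the file name from the earlier comment and may differ from the real one.

```r
# File name taken from the comment above; adjust to the real file.
path <- "sample.txt"

getwd()                                # directory R is currently working in
file.exists(path)                      # TRUE only if the file is found from there
list.files(pattern = "\\.txt$")        # .txt files R can see in that directory
normalizePath(path, mustWork = FALSE)  # full path R would try to open

# Only attempt the import when the file is actually there.
if (file.exists(path)) {
  dat <- read.dcf(path, all = TRUE)
}
```

If `file.exists(path)` is `FALSE`, either pass the full path to `read.dcf` or change the working directory with `setwd()`.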
@thelatemail This is magical. Thank you so much. When I load the actual data, I end up with multiple comma-separated values in the same cell. How do I address that? I mean, I want them split apart so that I can then unnest all the values and have a single column. — CT_369, Aug 12 '20 at 06:42
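For a single column of comma-separated values, one possible approach in base R is to split each cell and repeat the identifying column once per piece. The sketch below uses a small made-up data frame (`df`) that mimics what collapsing the `User` list column with `toString` would give; the names `df`, `pieces`, and `out` are illustrative only.

```r
# Made-up example mimicking a collapsed User column.
df <- data.frame(
  Col1 = c("X", "Y"),
  User = c("UserX, userY, UserZ", "U1, U2"),
  stringsAsFactors = FALSE
)

# Split each cell on commas (and any whitespace following them).
pieces <- strsplit(df$User, ",\\s*")

# Repeat Col1 once per split value so every value gets its own row.
out <- data.frame(
  Col1 = rep(df$Col1, lengths(pieces)),
  User = unlist(pieces),
  stringsAsFactors = FALSE
)
out
```

Doing this for one column at a time avoids having to decide how values from two uneven columns should be paired.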

