
I am trying to match entries of data to variables. Consider the example below:

time program_name version
"start_time": "2021-06-25T10:10:15Z" "version": "1.0" "program_name": "MonkeyKeylogger"

I need to match entries to variables for roughly 30,000 observations. Is there any package or code with which to do this in R?

  • Could you give more information on your data? Are the 30,000 observations in a dataset you read into R, and are they all in the same order, i.e. time, version and program_name? If so, why not reorder the column names to fit, using the rename function or colnames()? Then you could delete the unnecessary parts of your strings, e.g. with the gsub function from base R. – Maria-Christina Weber Jan 16 '23 at 19:43
  • I should've specified this. Yes, the 30,000 observations are read into R, but they are not in the same order. Originally, I had two variables, the latter of which contained too much data, so I needed to divide it into ten sub-variables. However, when I used the "separate" command, I found that I would need to separate the data into many more than ten variables. I chose ten because the data fit into ten variables, but the second original variable itself contained roughly 256 iterations of each of the ten sub-variables. – Nicholas Kinberg Jan 16 '23 at 19:48
  • As a result, the data are not in the same order because there aren't exactly 256 entries for each variable. Because of this, I cannot switch column names. – Nicholas Kinberg Jan 16 '23 at 19:49
  • I'm still not sure what the dataset looks like. Could you provide a minimal reproducible example using dput() or data.frame()? Not for all your 30,000 observations, but just enough for others to understand your problem better. Have a look here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for how to create a reproducible example for R. – Maria-Christina Weber Jan 17 '23 at 10:31
  • Hello again. I've simplified the problem by removing the timestamp variable (it wasn't useful anyway). We now deal only with the "content" variable. Each of the 112 observations of the "content" variable contains roughly 2,560 entries, spread across 10 sub-variables. My goal now is to restructure the dataset so that the "content" variable is replaced with the ten sub-variables. How do I do this? – Nicholas Kinberg Jan 17 '23 at 14:46
  • To clarify, I want to put observations with certain strings under certain variables. Several observations are in the wrong positions and I want to put them in the right positions under the corresponding variables. – Nicholas Kinberg Jan 17 '23 at 14:59
  • Please edit your original question accordingly. Add a reproducible dataset and make clear what the output you want should look like. I still don't understand what you mean by 10 sub-variables. – Maria-Christina Weber Jan 17 '23 at 17:34
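
Since the thread never arrived at a reproducible example, here is only a rough sketch of one possible approach in base R, not a confirmed solution. It assumes every entry has the form `"key": "value"` (as in the sample line in the question), that keys may appear in any order, and that each key occurs at most once per string. The vector `content`, the helper `parse_pairs`, and the second sample row are hypothetical and made up for illustration.

```r
# Hypothetical sample data: "key": "value" pairs in arbitrary order.
# The second row is invented to show a different order and a missing key.
content <- c(
  '"start_time": "2021-06-25T10:10:15Z" "version": "1.0" "program_name": "MonkeyKeylogger"',
  '"program_name": "OtherTool" "start_time": "2021-06-26T08:00:00Z"'
)

# One "key": "value" pair; group 1 is the key, group 2 is the value.
pair_pattern <- '"([^"]+)":\\s*"([^"]*)"'

# Hypothetical helper: turn one string into a named character vector.
parse_pairs <- function(x) {
  m <- regmatches(x, gregexpr(pair_pattern, x, perl = TRUE))[[1]]
  keys <- sub(pair_pattern, "\\1", m, perl = TRUE)
  values <- sub(pair_pattern, "\\2", m, perl = TRUE)
  setNames(values, keys)
}

parsed <- lapply(content, parse_pairs)

# Every key seen anywhere becomes a column.
all_keys <- unique(unlist(lapply(parsed, names)))

# Look each key up by name, so order within a string does not matter;
# keys missing from a string come back as NA.
result <- as.data.frame(
  do.call(rbind, lapply(parsed, function(p) unname(p[all_keys]))),
  stringsAsFactors = FALSE
)
names(result) <- all_keys

print(result)
```

Because values are looked up by key name rather than by position, entries that sit in the "wrong" position still land under the matching variable. If a key repeats within one string (as the roughly 256 iterations mentioned above suggest), this sketch keeps only the first occurrence; a long format with one row per key/value pair would be needed instead.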
