
I'm new to R programming and I'm looking for suggestions on the following problem.

I have a list that contains a variable number of data frames. To give some context: say we have "n" sensors, and all the data is stored in a list of "n" data frames, where each data frame holds the 10 variables reported by one sensor.


First of all, the data I have comes in a data frame with the following structure:

    head(rawData, 3)

      EmailUsuario                                    Fecha Hcho Humidity  Latitude Longitude NombreUsuario Pm25 SensorName Temperature
    1         null Fri Feb 01 2019 10:40:51 GMT-0300 (CLST) null     null -34.42584 -72.03271          null   40        C08        null
    2         null Fri Feb 01 2019 10:40:56 GMT-0300 (CLST) null     null -34.42584 -72.03271          null   35        C08        null
    3         null Fri Feb 01 2019 10:41:01 GMT-0300 (CLST) null     null -34.42642 -72.03216          null   35        C08        null

This data frame contains the readings of all the sensors. In these first rows only "C08" appears in the SensorName column, but the column contains a variable number of sensors.
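To see how many sensors the column holds and how many readings each one has, a quick check could look like the sketch below. Because the real `rawData` isn't reproducible from the question, it assumes a small hypothetical stand-in with only the `SensorName` and `Pm25` columns, using a few values copied from the rows shown here.

```r
# Hypothetical stand-in for rawData: only the two columns used here
rawData <- data.frame(
  SensorName = c("C08", "C08", "C08", "C17", "C17"),
  Pm25       = c(40, 35, 35, 12, 12),
  stringsAsFactors = TRUE
)

# Number of readings per sensor; unequal counts mean the
# per-sensor columns can't simply be bound side by side
counts <- table(rawData$SensorName)
print(counts)

# Number of distinct sensors in the column
n_sensors <- length(unique(rawData$SensorName))
print(n_sensors)
```

Unequal counts per sensor are exactly what makes the wide, one-column-per-sensor layout need some padding or alignment step.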

Next, I create a list that contains the same data split into one data frame per sensor.

    n <- levels(factor(rawData$SensorName)) # vector of sensor names (factor() also covers a character column)
    s <- split(x = rawData, f = rawData$SensorName) # list of data frames, one per sensor
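For reference, `split()` returns a named list whose names are the sensor names (the factor levels, in sorted order), so each per-sensor data frame can be reached by name rather than by position. A small sketch, assuming a made-up two-sensor data frame in place of the real `rawData`:

```r
# Made-up stand-in for rawData: two sensors, unequal numbers of readings
rawData <- data.frame(
  SensorName = factor(c("C08", "C17", "C08", "C17", "C08")),
  Pm25       = c(40, 12, 35, 12, 35)
)

s <- split(x = rawData, f = rawData$SensorName)  # one data frame per sensor

print(names(s))          # list names come from the factor levels: "C08" "C17"
print(nrow(s[["C17"]]))  # sensor C17 has 2 rows
print(s$C08$Pm25)        # Pm25 readings of C08 in original row order: 40 35 35
```

Indexing with `s[["C17"]]` is more robust than `s[[2]]`, since the position of a sensor changes whenever a new sensor name sorts before it.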

Looking at the second element of the "s" list gives the following data frame:

    head(s[[2]], 3)

        EmailUsuario                                   Fecha Hcho Humidity Latitude Longitude NombreUsuario Pm25 SensorName Temperature
    560         null Wed Feb 06 2019 14:49:17 GMT+0000 (GMT) null     null -70.6667    -33.45          null   12        C17        null
    561         null Wed Feb 06 2019 14:49:22 GMT+0000 (GMT) null     null -70.6667    -33.45          null   12        C17        null
    562         null Wed Feb 06 2019 14:49:27 GMT+0000 (GMT) null     null -70.6667    -33.45          null    9        C17        null

You can see that it corresponds to the sensor "C17".


Now what I want is to build a new data frame that keeps only some of the variables (columns) from all the data frames in the "s" list. For example, I want a data frame with one column per sensor, where each column is named after the sensor and holds that sensor's "Pm25" values, so I can work with just that data.
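Starting from the list, one way to get that layout is to pull `Pm25` out of every data frame and pad the shorter vectors with `NA`, since the sensors don't have the same number of records and `data.frame()` refuses columns of different lengths. This is a sketch, assuming a hypothetical two-sensor `s` with made-up readings:

```r
# Hypothetical stand-in for the list `s` produced by split()
s <- list(
  C08 = data.frame(SensorName = "C08", Pm25 = c(40, 35, 35)),
  C17 = data.frame(SensorName = "C17", Pm25 = c(12, 12))
)

# Pull the Pm25 column out of every data frame in the list
pm25 <- lapply(s, function(df) df$Pm25)

# Pad each vector with NA up to the longest one so all columns match in length
max_len <- max(lengths(pm25))
padded <- lapply(pm25, function(v) c(v, rep(NA, max_len - length(v))))

# One column per sensor; the list names become the column names
pm25_wide <- as.data.frame(padded)
print(pm25_wide)
```

Note that row `i` here simply means "the i-th reading of each sensor"; rows are aligned by position, not by the time in `Fecha`. If readings must line up in time, the timestamps would have to be parsed and used as the row key instead.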

Is this approach good? What would you recommend, or is there a better solution?

Thanks

Wolkuz
  • It depends on what you want to do with the data. One way to tell whether this is a good approach is whether you have the same number of records for every sensor. Anyhow, you can get to the format with `SensorName` on the columns without the list. Feel free to edit your question for further help – josemz Apr 04 '19 at 23:25
  • @josemz Hi, thanks for your response. No, I do not have the same number of records for each sensor. So creating the list isn't necessary? Is there an easier way? – Wolkuz Apr 04 '19 at 23:31
  • check the answers [here](https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format), you currently have it in "long" format so you can use either `spread` or `dcast` to put `SensorName` in the columns and `Pm25` as the values. Notice I didn't include `reshape` since it's way too old. Look at the answers below the accepted one. Good luck! – josemz Apr 04 '19 at 23:40
  • @josemz Perfect! It worked ... thank you very much! – Wolkuz Apr 05 '19 at 01:29
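The long-to-wide reshape suggested in the comments can be sketched without the intermediate list. One detail matters: readings from different sensors share no key, so a per-sensor row counter has to be added first, otherwise there is nothing to line the rows up on. With tidyr, that counter would be the identifying column for `pivot_wider(names_from = SensorName, values_from = Pm25)` (the newer replacement for `spread`); with reshape2 it would go on the left side of a formula like `dcast(df, obs ~ SensorName, value.var = "Pm25")`. The sketch below does the same in base R with `tapply()`, assuming a made-up long-format `rawData` and a hypothetical counter column named `obs`:

```r
# Made-up long-format stand-in for rawData (only the columns needed here)
rawData <- data.frame(
  SensorName = c("C08", "C08", "C17", "C08", "C17"),
  Pm25       = c(40, 35, 12, 35, 12),
  stringsAsFactors = TRUE
)

# Hypothetical counter: 1, 2, 3, ... within each sensor, in original row order
rawData$obs <- ave(seq_along(rawData$Pm25), rawData$SensorName, FUN = seq_along)

# Long -> wide: rows = counter, columns = sensors; empty cells are filled with NA
wide <- tapply(rawData$Pm25, list(rawData$obs, rawData$SensorName),
               function(v) v)  # each cell holds exactly one reading
pm25_wide <- as.data.frame(wide)
print(pm25_wide)
```

Sensors with fewer records end up with trailing `NA`s in their column, which is how the unequal record counts mentioned above show up in the wide format.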
