Assign an ID vector to a dataframe in R, based on filename?

Question

Intro:

I have a directory full of data from a sensor network. I would like to use each sensor's serial number, located within the filename, to create an ID vector.

Here are some example filenames:

2017-07-18-32058-aqdata.csv

2017-07-18-32033-aqdata.csv

The serial number for each sensor comes after the timestamp, e.g. 32058 or 32033.

Here's how I am currently reading in the data:

## Load the necessary packages:
if (!require(plyr)){
    install.packages('plyr')
    library(plyr)
}

if (!require(dplyr)){
    install.packages('dplyr')
    library(dplyr)
}

## Define a function to read in a single file:
read.data <- function(file_path){
    df <- read.csv(file_path, header=TRUE, stringsAsFactors=FALSE)
    names(df)<-c("datetime", "co", "co2", "VOC", "RH","ugm3","temp")
    df$datetime <- strptime(df$datetime, format="%Y-%m-%d %H:%M")
    df$datetime <- as.POSIXct(df$datetime, format="%Y-%m-%d %H:%M:%S")
    return(df)
}

## Assign object 'file_path' to my target directory:
file_path <-"~/my_directory/"

## Generate a list of files within this directory:
file_list <- list.files(path = file_path, pattern="\\.csv$", all.files=FALSE, full.names=TRUE, ignore.case=FALSE)

## Apply the read.data function to the list of files:
df_master <- plyr::ldply(file_list, read.data)
df_master <- plyr::arrange(df_master, datetime)

How can I exploit the serial number in each filename to create corresponding ID vectors within my read.data() function?

Here's some example data:

df_example <- structure(list(datetime = structure(c(1497296520, 1497296580, 1497296640, 1497296700, 1497296760, 1497296820), class = c("POSIXct", "POSIXt"), tzone = ""), co = c(0, 0, 0, 0, 0, 0), co2 = c(1118L, 1508L, 836L, 620L, 529L, 498L), VOC = c(62.1353, 59.7594, 59.1831, 57.9592, 56.4335, 53.6528), RH = c(51.45, 52.18, 50.72, 49.71, 49.21, 48.51), ugm3 = c(2.601, 1.061, 1.901, 1.481, 2.501, 3.261), temp = c(72.27, 72.35, 72.45, 72.55, 72.67, 72.77)), .Names = c("datetime", "co", "co2", "VOC", "RH", "ugm3", "temp"), row.names = c(NA, 6L), class = "data.frame")

Thanks in advance!

You pass the name of the file to your `read.data` function. Just parse that string to extract the data you want and add it as a column to your `df` before you return it. So is this question really about how to extract a value from a string? It seems like you could really simplify this question. It's always easier to help with a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) we can just copy/paste into R to run. — MrFlick, Jul 19 '17 at 17:50
Related: https://stackoverflow.com/questions/25167187/merge-multiple-txt-files-from-multiple-directories-in-r — MrFlick, Jul 19 '17 at 17:51
Related: https://stackoverflow.com/questions/2104483/how-to-read-table-multiple-files-into-a-single-table-in-r — MrFlick, Jul 19 '17 at 17:51
Related: https://stackoverflow.com/questions/17055152/extract-part-of-a-file-name-in-r — MrFlick, Jul 19 '17 at 17:52

score 1 · Accepted Answer · answered Jul 19 '17 at 17:54

This assumes your sensor numbers are all at least five digits long, which helpfully avoids confusion with the dates. Using stringr:

library(stringr)

read.data <- function(file_path){
  df <- read.csv(file_path, header=TRUE, stringsAsFactors=FALSE)
  names(df)<-c("datetime", "co", "co2", "VOC", "RH","ugm3","temp")
  df$datetime <- strptime(df$datetime, format="%Y-%m-%d %H:%M")
  df$datetime <- as.POSIXct(df$datetime, format="%Y-%m-%d %H:%M:%S")

  # New code to pull in sensor number
  df$sensor <- str_extract(file_path, "[0-9]{5,}")
  return(df)
}
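
One caveat with matching against the full path: if a directory name happens to contain a run of five or more digits, `str_extract` returns that run instead of the serial number. Here is a minimal base-R sketch that sidesteps this, assuming every filename starts with a `YYYY-MM-DD-` date followed by the serial number. `extract_serial` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: pull the serial number that follows the date prefix.
# Assumes filenames look like "YYYY-MM-DD-<serial>-aqdata.csv".
extract_serial <- function(file_path) {
  fname <- basename(file_path)  # drop directory components before matching
  pattern <- "^[0-9]{4}-[0-9]{2}-[0-9]{2}-([0-9]+)-.*$"
  # Return the captured digits, or NA when the name does not fit the pattern
  ifelse(grepl(pattern, fname), sub(pattern, "\\1", fname), NA_character_)
}

extract_serial("~/my_directory/2017-07-18-32058-aqdata.csv")
```

Inside `read.data` this could be used as `df$sensor <- extract_serial(file_path)`. The result is a character string, so wrap it in `as.integer()` if you would rather have a numeric ID.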

Thankfully, the serial numbers are all the same length. This did the trick, thanks for your help! — philiporlando, Jul 19 '17 at 17:57
