So I have a very large .txt file that contains strings and number values with no standard delimiter. It looks like this:

MIO Data Packet:
Event Node:099123910e373b4a9c59114ee9e6d83c
    TrasducerValue:
        Name: Thermometer Digital
        ID: 0
        Raw Value: 138
        Typed Value: 13.800000
        Timestamp: 2015-03-18T09:22:59.703168-0500
    TrasducerValue:
        Name: Thermometer Analog
        ID: 0
        Raw Value: 550
        Typed Value: 13.350000
        Timestamp: 2015-03-18T09:22:59.703168-0500
    TrasducerValue:
        Name: RSSI
        ID: 0
        Raw Value: 12
        Typed Value: 12.000000
        Timestamp: 2015-03-18T09:22:59.703168-0500
    TrasducerValue:
        Name: Ping
        ID: 0
        Raw Value: 0
        Typed Value: 0.000000
        Timestamp: 2015-03-18T09:22:59.703168-0500
    TrasducerValue:
        Name: Motion Sensor
        ID: 0
        Raw Value: 0
        Typed Value: 0.000000
        Timestamp: 2015-03-18T09:22:59.703168-0500
    TrasducerValue:
        Name: Microphone
        ID: 0
        Raw Value: 82
        Typed Value: 82.000000
        Timestamp: 2015-03-18T09:22:59.703168-0500
    TrasducerValue:
        Name: Light Meter
        ID: 0
        Raw Value: 1023
        Typed Value: 0.000000
        Timestamp: 2015-03-18T09:22:59.703168-0500
    TrasducerValue:
        Name: Humidity Sensor
        ID: 0
        Raw Value: 158
        Typed Value: 46.666668
        Timestamp: 2015-03-18T09:22:59.703168-0500
    TrasducerValue:
        Name: Battery Level
        ID: 0
        Raw Value: 267
        Typed Value: 2.670000
        Timestamp: 2015-03-18T09:22:59.703168-0500
    TrasducerValue:
        Name: Barometer
        ID: 0
        Raw Value: 99103
        Typed Value: 99103.000000
        Timestamp: 2015-03-18T09:22:59.703168-0500
    TrasducerValue:
        Name: Accelerometer Z
        ID: 0
        Raw Value: 563
        Typed Value: 0.396364
        Timestamp: 2015-03-18T09:22:59.703168-0500
    TrasducerValue:
        Name: Accelerometer Y
        ID: 0
        Raw Value: 606
        Typed Value: 8.269162
        Timestamp: 2015-03-18T09:22:59.703168-0500
    TrasducerValue:
        Name: Accelerometer X
        ID: 0
        Raw Value: 507
        Typed Value: 1.181309
        Timestamp: 2015-03-18T09:22:59.703168-0500

I have started by using:

library("stringr")
library("plyr")
dat = readLines("03181023.txt")

And I get the feeling the command I need to use is:

x = ldply(dat, .fun)

But I am not very knowledgeable about writing functions, so I am at a bit of a loss when it comes to using ldply() properly.

I would like the data to look something like this when I'm done (with the rest of the values filled in, of course):

Name                 ID  Raw Value  Typed Value  Timestamp
Thermometer Digital  0   138        13.80000     2015-03-18T09:22:59.703168-0500
Thermometer Analog
RSSI
Ping
Motion Sensor
Microphone
Light Meter
Humidity Sensor

Thanks for any suggestions!

Dan Johnson

1 Answer


I drafted the function below using information from the Stack Overflow questions "Extracting decimal numbers from a string" and "Extracting Data from Text Files".

txtconvert <- function(file)
{
  tmp <- readLines(file) # use readLines to read in the .txt file

  # keep only the lines that hold one of the five wanted fields
  tmp <- grep("Name: |ID: |Raw Value: |Typed Value: |Timestamp: ", tmp,
              value = TRUE)
  tmp <- sub("^\\s+", "", tmp) # remove the leading spaces or tabs
  tmp <- sub(": ", "\t", tmp)  # turn the first ": " into a tab so that
                               # read.table can split label and value

  # Name (anchored patterns so that e.g. a sensor name containing "ID"
  # is not picked up by the ID search)
  name <- grep("^Name\t", tmp, value = TRUE) # collect all Name lines together
  name <- read.table(text = name, sep = "\t", quote = "", comment.char = "",
                     stringsAsFactors = FALSE) # read the lines as a table
  names(name)[2] <- "Name" # change the column name
  name[1] <- NULL          # remove the 1st column (the "Name" label)

  # ID
  ID <- grep("^ID\t", tmp, value = TRUE)
  ID <- read.table(text = ID, sep = "\t", quote = "", comment.char = "",
                   stringsAsFactors = FALSE)
  names(ID)[2] <- "ID"
  ID[1] <- NULL

  # Raw Value
  raw <- grep("^Raw Value\t", tmp, value = TRUE)
  raw <- read.table(text = raw, sep = "\t", quote = "", comment.char = "",
                    stringsAsFactors = FALSE)
  names(raw)[2] <- "Raw Value"
  raw[1] <- NULL

  # Typed Value
  type <- grep("^Typed Value\t", tmp, value = TRUE)
  type <- read.table(text = type, sep = "\t", quote = "", comment.char = "",
                     stringsAsFactors = FALSE)
  names(type)[2] <- "Typed Value"
  type[1] <- NULL

  # Timestamp
  time <- grep("^Timestamp\t", tmp, value = TRUE)
  time <- read.table(text = time, sep = "\t", quote = "", comment.char = "",
                     stringsAsFactors = FALSE)
  names(time)[2] <- "Timestamp"
  time[1] <- NULL

  # combine into a single data.frame; check.names = FALSE keeps the
  # spaces in "Raw Value" and "Typed Value"
  tmp <- data.frame(name, ID, raw, type, time, check.names = FALSE,
                    stringsAsFactors = FALSE)
  return(tmp)
}

This function does not use ldply(), but it still returns the data.frame that you want.
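If you would rather take the ldply() route from the question, the key point is that ldply() applies a function to each element of a list and row-binds the resulting data.frames. The file therefore first has to be split into one chunk of lines per "TrasducerValue:" entry. Below is a minimal base-R sketch of that idea. The helper parse_block() and the short sample vector are hypothetical, and the sketch assumes each field appears at most once per entry. With plyr loaded, ldply(blocks, parse_block) should give the same rows as the do.call(rbind, lapply(...)) line. Because split() returns a named list, check plyr's .id argument if you do not want an extra index column.

```r
# Hypothetical helper: turn the lines of one "TrasducerValue:" entry into a
# one-row data.frame. Fields missing from an entry become NA.
parse_block <- function(block) {
  block <- trimws(block)
  fields <- c("Name", "ID", "Raw Value", "Typed Value", "Timestamp")
  vals <- vapply(fields, function(f) {
    hit <- block[startsWith(block, paste0(f, ": "))]
    if (length(hit) == 0) NA_character_ else sub("^[^:]+: ", "", hit[1])
  }, character(1))
  out <- as.data.frame(as.list(vals), stringsAsFactors = FALSE)
  names(out) <- fields # as.data.frame() would turn "Raw Value" into "Raw.Value"
  out
}

# Short sample in the same layout as the question; on the real file use
# dat <- readLines("03181023.txt") instead.
dat <- c("MIO Data Packet:",
         "Event Node:099123910e373b4a9c59114ee9e6d83c",
         "    TrasducerValue:",
         "        Name: Thermometer Digital",
         "        ID: 0",
         "        Raw Value: 138",
         "        Typed Value: 13.800000",
         "        Timestamp: 2015-03-18T09:22:59.703168-0500",
         "    TrasducerValue:",
         "        Name: RSSI",
         "        ID: 0",
         "        Raw Value: 12",
         "        Typed Value: 12.000000",
         "        Timestamp: 2015-03-18T09:22:59.703168-0500")

# Every "TrasducerValue:" line starts a new entry. Lines before the first
# entry (id 0) are dropped, and packet header lines that end up inside a
# chunk are simply ignored by parse_block().
id <- cumsum(grepl("^\\s*TrasducerValue:", dat))
blocks <- split(dat[id > 0], id[id > 0])

x <- do.call(rbind, lapply(blocks, parse_block))
rownames(x) <- NULL
x$ID <- as.integer(x$ID)
x$`Raw Value` <- as.integer(x$`Raw Value`)
x$`Typed Value` <- as.numeric(x$`Typed Value`)
x
```

Splitting on the "TrasducerValue:" lines, rather than counting a fixed number of lines per entry, keeps the sketch working if an entry ever has an extra or missing field.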


dataout <- txtconvert("data.txt") # data.txt contains all of the data 
# that you provided in your initial question
dataout

Below is the printed dataout:

#                   Name ID Raw Value  Typed Value                       Timestamp
#  1 Thermometer Digital  0       138    13.800000 2015-03-18T09:22:59.703168-0500
#  2  Thermometer Analog  0       550    13.350000 2015-03-18T09:22:59.703168-0500
#  3                RSSI  0        12    12.000000 2015-03-18T09:22:59.703168-0500
#  4                Ping  0         0     0.000000 2015-03-18T09:22:59.703168-0500
#  5       Motion Sensor  0         0     0.000000 2015-03-18T09:22:59.703168-0500
#  6          Microphone  0        82    82.000000 2015-03-18T09:22:59.703168-0500
#  7         Light Meter  0      1023     0.000000 2015-03-18T09:22:59.703168-0500
#  8     Humidity Sensor  0       158    46.666668 2015-03-18T09:22:59.703168-0500
#  9       Battery Level  0       267     2.670000 2015-03-18T09:22:59.703168-0500
# 10           Barometer  0     99103 99103.000000 2015-03-18T09:22:59.703168-0500
# 11     Accelerometer Z  0       563     0.396364 2015-03-18T09:22:59.703168-0500
# 12     Accelerometer Y  0       606     8.269162 2015-03-18T09:22:59.703168-0500
# 13     Accelerometer X  0       507     1.181309 2015-03-18T09:22:59.703168-0500
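The Timestamp column comes through as character. If you need it as a real date-time later (for sorting or time differences), as.POSIXct() can parse it. The format string below is an assumption that every timestamp follows the single layout seen in this sample. %OS reads the seconds including their fractional part, and %z reads the -0500 offset from UTC.

```r
ts <- "2015-03-18T09:22:59.703168-0500"

# %OS = seconds with fractional part, %z = signed "-0500" UTC offset;
# the parsed instant is then displayed in UTC
parsed <- as.POSIXct(ts, format = "%Y-%m-%dT%H:%M:%OS%z", tz = "UTC")
format(parsed, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

On the whole column, this would be dataout$Timestamp <- as.POSIXct(dataout$Timestamp, format = "%Y-%m-%dT%H:%M:%OS%z", tz = "UTC"). Rows that do not match the format become NA, so it is worth checking for NAs afterwards.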


dataout <- structure(list(Name = c("Thermometer Digital", "Thermometer Analog",
"RSSI", "Ping", "Motion Sensor", "Microphone", "Light Meter",
"Humidity Sensor", "Battery Level", "Barometer", "Accelerometer Z",
"Accelerometer Y", "Accelerometer X"), ID = c(0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), `Raw Value` = c(138L, 550L,
12L, 0L, 0L, 82L, 1023L, 158L, 267L, 99103L, 563L, 606L, 507L),
`Typed Value` = c(13.8, 13.35, 12, 0, 0, 82, 0, 46.666668, 2.67, 99103,
0.396364, 8.269162, 1.181309), Timestamp = c("2015-03-18T09:22:59.703168-0500",
"2015-03-18T09:22:59.703168-0500", "2015-03-18T09:22:59.703168-0500",
"2015-03-18T09:22:59.703168-0500", "2015-03-18T09:22:59.703168-0500",
"2015-03-18T09:22:59.703168-0500", "2015-03-18T09:22:59.703168-0500",
"2015-03-18T09:22:59.703168-0500", "2015-03-18T09:22:59.703168-0500",
"2015-03-18T09:22:59.703168-0500", "2015-03-18T09:22:59.703168-0500",
"2015-03-18T09:22:59.703168-0500", "2015-03-18T09:22:59.703168-0500"
)), .Names = c("Name", "ID", "Raw Value", "Typed Value", "Timestamp"
), row.names = c(NA, -13L), class = "data.frame")
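A more compact variant of txtconvert() is also possible. Instead of five near-identical read.table() blocks, every "Field: value" line can be split once into a label and a value, and the values can then be grouped by label. txtconvert2() is a hypothetical name. Like txtconvert(), it assumes every entry lists all five fields, and it stops with an error if the columns come out with different lengths. It takes the lines from readLines() rather than a file name, which makes it easy to try on a small sample.

```r
# Hypothetical compact variant of txtconvert(): takes the lines from
# readLines() and builds one column per field in a single pass.
txtconvert2 <- function(lines) {
  fields <- c("Name", "ID", "Raw Value", "Typed Value", "Timestamp")
  lines <- trimws(lines)
  pattern <- paste0("^(", paste(fields, collapse = "|"), "): ")
  kv <- lines[grepl(pattern, lines)]   # keep only "Field: value" lines
  key <- sub(": .*$", "", kv)          # text before the first ": "
  val <- sub("^[^:]+: ", "", kv)       # text after it
  cols <- split(val, factor(key, levels = fields))
  # every entry is assumed to list each field once, so lengths must agree
  stopifnot(length(unique(lengths(cols))) == 1)
  # type.convert() makes "138" an integer, "13.800000" a double and
  # leaves the timestamps as character
  cols <- lapply(cols, type.convert, as.is = TRUE)
  out <- data.frame(cols, stringsAsFactors = FALSE)
  names(out) <- fields
  out
}

dat <- c("MIO Data Packet:",
         "Event Node:099123910e373b4a9c59114ee9e6d83c",
         "    TrasducerValue:",
         "        Name: Ping",
         "        ID: 0",
         "        Raw Value: 0",
         "        Typed Value: 0.000000",
         "        Timestamp: 2015-03-18T09:22:59.703168-0500",
         "    TrasducerValue:",
         "        Name: Barometer",
         "        ID: 0",
         "        Raw Value: 99103",
         "        Typed Value: 99103.000000",
         "        Timestamp: 2015-03-18T09:22:59.703168-0500")
res <- txtconvert2(dat)
res
```

On the full file, this would be called as txtconvert2(readLines("data.txt")).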
iembry
  • Hi @iembry I really appreciate your effort in trying to help me solve this issue. At this point, I have tried to implement your solution, but unfortunately it is not working as gracefully for me as it did for you. So I have a few questions. Did you create a .txt file for use in your solution and then read it into R using ldply() and your txtconvert function? Also, I'm not seeing anything that addresses the "MIO Data Packet:" line, which is a recurring line that indicates a new section of data. Finally, how does this function create the different columns of data? – Dan Johnson Mar 26 '15 at 14:40
  • Hi @Dan Johnson I copied and pasted the data that you shared with us into a .txt file named data.txt. I have edited my answer. Do you want a new data.frame each time that "MIO Data Packet:" appears, or do you want the requested information in a single data.frame? Can you provide another section of data? I have placed comments throughout the function explaining what each line of code does. I hope that this helps. Thank you and you're welcome. – iembry Mar 28 '15 at 00:10
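On the "MIO Data Packet:" question raised in these comments: one way to keep track of sections while still producing a single data.frame is to number the packets with cumsum() and attach that number to every row. tag_packets() below is a hypothetical helper, and the node ids "aaa" and "bbb" in the sample are made up. It assumes each entry has exactly one "Name:" line. Under that assumption its rows line up one-to-one, in file order, with the rows that txtconvert() returns.

```r
# Hypothetical helper: for every "Name:" line, report which
# "MIO Data Packet:" section it falls in and that section's Event Node id.
tag_packets <- function(lines) {
  lines <- trimws(lines)
  packet <- cumsum(lines == "MIO Data Packet:") # 1 for the first packet, 2 for the next, ...
  is_node <- startsWith(lines, "Event Node:")
  node <- sub("^Event Node:\\s*", "", lines[is_node])
  names(node) <- packet[is_node]                 # look up node id by packet number
  is_name <- startsWith(lines, "Name: ")
  data.frame(Packet = packet[is_name],
             Node = unname(node[as.character(packet[is_name])]),
             Name = sub("^Name: ", "", lines[is_name]),
             stringsAsFactors = FALSE)
}

# Made-up sample with two packets (node ids are placeholders)
dat <- c("MIO Data Packet:", "Event Node:aaa",
         "    TrasducerValue:", "        Name: RSSI", "        ID: 0",
         "MIO Data Packet:", "Event Node:bbb",
         "    TrasducerValue:", "        Name: Ping",
         "    TrasducerValue:", "        Name: Barometer")
tags <- tag_packets(dat)
tags
```

With the real file, something like cbind(tag_packets(readLines("data.txt"))[c("Packet", "Node")], txtconvert("data.txt")) would add the packet columns to dataout. split(dataout, dataout$Packet) would then give one data.frame per packet if that is preferred.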