
I'm having a problem where the numbers R gives me as output are hundreds of times higher than the input values.

My script is reading a tab-delimited text file with the following data (extract):

4800.000000000004   63.79541685299562
4808.000000000004   65.44888307144669
4816.000000000004   65.66174624010496
4824.000000000004   65.85413227845713
4832.000000000004   66.3271958214957
4840.000000000004   66.67304406065
4848.000000000004   66.90294325983125
4856.000000000004   67.16391462118467
4864.000000000004   67.3649619902818
4872.000000000004   67.47950644400306
4880.000000000004   67.53568545748826
4888.000000000004   67.5820448431992
4896.000000000004   67.70983887523283
4904.000000000004   67.84124194437604
4912.000000000004   67.78234409282649
4920.000000000004   67.17896344097808
4928.000000000004   65.16964351857043

This is labeled as intensityFile -- it's intensity data from an audio analysis program. The first column is time in milliseconds, the second is intensity in decibels.

From here, I grab all intensity data between two time values (taken from another file in a loop):

    beginTime <- labelFile[i,1]
    endTime <- labelFile[i,2]

...

    # Read intensity file. Grab all intensity measurements >= begin time and <= end time
    C <- subset(intensityFile, V1 >= beginTime & V1 <= endTime)

    # Do the following calculations on the intensity values, stored in the data table
    maxIntense <- max(as.numeric(C$V2))
    minIntense <- min(as.numeric(C$V2))
    rangeIntense <- maxIntense - minIntense
    meanIntense <- mean(as.numeric(C$V2))
    stdevIntense <- sd(as.numeric(C$V2))

(I've left out defining "labelFile", which is where I get the time values.)

The problem is that after I do these operations, I get values like this:

maxIntense  minIntense  rangeIntense    meanIntense
23242       19110       4132           21466.66667
24699       19851       4848           23384
22109       16905       5204           20892.28571
25442       13973       11469          20764.46154
26410       16347       10063          23433.18182
25452       13750       11702          20401.63636
27241       9788        17453          23040.41667
23795       19965       3830           22413.5
23528       19584       3944           22074.14286
27530       14302       13228          21571.91667

Which are obviously massively inflated. These are humans speaking, not planet-busting bombs. I have tried using as.double() rather than as.numeric() (I have to force a type, as the intensity for some reason gets read as a factor otherwise). What could be causing this weird inflation?

A note -- I do essentially the same operation to a file indicating pitch values, but no weird inflation (it's also tab-delimited text).

EDIT: Fixed thanks to joran's first comment below. The reason C$V2 was being read as a factor was that each file contained a number of "--undefined--" values. I manually deleted these before running the script in R and it worked. Apparently there is a duplicate question, but I won't be needing it.
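
For anyone hitting the same thing, here is a minimal sketch of what was probably going on and how it could be avoided without editing the files by hand. Calling `as.numeric()` on a factor returns the integer level codes rather than the numbers in the labels, and with thousands of distinct readings in a file those codes can plausibly reach the tens of thousands. The `raw` string is a made-up stand-in for the real file (rows from the extract above plus one "--undefined--" entry), and `bad`, `good` and `fixed` are illustrative names only. `stringsAsFactors = TRUE` is set explicitly because it stopped being the default in R 4.0.0.

```r
# Hypothetical in-memory stand-in for the intensity file: rows from the
# extract in the question, plus one "--undefined--" entry.
raw <- paste(
  "4800.000000000004\t63.79541685299562",
  "4808.000000000004\t--undefined--",
  "4816.000000000004\t65.66174624010496",
  "4824.000000000004\t65.85413227845713",
  sep = "\n"
)

# Reproduce the problem: the non-numeric entry makes V2 a factor
# (the default behaviour before R 4.0.0, requested explicitly here).
bad <- read.table(text = raw, sep = "\t", stringsAsFactors = TRUE)
is.factor(bad$V2)
as.numeric(bad$V2)   # integer level codes, not decibel values

# Option 1: declare the placeholder as missing while reading,
# so V2 is parsed as numeric with NA in place of "--undefined--".
good <- read.table(text = raw, sep = "\t", na.strings = "--undefined--")
maxIntense  <- max(good$V2, na.rm = TRUE)
meanIntense <- mean(good$V2, na.rm = TRUE)

# Option 2: convert an existing factor via its labels, not its codes.
# The placeholder becomes NA; the coercion warning is suppressed.
fixed <- suppressWarnings(as.numeric(as.character(bad$V2)))
```

With either option the summary functions need `na.rm = TRUE`, since the undefined readings are kept as `NA` rather than deleted.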

  • Edit your question to include the output of `str(C)`. My random guess is that something odd about the file is causing the numbers to be read in as a factor. – joran Apr 08 '16 at 16:32
  • Ah, I see buried in there that you _already_ knew it was reading the intensity as factors? Well, in that case this is a duplicate and easily solved. – joran Apr 08 '16 at 16:37
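
The check joran asks for can be sketched as follows. The data frame below is a hypothetical stand-in for the subset `C` from the question, built by hand so that `V2` is a factor, and `colClasses`, `labels` and `badValues` are illustrative names. `str()` shows how each column was read in, and the last two lines list the entries that do not parse as numbers.

```r
# Hypothetical stand-in for the subset C from the question.
C <- data.frame(
  V1 = c(4800, 4808, 4816),
  V2 = factor(c("63.79541685299562", "--undefined--", "65.66174624010496"))
)

# Inspect the column types: V2 is reported as a Factor rather than num.
str(C)
colClasses <- sapply(C, class)

# List the labels that cannot be parsed as numbers; these are the
# entries that forced the column to be read as a factor.
labels <- as.character(C$V2)
badValues <- unique(labels[is.na(suppressWarnings(as.numeric(labels)))])
badValues
```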
