Update file name in execution into data.table

Question

A quite straightforward question.

library(data.table)
dt <- fread("HK0001.csv", drop = 5:24)

The data looks like this:

                   Time Price Volume  Amount Flag
 1: 2016-01-04 09:05:06 105.0   9500  993700    1
 2: 2016-01-04 09:20:00 104.1  23500 2446350    0
 3: 2016-01-04 09:30:00 104.1  18500 1924550    1
 4: 2016-01-04 09:30:01 103.9  12500 1300550    0
 5: 2016-01-04 09:30:02 104.1  16118 1675233    1
 6: 2016-01-04 09:30:05 104.0  13000 1352200    0
 7: 2016-01-04 09:30:06 104.1   2500  260100    1
 8: 2016-01-04 09:30:07 104.1   1500  156150    1
 9: 2016-01-04 09:30:08 104.3    500   52150    1
10: 2016-01-04 09:30:10 104.0   1000  104000    0
11: 2016-01-04 09:30:11 103.9   1000  103900    0
12: 2016-01-04 09:30:15 104.0   3500  364450    1
13: 2016-01-04 09:30:17 104.3   2000  208450    1
14: 2016-01-04 09:30:19 104.3   1500  156450    1
15: 2016-01-04 09:30:20 104.4    500   52200    1
16: 2016-01-04 09:30:21 104.4   1500  156600    1
17: 2016-01-04 09:30:22 104.4   1000  104400    1
18: 2016-01-04 09:30:24 104.4   1500  156600    1
19: 2016-01-04 09:30:25 104.0   2000  208000    0
20: 2016-01-04 09:30:27 104.1   3500  364350    1

The same folder contains many such csv files, each named after a stock ticker. In the fread line above, "0001" is the ticker; there are also "0002", "0003", and more than a thousand others.

The question:

After fread, I would like to add the ticker from the file name to the data.table as a new column.

The current columns are Time, Price, Volume, Amount and Flag. I would like the new stock ticker column to be the first column.

I checked the following questions on SO, but neither addresses this.

Rscript: Determine path of the executing script

getting the name of a dataframe from loading a .rda file in R

Thanks a lot!

For reference, here's a "Stackoverflow Doc" that says pretty much the same thing as rosscova's answer: http://stackoverflow.com/documentation/data.table/4456/using-list-columns-to-store-data/15561/reading-in-many-related-files#t=201701281415067989851 — Frank, Jan 28 '17 at 14:15

rosscova · Accepted Answer · 2017-01-28T11:09:32.510


How about adding it with substr after import?

file <- "HK0001.csv"
dt <- fread(file, drop = 5:24)
# characters 3 to 6 of "HK0001.csv" are the ticker, "0001"
dt[ , stockTicker := substr( file, 3, 6 ) ]

You could also turn this into a function...

read_in <- function( file ) {
    dt <- fread(file, drop = 5:24)

    dt[ , stockTicker := substr( file, 3, 6 ) ]

    return( dt )
}

Then call that function on a list of files, binding them together as one big data table object (since data from each file is now identified):

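# keep only the csv files, so that other files in the folder are not passed to fread
file.list <- list.files( pattern = "\\.csv$" )

DT <- lapply( file.list, read_in )
DT <- rbindlist( DT )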
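The question also asks for the ticker to be the first column, which the code above does not do by itself. Here is a minimal sketch of one way to handle that, assuming every file name follows the "HK" plus four-digit pattern from the question and a reasonably recent data.table in which `setcolorder` accepts a partial column list. `read_in_first` and `demo_file` are hypothetical names, and `drop = 5:24` is left out so the example runs on a small temporary file built from the first two rows shown in the question:

```r
library(data.table)

# Hypothetical helper: read one file, tag it with its ticker,
# and move the ticker column to the front.
read_in_first <- function(file) {
    dt <- fread(file)
    # basename() strips any directory part, so the substr() positions stay valid
    dt[, stockTicker := substr(basename(file), 3, 6)]
    # setcolorder() moves the named column to the front, by reference
    setcolorder(dt, "stockTicker")
    dt[]
}

# Small demonstration file in a temporary directory
demo_file <- file.path(tempdir(), "HK0001.csv")
fwrite(
    data.table(
        Time  = c("2016-01-04 09:05:06", "2016-01-04 09:20:00"),
        Price = c(105.0, 104.1)
    ),
    demo_file
)

res <- read_in_first(demo_file)
names(res)
```

The same `setcolorder( DT, "stockTicker" )` call could instead be applied once to the combined table after `rbindlist`, which avoids reordering every file separately.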

edited Jan 28 '17 at 11:09

answered Jan 28 '17 at 10:57

rosscova


I think you might need an `idcol=` in the `rbindlist` ..? – Frank Jan 28 '17 at 14:14

@Frank I'm doing that in the `read_in` function by creating the `stockTicker` column. Do you think it needs `idcol` as well? – rosscova Jan 28 '17 at 21:05
No worries @Frank, using `idcol` might be a better option here? – rosscova Jan 28 '17 at 21:19
I think this way is fine. I usually end up using idcol since I handle it a little differently (as shown in the link I gave in a comment on the OP). – Frank Jan 29 '17 at 05:29
Thanks both for the kind instruction! – Bigchao Jan 29 '17 at 05:50
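For readers wondering what the `idcol` variant discussed in these comments might look like, here is a rough sketch. It is not taken from the linked documentation, and it assumes the same "HK" plus four-digit naming as in the question. `demo_dir` and `files` are hypothetical names, and the second file holds placeholder values only:

```r
library(data.table)

# Two small demonstration files in a temporary directory
# (the HK0002 value is a placeholder, not real data)
demo_dir <- file.path(tempdir(), "idcol_demo")
dir.create(demo_dir, showWarnings = FALSE)
fwrite(data.table(Price = c(105.0, 104.1)), file.path(demo_dir, "HK0001.csv"))
fwrite(data.table(Price = 104.1), file.path(demo_dir, "HK0002.csv"))

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
# Name each element by its ticker (characters 3 to 6 of the file name)
names(files) <- substr(basename(files), 3, 6)

# lapply() keeps the names; rbindlist() writes them into the idcol column,
# which is placed first in the result
DT <- rbindlist(lapply(files, fread), idcol = "stockTicker")
DT
```

With this approach the ticker column comes first without a separate reordering step, at the cost of having to name the list of files before binding.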
