I have an R dataset of flight data. I need to add 365 columns to this dataset, one for each day of the year, with value 1 if the entry's flight date (data$FlightDate[i])
falls on that day of the year, and 0 otherwise (see this question for why).
Previously, I managed to extract the day of the year from the FlightDate string using lubridate:
library(lubridate)
data$DayOfYear <- yday(ymd(data$FlightDate))
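For anyone who wants to reproduce this step without lubridate, here is a minimal base-R sketch that should give the same values for ISO-formatted date strings. The `flight_date` vector is hypothetical toy data standing in for `data$FlightDate`; the real column may be formatted differently.

```r
# Hypothetical toy dates standing in for data$FlightDate (assumed "YYYY-MM-DD")
flight_date <- c("2015-01-01", "2015-01-10", "2015-12-31")

# "%j" formats a Date as its day of the year (001-366)
day_of_year <- as.integer(format(as.Date(flight_date), "%j"))

# Note: in a leap year, 31 December is day 366, so 365 columns
# only cover everything if the data contain no such date.
print(day_of_year)
```
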
How would I go about generating each of the 365 columns, and keeping only those columns (along with some others) for a future SVD? I will also need to repeat the same for the time of day (which I will probably split into bins of 30 or 10 minutes), so an extra 48 to 144 one-hot columns for a different variable will have to be added later.
Note: my dataset contains about 500k flights per month (so about 16k flights for a single day of the year if I take just one year of data) and has 100 variables (columns).
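At this size a dense indicator matrix gets large quickly, so a sparse representation may be worth considering. The sketch below uses `sparseMatrix()` from the Matrix package; `day_of_year` is a hypothetical toy vector standing in for `data$DayOfYear`, and whether the SVD routine used later accepts sparse input is an open assumption.

```r
library(Matrix)  # recommended package, normally shipped with R

# Dense cost for one month: 500,000 rows x 365 columns x 8 bytes per double
dense_gb <- 500000 * 365 * 8 / 1e9   # roughly 1.46 GB

# Hypothetical toy stand-in for data$DayOfYear
day_of_year <- c(1L, 1L, 5L, 365L)

# One stored value per row: row k gets a 1 in column day_of_year[k]
day_onehot <- sparseMatrix(
  i = seq_along(day_of_year),
  j = day_of_year,
  x = 1,
  dims = c(length(day_of_year), 365L),
  dimnames = list(NULL, paste0("DayOfYear", 1:365))
)
```

Only the non-zero cells are stored, so memory grows with the number of flights rather than with flights times 365.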
Sample input data row data[1,]:
{
DayOfYear: 10,
FieldGoodForSvd1: 235,
FieldBadForSvd2: "some string",
...
}
Sample output data row (after generating the 365 binary columns and selecting the fields compatible with an SVD):
{
DayOfYear1: 0,
...
DayOfYear9: 0,
DayOfYear10: 1, // The flight took place on that day of the year
DayOfYear11: 0,
...
DayOfYear365: 0,
FieldGoodForSvd1 : 235
}
EDIT
Suppose my input data matrix looks like this:
DayOfYear ; FieldGoodForSvd1 ; FieldBadForSvd2
1 ; 275 ; "los angeles"
1 ; 256 ; "san francisco"
5 ; 15 ; "chicago"
The final output should be:
FieldGoodForSvd1 ; DayOfYear1 ; DayOfYear2 ; ... ; DayOfYear4 ; DayOfYear5 ; DayOfYear6 ; ... ; DayOfYear365
275 ; 1 ; 0 ; ... ; 0 ; 0 ; 0 ; ... ; 0
256 ; 1 ; 0 ; ... ; 0 ; 0 ; 0 ; ... ; 0
15 ; 0 ; 0 ; ... ; 0 ; 1 ; 0 ; ... ; 0
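To make the target concrete, here is a minimal base-R sketch that produces this output from the three sample rows. It is only one possible approach, shown on toy data. It assumes `DayOfYear` is an integer between 1 and 365 with no missing values, and the `onehot` and `svd_input` names are made up for illustration.

```r
# Toy input mirroring the sample table above
data <- data.frame(
  DayOfYear        = c(1L, 1L, 5L),
  FieldGoodForSvd1 = c(275, 256, 15),
  FieldBadForSvd2  = c("los angeles", "san francisco", "chicago"),
  stringsAsFactors = FALSE
)

n_days <- 365L  # assumption: no leap-year day 366 in the data

# Start from an all-zero matrix, then set one cell per row to 1
# by indexing with a two-column (row, column) matrix.
onehot <- matrix(
  0L, nrow = nrow(data), ncol = n_days,
  dimnames = list(NULL, paste0("DayOfYear", seq_len(n_days)))
)
onehot[cbind(seq_len(nrow(data)), data$DayOfYear)] <- 1L

# Keep only the numeric field usable in an SVD, next to the indicators
svd_input <- cbind(FieldGoodForSvd1 = data$FieldGoodForSvd1, onehot)

# Inspect the columns shown in the table above
print(svd_input[, c("FieldGoodForSvd1", "DayOfYear1", "DayOfYear5")])
```

The same indexing pattern would apply to the time-of-day bins, with the bin number taking the place of `DayOfYear`.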