I have a data set of bike rentals in Washington, D.C. Some of my variables are factors and some are continuous numerics. I couldn't find a way to upload the dataset, so I hope the following explanation is enough: I want to explain the "count" of bike rentals (which is numeric and continuous) by the climate. To do that, I want to merge the following variables into a single one called agg_climate:

- season (factor) - 1 = Winter, 2 = Summer, 3 = Spring, 4 = Fall
- weather (factor) - 1 = Good, 2 = Normal, 3 = Bad
- temp (continuous) - measured in degrees
- atemp (continuous) - measured in degrees
- windspeed (continuous) - measured in mph
- humidity (continuous) - measured in %

                datetime season     holiday  workingday weather  temp  atemp humidity windspeed count hour       daytime
3201 2011-09-15 17:00:00 Summer Regular day Working day     Bad 19.68 23.485       82   31.0009   261   17    After Noon
377  2011-02-02 05:00:00 Winter Regular day Working day     Bad  9.02 12.120       93    7.0015     3    5 Early Morning
6103 2012-06-01 21:00:00 Spring Regular day Working day     Bad 26.24 29.545       78   16.9979    85   21       Evening

A picture of the data table: https://ibb.co/SnphvBt

What is the proper way to do so? Thanks!

Lahav
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Maybe you want some sort of `interaction()`? But it's not exactly clear to me what output you are expecting. – MrFlick Dec 13 '18 at 16:18
  • To upload a sample of your data try `dput(head(MyData, 20))` and paste the results into your question – G5W Dec 13 '18 at 16:20
  • If your dataset was named `bikes` you could type `head(bikes)` and get a small example of your data to share with us. This, plus an example of the end result you would like to achieve, would allow us to help you. – Jared C Dec 13 '18 at 16:22
  • Hey, I tried to post a sample of the data; I hope it's clearer now. I would like to create a new numeric variable that aggregates the effect of all weather-related variables: season, weather, temp, atemp, humidity and windspeed. – Lahav Dec 13 '18 at 17:01
  • Also shared a link with a sample of the table – Lahav Dec 13 '18 at 17:18
  • It's still very unclear to me exactly what values you want `agg_climate` to take. How exactly do you want it to be calculated? Stack Overflow is for specific programming questions. If you want general data modeling advice, you should ask over at [stats.se] – MrFlick Dec 13 '18 at 17:51

1 Answer

You can merge several weather-related measurements (temperature, humidity and wind speed) into a single variable called apparent temperature (AT).

The AT index ... is based on a mathematical model of an adult, walking outdoors, in the shade (Steadman 1994). The AT is defined as; the temperature, at the reference humidity level, producing the same amount of discomfort as that experienced under the current ambient temperature and humidity.
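Written out, the version of the formula that the R code further down implements is the following. The symbols are my own notation, not from the quoted source: $T_a$ is the air temperature in °C, $rh$ the relative humidity in %, $ws$ the wind speed in m/s and $e$ the water vapour pressure in hPa.

```latex
e = \frac{rh}{100} \times 6.105 \times \exp\!\left(\frac{17.27\,T_a}{237.7 + T_a}\right)
\qquad
AT = T_a + 0.33\,e - 0.70\,ws - 4.00
```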

Please see below how it can be implemented in your case:

x <- structure(list(datetime = structure(c(2L, 1L, 3L), .Label = c("05:00:00", 
"17:00:00", "21:00:00"), class = "factor"), season = structure(c(2L, 
3L, 1L), .Label = c("Spring", "Summer", "Winter"), class = "factor"), 
    holiday = c("Regular day", "Regular day", "Regular day"), 
    workingday = c("Working day", "Working day", "Working day"
    ), weather = structure(c(1L, 3L, 2L), .Label = c("Bad", "Good", 
    "Normal"), class = "factor"), temp = c(19.68, 9.02, 26.24
    ), atemp = c(23.485, 12.12, 29.545), humidity = c(82L, 93L, 
    78L), windspeed = c(31.0009, 7.0015, 16.9979), count = c(261L, 
    3L, 85L), hour = c(17L, 5L, 21L), daytime = c("After Noon", 
    "Early Morning", "Evening")), row.names = c("2011-09-15", 
"2011-02-02", "2012-06-01"), class = "data.frame")

x$e <- x$humidity / 100 * 6.105 * exp(17.27 * x$temp / (237.7 + x$temp)) # water vapour pressure (hPa); temp in degrees Celsius, humidity in %
x$windspeed_ms <- 0.4470400 * x$windspeed # windspeed converted from mph to m/s
x$AT <- x$temp + 0.33 * x$e - 0.7 * x$windspeed_ms - 4.00 # apparent temperature (degrees Celsius)
x[, c("temp",  "humidity", "windspeed", "AT")]

Output

            temp humidity windspeed        AT
2011-09-15 19.68       82   31.0009 12.166304
2011-02-02  9.02       93    7.0015  6.351849
2012-06-01 26.24       78   16.9979 25.669603
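If you want to apply this to the full data set, the same calculation can be wrapped in a small function. This is only a sketch: the function name `apparent_temp`, the data frame name `bikes` and the column name `agg_climate` are placeholders, and it assumes `temp` is in degrees Celsius and `windspeed` in mph, as stated in the question.

```r
# Apparent temperature, same formula as in the answer.
# temp_c:        air temperature in degrees Celsius (assumed unit)
# humidity_pct:  relative humidity in percent
# windspeed_mph: wind speed in miles per hour (assumed unit)
apparent_temp <- function(temp_c, humidity_pct, windspeed_mph) {
  # water vapour pressure in hPa
  e <- humidity_pct / 100 * 6.105 * exp(17.27 * temp_c / (237.7 + temp_c))
  # convert mph to m/s
  ws <- 0.44704 * windspeed_mph
  temp_c + 0.33 * e - 0.70 * ws - 4.00
}

# The three sample rows from the question (placeholder data frame name)
bikes <- data.frame(
  temp      = c(19.68, 9.02, 26.24),
  humidity  = c(82, 93, 78),
  windspeed = c(31.0009, 7.0015, 16.9979)
)

# Vectorised: one apparent-temperature value per row
bikes$agg_climate <- apparent_temp(bikes$temp, bikes$humidity, bikes$windspeed)
print(bikes)
```

Because the function is vectorised, it works on whole columns at once, so no loop over the rows is needed.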

As for your other variables (season and weather), they are related to seasonality, so instead of merging them into a single number it is better to model them directly, using one of:

  • time series analysis with exogenous variables;
  • machine learning (like random forest regressions, recurrent neural networks etc.);
  • multivariate (non)linear regression.
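As a rough illustration of the last option, a linear regression can take the apparent temperature as a numeric predictor and keep `season` and `weather` as factors. The data below is simulated purely so that the snippet runs; it is not the real bike-rental data, and the relationship between `AT` and `count` is made up.

```r
set.seed(1)  # simulated placeholder data, NOT the real bike-rental data
n <- 200
bikes <- data.frame(
  season  = factor(sample(c("Winter", "Summer", "Spring", "Fall"), n, replace = TRUE)),
  weather = factor(sample(c("Good", "Normal", "Bad"), n, replace = TRUE)),
  AT      = runif(n, min = 0, max = 35)  # apparent temperature, degrees Celsius
)

# Made-up relationship, only so that the model has something to fit
bikes$count <- pmax(0, round(50 + 5 * bikes$AT + rnorm(n, sd = 20)))

# lm() expands the factors into dummy variables automatically
fit <- lm(count ~ AT + season + weather, data = bikes)
print(summary(fit))
```

With the real data you would replace the simulated `bikes` with your own data frame and compute `AT` from `temp`, `humidity` and `windspeed` first.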
Artem