0

I have a dataset that contains 18 columns and columns are related to an accident being reported. I am trying to find number of accidents that were reported during specific day of the week. For example: total number of accidents reported on Tuesday. For this I am using two variable available in my dataset:Day_of_Week and number_of_vehicle. I am running below sqldf query. However, it is showing me total number of accidents reported during entire week. I would like to use sqldf to report for particular day ex: monday. I would also like to add that there is a fair amount of space in Day_oF_WEEK column.See below for example

Day_OF_Week           Number_of_vehicle
MONDAY(Space here)              50

Some sample dataset

> dput(head(myf,5))
structure(list(CASE_NUMBER = c("1363000002", "1296000023", "1283000016", 
"1282000006", "1267000007"), BARRACK = c("Rockville", "Berlin", 
"Prince Frederick", "Leonardtown", "Essex"), ACC_DATE = c("2012-01-01T00:00:00", 
"2012-01-01T00:00:00", "2012-01-01T00:00:00", "2012-01-01T00:00:00", 
"2012-01-01T00:00:00"), ACC_TIME = c("2:01", "18:01", "7:01", 
"0:01", "1:01"), ACC_TIME_CODE = c("1", "5", "2", "1", "1"), 
    DAY_OF_WEEK = c("SUNDAY   ", "SUNDAY   ", "SUNDAY   ", "SUNDAY   ", 
    "SUNDAY   "), ROAD = c("IS 00495 CAPITAL BELTWAY", "MD 00090 OCEAN CITY EXPWY", 
    "MD 00765 MAIN ST", "MD 00944 MERVELL DEAN RD", "IS 00695 BALTO BELTWAY"
    ), INTERSECT_ROAD = c("IS 00270 EISENHOWER MEMORIAL", "CO 00220 ST MARTINS NECK RD", 
    "CO 00208 DUKE ST", "MD 00235 THREE NOTCH RD", "IS 00083 HARRISBURG EXPWY"
    ), DIST_FROM_INTERSECT = c("0", "0.25", "100", "10", "100"
    ), DIST_DIRECTION = c("U", "W", "S", "E", "S"), CITY_NAME = c("Not Applicable", 
    "Not Applicable", "Not Applicable", "Not Applicable", "Not Applicable"
    ), COUNTY_CODE = c("15", "23", "4", "18", "3"), COUNTY_NAME = c("Montgomery", 
    "Worcester", "Calvert", "St. Marys", "Baltimore"), VEHICLE_COUNT = c("2", 
    "1", "1", "1", "2"), PROP_DEST = c("YES", "YES", "YES", "YES", 
    "YES"), INJURY = c("NO", "NO", "NO", "NO", "NO"), COLLISION_WITH_1 = c("VEH", 
    "FIXED OBJ", "FIXED OBJ", "FIXED OBJ", "VEH"), COLLISION_WITH_2 = c("OTHER-COLLISION", 
    "OTHER-COLLISION", "FIXED OBJ", "OTHER-COLLISION", "OTHER-COLLISION"
    )), .Names = c("CASE_NUMBER", "BARRACK", "ACC_DATE", "ACC_TIME", 
"ACC_TIME_CODE", "DAY_OF_WEEK", "ROAD", "INTERSECT_ROAD", "DIST_FROM_INTERSECT", 
"DIST_DIRECTION", "CITY_NAME", "COUNTY_CODE", "COUNTY_NAME", 
"VEHICLE_COUNT", "PROP_DEST", "INJURY", "COLLISION_WITH_1", "COLLISION_WITH_2"
), row.names = c(NA, 5L), class = "data.frame")

sqldf Code:

sqldf("select sum(Number_of_vehicle),DAY_OF_WEEK from accident group by DAY_OF_WEEK")

Any help is appreciated! Thanks in advance!

Data_is_Power
  • 765
  • 3
  • 12
  • 30
  • Do you mean `sum(Number_of_vehicle)` instead of `count(...)`? It is not clear to me if the problem is that your data is not normalized (`Monday` versus `Monday ` with a space), or if the grouping is not happening like you expect. – r2evans Feb 16 '18 at 00:54
  • @r2evans.. I meant the sum(Number_of_vehicle). My grouping is not happening as I expected – Data_is_Power Feb 16 '18 at 00:59
  • You'll need to show some sample data (with sufficient variability to demonstrate your need); often expected output is also useful, though I suspect it is clear-enough here. Please read [SO/help/mcve](https://stackoverflow.com/help/mcve) for how to ask a good question; another good reference for reproducible questions is https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. – r2evans Feb 16 '18 at 01:00
  • I would like to output the number of vehicle on which accident was reported on Minday – Data_is_Power Feb 16 '18 at 01:02
  • *It is not clear what is wrong.* I can run some sample data in my console and it works just fine, so your SQL syntax does not appear to be flawed. So it is likely either (1) something with your data, or (2) something with your expectations. But because I do not see any of *your* data, all of my conclusions are based on incorrect assumptions about your data based on insufficient information. So, *please post some sample data* (such as `dput(head(x))`), or you're unlikely to get much more than what I've commented here (which does not help much, I realize). – r2evans Feb 16 '18 at 01:06
  • @r2evans.. Thanks for suggesting. I have added snapshot of my dataset above my sqldf code. Please let me know if more information is needed. – Data_is_Power Feb 16 '18 at 01:10
  • Please realize that the primary reason for showing your data is so that we can "play" with it. By posting an image, you are asking me to transcribe it to my console, something I (and perhaps others) am generally unwilling to spend time doing. Even if I were willing, it represents what I *think* your data *actually is* based on its representation in an image; there are several things that can be wrong with the data that will not show in an image, such as the distinction between `character` and `factor`. (Did you look at the links I provided?) – r2evans Feb 16 '18 at 01:12
  • @r2evans..Sorry I completely missed it. I have added more sample from my dataset. Thanks again for pushing me in right direction – Data_is_Power Feb 16 '18 at 01:25
  • Nice start, thanks for posting some data. Looking at your data, everything is `character`, you have no numbers; this isn't crushing you (`sqldf` is correcting for it), but this may not be ideal in the long run. Also, you show `Number_of_vehicle`, but the variable is named `VEHICLE_COUNT`, likely just a typo. Lastly, if I do `accident$DAY_OF_WEEK[2] <- "MONDAY"`, your `sqldf` command works fine, showing two rows: one for Sunday, one for Monday. – r2evans Feb 16 '18 at 01:45
  • @r2evans.. I would like to display both DAY_OF_WEEK and VEHICLE_COUNT (typo from my side) for Monday. So the output will have "MONDAY" row in DAY_OF_WEEK and VEHICLE_COUNT will have number of accidents corresponding to that day – Data_is_Power Feb 16 '18 at 01:58
  • Are you saying if you say `... sum(vehicle_count) as VEHICLE_COUNT, ...`, you'll get what you want? If not, you've lost me (since I get *exactly that* ... one row for each day of the week). – r2evans Feb 16 '18 at 02:04
  • I think I lost you.. I am getting same thing as you are getting i.e one row for each day of the week. However, I just want Sqldf to display me the result of one particular day i.e MONDAY and no other rows. – Data_is_Power Feb 16 '18 at 02:10
  • Then why not use `... where day_of_week like '%MONDAY%' group by ...`, or perhaps subset the results after `sqldf`? – r2evans Feb 16 '18 at 02:43
  • @r2evans..Great Thanks I used where day_of_week like '%MONDAY%'..and it worked. Would it be possible for you to explain why you used %MONDAY% ? Thanks in advance! – Data_is_Power Feb 16 '18 at 02:52
  • It seemed based on your description that there may be extraneous whitespace in the field. You could do `day_of_week='MONDAY'` if you are certain of its content, but this was "safer" given the context. In SQL `where`, `%` is a wildcard, saying *"anything else before or after"*; `foo%` would be *"starting with 'foo'"*. – r2evans Feb 16 '18 at 04:33
  • @r2evans..Thanks for your help and efforts – Data_is_Power Feb 18 '18 at 21:22

0 Answers0