0

guys ,i have a question since i input my excel sheet as data frame my data is shown as below

arr=as.data.frame(rbindlist(df,fill=T))
> arr
    ST Code Emp          Employee              Pay.Code Hours     Gross
1   AL 7229  65                 S                HOURLY  0.00      0.00
2   AL 7229  65                 S                SALARY  0.00   3060.00
3   AL 7229  65                 S              PER DIEM  0.00    765.00
4   AL 7229  65                 S EXPENSE REIMBURSEMENT  0.00     11.00
5   AL 7229   2                 R                HOURLY 60.00    720.00
6   CA   42   2                 R              OVERTIME  3.25     58.50
7   CA   42   3                 A                HOURLY 80.00    800.00
8   CA   42   3                 A              OVERTIME  6.25     93.75
9   CA   42   4                 N                HOURLY 79.25    990.63
10  CA   42   4                 N              OVERTIME  7.00    131.25
11  CA   42   9                 P                HOURLY 32.00    352.00
12  CA   42   9                 P              OVERTIME  1.75     28.88
13  CA   42  10                 E                HOURLY 72.00    864.00
14  CA   42  10                 E              OVERTIME  5.00     90.00
15 AL  7229  NA              <NA>                  <NA>    NA  23954.00
16 AL  7229  NA              <NA>                  <NA>    NA  11092.10
17  GA 7380  NA              <NA>                  <NA>    NA  98142.00
18  GA 8380  NA              <NA>                  <NA>    NA  11984.00
19  NC 7380  NA              <NA>                  <NA>    NA 218129.00
20  NC 8380  NA              <NA>                  <NA>    NA  27891.00
21  TN 7380  NA              <NA>                  <NA>    NA  28441.00
22  TN 8380  NA              <NA>                  <NA>    NA   8348.00

i use

outcome=sqldf("select * from arr where ST = 'AL'")
outcome

always get

 ST Code Emp          Employee              Pay.Code Hours Gross
1 AL 7229  65                S                HOURLY     0     0
2 AL 7229  65                S                SALARY     0  3060
3 AL 7229  65                S              PER DIEM     0   765
4 AL 7229  65                S EXPENSE REIMBURSEMENT     0    11
5 AL 7229   2                R                HOURLY    60   720

i always cannot catch 15 & 16 where did it go wrong ?

Rorita_Tai
  • 33
  • 4
  • 1
    It appears to be that they are `"AL "` (note the space). Either fix your data (preferable) or make a more flexible sql statement: `select * from arr where ST like 'AL%'` (untested). – r2evans Oct 23 '17 at 23:03
  • thanks r2evans it wworks!! – Rorita_Tai Oct 23 '17 at 23:22

1 Answers1

0

When I know the wrong pattern I usually try to normalize all the data before.

In your case, remove the blank spaces will solve the problem.

arr$ST = strsplit(arr$ST," ")[[1]] arr$ST = arr$ST[arr$ST!=""]

But if you don't know the wrong pattern you can use Regex, i'm not sure if sqldf is able to regex pattern (I never used sqldf before, sorry), but if is you can use the regex pattern from SQL language.

Ps.:

1 - You can filter the data without sqldf using arr[which(arr$ST == "AL"),]

2 - Don't forget to save your data before any changes.

  • 1
    The question is not whether `sqldf` does regex, it's whether SQL does regex. And it depends on the db-engine being used by `sqldf` at the moment. See https://stackoverflow.com/a/33029814/3358272. (The default use of SQLite means no regex is available.) – r2evans Oct 23 '17 at 23:45