
I have a dataset named full, and one of its columns is Breed, as shown below.

Breed

 Shetland Sheepdog Mix
 Domestic Shorthair Mix
 Pit Bull Mix
 Domestic Shorthair Mix
 Lhasa Apso/Miniature Poodle
 Cairn Terrier/Chihuahua Shorthair
 Domestic Shorthair Mix
 Domestic Shorthair Mix
 American Pit Bull Terrier Mix
 Cairn Terrier
 Domestic Shorthair Mix
 Miniature Schnauzer Mix
 Pit Bull Mix
 Yorkshire Terrier Mix
 Great Pyrenees Mix
 Domestic Shorthair Mix
 Domestic Shorthair Mix
 Pit Bull Mix
 Angora Mix
 Flat Coat Retriever Mix
 Queensland Heeler Mix
 Domestic Shorthair Mix
 Plott Hound/Boxer

What I need is the following:

I need to get the frequency of each unique value in the column.

I have extracted the breed (renamed to BreedType) and its frequency as shown below. Then, using an if condition, I need a new column that contains 'F' if the frequency of the BreedType is less than 66, and the BreedType value itself if the frequency is greater than 66.

In other words: assign FALSE to Breed values whose frequency is less than 66.
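For reference, here is a minimal base R sketch of the per-row result described above. `label_rare_breeds()` and its `min_count` argument are hypothetical names, and the example uses made-up breed values with `min_count = 2` instead of 66 so that both outcomes appear. Converting the breed to character up front keeps `ifelse()` from returning factor codes.

```r
# Hypothetical helper: label each row with its breed if that breed occurs
# at least `min_count` times, otherwise with "F".
label_rare_breeds <- function(breed, min_count = 66) {
  breed <- as.character(breed)                         # labels, not factor codes
  freq  <- ave(seq_along(breed), breed, FUN = length)  # per-row count of its own breed
  data.frame(Breed = breed,
             Frequency = freq,
             TrueFalse = ifelse(freq < min_count, "F", breed),
             stringsAsFactors = FALSE)
}

# Made-up values; min_count = 2 stands in for 66 in this tiny example.
breeds <- c("Pit Bull Mix", "Angora Mix", "Pit Bull Mix",
            "Domestic Shorthair Mix", "Pit Bull Mix")
res <- label_rare_breeds(breeds, min_count = 2)
print(res)
```

Every row keeps its own count, similar to what the grouped `:=` assignment aims for, and the `TrueFalse` column is always character.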

 df$Breed <- data.frame(full$Breed)

 setDT(df)
 dt1 <- copy(df)   

 dt1[, c("Frequency", "TrueFalse") := .(.N, ifelse(.N < 66, "FALSE", Breed)), by = Breed]

 dt1<-data.frame(dt1)

But instead of the result I expected, I get the following error:


Error in `[.data.table`(dt1, , `:=`(c("Frequency", "TrueFalse"), .(.N, : Type of RHS ('integer') must match LHS ('character'). To check and coerce would impact performance too much for the fastest cases. Either change the type of the target column, or coerce the RHS of := yourself (e.g. by using 1L instead of 1)

I tried several times but could not get the result I was looking for. Can someone please help?
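A likely cause, offered as a hedged reading of the error rather than a confirmed diagnosis: assuming Breed is a factor (read.csv() converted strings to factors by default before R 4.0.0), `ifelse()` returns the factor's underlying integer code instead of its label. Groups with fewer than 66 rows then produce the character "FALSE", while larger groups produce an integer, and `:=` refuses to put both types into one column. That would also explain why the code runs on small example data, where every group is below 66. The base R sketch below uses made-up breed values to show the coercion and the `as.character()` fix; wrapping Breed in `as.character()` inside the original `:=` call should presumably have the same effect.

```r
# Made-up example values; assumes Breed is stored as a factor.
breed <- factor(c("Pit Bull Mix", "Angora Mix"))

# A frequency of 66 or more makes the test `.N < 66` FALSE, so ifelse()
# takes the `no` branch and returns the factor's integer code
# (levels are sorted, so "Pit Bull Mix" is code 2).
as_code <- ifelse(FALSE, "FALSE", breed[1])

# Converting the factor to character first keeps the label.
as_label <- ifelse(FALSE, "FALSE", as.character(breed[1]))

print(class(as_code))   # "integer"
print(as_label)         # "Pit Bull Mix"
```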

Also, when I use full$Breed instead of Breed, the frequency is computed correctly, but the rest of the result is still not what I expected:

 df$Breed <- data.frame(full$Breed)

 setDT(df)
 dt1 <- copy(df)   

 dt1[, c("Frequency", "TrueFalse") := .(.N, ifelse(.N < 66, "FALSE", full$Breed)), by = full$Breed]

 dt1<-data.frame(dt1)

 Full<-cbind2(dt1, full)


Can someone please help me figure out what the issue is?
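For the second attempt, a plausible explanation (again a hedged reading, not checked against the Kaggle data): inside `j`, `full$Breed` is the entire column rather than the current group's value, and `ifelse()` returns a result as long as its test. Because `.N < 66` is a single value per group, only the first element of `full$Breed` is ever picked, so every group at or above the threshold would get that same first value, while `.N` is still counted correctly through `by = full$Breed`. The sketch below uses a hypothetical stand-in vector to show the behaviour.

```r
# Hypothetical stand-in for the whole full$Breed column.
whole_column <- c("Shetland Sheepdog Mix", "Pit Bull Mix", "Angora Mix")

# The test has length 1 (like `.N < 66` within one group), so ifelse()
# returns a single value: the first element of `no`, whatever the group.
picked <- ifelse(FALSE, "FALSE", whole_column)
print(picked)   # "Shetland Sheepdog Mix"
```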

user3789200
  • Did you try `dt1[, c("Frequency", "TrueFalse") := .(.N, ifelse(.N < 66, FALSE, Breed)), by = Breed]` (omitting the quotation marks around `FALSE`)? – Jaap Jul 30 '16 at 10:28
  • Yes, I tested that as well and got the same kind of error: Error in `[.data.table`(dt1, , `:=`(c("Frequency", "TrueFalse"), .(.N, : Type of RHS ('logical') must match LHS ('character'). To check and coerce would impact performance too much for the fastest cases. Either change the type of the target column, or coerce the RHS of := yourself (e.g. by using 1L instead of 1) – user3789200 Jul 30 '16 at 10:31
  • Tested on the example data, the code **with** the quotation marks works on my PC. – Jaap Jul 30 '16 at 10:37
  • Oh, is there any way of sharing an Excel file here? – user3789200 Jul 30 '16 at 10:45
  • See: [How to give a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) – Jaap Jul 30 '16 at 10:46
  • Can you download the dataset from the following link and use the train dataset? – user3789200 Jul 30 '16 at 11:17
  • Can you download the train dataset and try to get the result with it? https://www.kaggle.com/c/shelter-animal-outcomes/data – user3789200 Jul 30 '16 at 11:19
  • Why not `full[, .(freq = .N, tf = Breed[.N > 66]), by = Breed]`? It will give you `NA` values for the rows that don't meet the condition, but it is certainly better than trying to combine a logical and a character in one variable. – Jaap Jul 30 '16 at 12:31
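A base R sketch of the idea in the last comment, using made-up values and a threshold of 1 in place of 66 so the tiny example shows both outcomes: keeping rare breeds as `NA_character_` leaves the column a single character type, whereas combining `FALSE` with breed names silently turns `FALSE` into the string "FALSE".

```r
breed <- c("Pit Bull Mix", "Angora Mix", "Pit Bull Mix")
threshold <- 1  # stands in for 66 in this tiny example

# Per-row count of each breed (the same role as .N grouped by Breed).
freq <- ave(seq_along(breed), breed, FUN = length)

# Breed name where the count exceeds the threshold, NA otherwise.
tf <- ifelse(freq > threshold, breed, NA_character_)
print(tf)      # "Pit Bull Mix" NA "Pit Bull Mix"

# Mixing a logical with character data coerces everything to character.
mixed <- c(FALSE, "Pit Bull Mix")
print(mixed)   # "FALSE" "Pit Bull Mix"
```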

1 Answer


You could use dplyr:

library(dplyr)
df %>%
  group_by(Breed) %>%
  summarize(Frequency = n()) %>%
  mutate(TrueFalse = ifelse(Frequency < 66, "F", as.character(Breed)))

which results in:

Source: local data frame [14 x 3]

                                    Breed Frequency               TrueFalse
                                   <fctr>     <int>                   <chr>
    1       American Pit Bull Terrier Mix         4                       F
    2                          Angora Mix         2                       F
    3                       Cairn Terrier         4                       F
    4   Cairn Terrier/Chihuahua Shorthair         4                       F
    5              Domestic Shorthair Mix       519  Domestic Shorthair Mix
    6             Flat Coat Retriever Mix         2                       F
    7                  Great Pyrenees Mix         4                       F
    8         Lhasa Apso/Miniature Poodle         4                       F
    9             Miniature Schnauzer Mix         4                       F
    10                       Pit Bull Mix        10                       F
    11                  Plott Hound/Boxer        73       Plott Hound/Boxer
    12              Queensland Heeler Mix         2                       F
    13              Yorkshire Terrier Mix         4                       F
    14              Shetland Sheepdog Mix        75   Shetland Sheepdog Mix

where df is:

    df<-structure(list(Breed = structure(c(14L, 5L, 10L, 5L, 8L, 4L, 
5L, 5L, 1L, 3L, 5L, 9L, 10L, 13L, 7L, 5L, 5L, 10L, 2L, 6L, 12L, 
5L, 11L, 14L, 5L, 10L, 5L, 8L, 4L, 5L, 5L, 1L, 3L, 5L, 9L, 10L, 
13L, 7L, 5L, 5L, 10L, 2L, 6L, 12L, 5L, 11L, 14L, 5L, 10L, 5L, 
8L, 4L, 5L, 5L, 1L, 3L, 5L, 9L, 10L, 13L, 7L, 14L, 5L, 10L, 5L, 
8L, 4L, 5L, 5L, 1L, 3L, 5L, 9L, 10L, 13L, 7L, 5L, 11L, 14L, 5L, 
5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 
14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 
5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 
14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 
5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 
14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 
5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 
14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 
5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 
14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 
5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 
14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 
5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 
14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 
5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 
14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 
5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 
14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 
5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 
14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 5L, 11L, 14L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c(" American Pit Bull Terrier Mix", 
" Angora Mix", " Cairn Terrier", " Cairn Terrier/Chihuahua Shorthair", 
" Domestic Shorthair Mix", " Flat Coat Retriever Mix", " Great Pyrenees Mix", 
" Lhasa Apso/Miniature Poodle", " Miniature Schnauzer Mix", " Pit Bull Mix", 
" Plott Hound/Boxer", " Queensland Heeler Mix", " Yorkshire Terrier Mix", 
"Shetland Sheepdog Mix"), class = "factor")), .Names = "Breed", class = "data.frame", row.names = c(NA, 
-711L))
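If dplyr is not available, a base R sketch along the same lines should give the per-breed summary. `summarise_breeds()` is a hypothetical helper name, and the example call uses made-up values with `min_count = 2` instead of 66. Unlike the dplyr output, Breed here comes back as a character column rather than a factor.

```r
# Hypothetical helper: one row per breed with its count and label.
summarise_breeds <- function(breed, min_count = 66) {
  counts <- table(as.character(breed))   # frequency of each unique value
  out <- data.frame(Breed = names(counts),
                    Frequency = as.integer(counts),
                    stringsAsFactors = FALSE)
  # Keep the breed name for common breeds, "F" for rare ones.
  out$TrueFalse <- ifelse(out$Frequency < min_count, "F", out$Breed)
  out
}

s <- summarise_breeds(c("Pit Bull Mix", "Angora Mix", "Pit Bull Mix"), min_count = 2)
print(s)
```

Called as `summarise_breeds(df$Breed)` on the df above, it should produce the same counts as the dplyr output, though the row order may differ because table() sorts names using the current locale.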
thisisrg