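using CSV, DataFrames
# Current CSV.jl requires the sink type (DataFrame) as the second argument
iris = CSV.read(joinpath(dirname(pathof(DataFrames)), "..", "test/data/iris.csv"), DataFrame)

first(iris, 6)  # replaces the deprecated head(iris)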
6×5 DataFrame
│ Row │ SepalLength │ SepalWidth │ PetalLength │ PetalWidth │ Species │
│     │ Float64⍰    │ Float64⍰   │ Float64⍰    │ Float64⍰   │ String⍰ │
├─────┼─────────────┼────────────┼─────────────┼────────────┼─────────┤
│ 1   │ 5.1         │ 3.5        │ 1.4         │ 0.2        │ setosa  │
│ 2   │ 4.9         │ 3.0        │ 1.4         │ 0.2        │ setosa  │
│ 3   │ 4.7         │ 3.2        │ 1.3         │ 0.2        │ setosa  │
│ 4   │ 4.6         │ 3.1        │ 1.5         │ 0.2        │ setosa  │
│ 5   │ 5.0         │ 3.6        │ 1.4         │ 0.2        │ setosa  │
│ 6   │ 5.4         │ 3.9        │ 1.7         │ 0.4        │ setosa  │

I want to find all rows where Species is either setosa or virginica. The answer must look the values up in an array, because I want the solution to work for arbitrarily many values.


There is a function called indexin. It gets me halfway there:

iris[indexin(iris.Species, ["setosa", "virginica"])]

But when I try to use it for indexing the result is:

ERROR: ArgumentError: Only Integer values allowed when indexing by vector of numbers
The Unfun Cat
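To see why indexing with the `indexin` result fails, here is a minimal sketch on a made-up four-element vector standing in for `iris.Species`. `indexin(a, b)` returns, for each element of `a`, its position in `b`, or `nothing` when it is absent. Those are positions in the search list rather than row numbers, and the `nothing` entries are what indexing rejects. Comparing against `nothing` turns the result into a Boolean mask.

```julia
# Illustrative stand-in for iris.Species (values are made up).
species = ["setosa", "versicolor", "virginica", "setosa"]
wanted = ["setosa", "virginica"]

# Position of each species within `wanted`, or `nothing` when it is absent.
idx = indexin(species, wanted)

# `nothing` marks non-matching rows; every other entry is a match.
mask = idx .!== nothing

# Row numbers of the matches.
rows = findall(mask)
```

With a DataFrame, either `iris[mask, :]` or `iris[rows, :]` would then select the matching rows.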

3 Answers

iris[in.(iris.Species, (["virginica", "setosa"],)), :]

The extra tuple around ["virginica","setosa"] stops in. from broadcasting over the search list, so every Species value is checked against the whole list.

Przemyslaw Szufel
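A small sketch of the broadcasting point above, on made-up sample values instead of the iris file. Wrapping the search list in a 1-tuple, or in `Ref` (a common alternative not mentioned in the answer), makes broadcasting treat it as a single argument. Without a wrapper, broadcasting tries to pair the two vectors element by element and fails because their lengths differ.

```julia
# Made-up sample values standing in for iris.Species.
species = ["setosa", "versicolor", "virginica", "setosa"]
wanted = ["virginica", "setosa"]

# The 1-tuple shields `wanted` from broadcasting, so this calls
# in(s, wanted) once for each element s of `species`.
mask_tuple = in.(species, (wanted,))

# Ref(wanted) shields the argument in the same way.
mask_ref = in.(species, Ref(wanted))

# Unwrapped, broadcasting pairs `species` (length 4) with `wanted`
# (length 2) element by element, which throws a DimensionMismatch.
unwrapped_fails = try
    in.(species, wanted)
    false
catch err
    err isa DimensionMismatch
end
```

On a DataFrame the resulting mask goes in the row position, e.g. `iris[mask_ref, :]`.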
  • Ah, that is kinda ugly, but I guess those kinks will be ironed out eventually :) Thanks – The Unfun Cat Oct 02 '18 at 12:09
  • A more Julian approach would be to write `filter(x -> x[:Species] in ["virginica", "setosa"], iris)`. If you use DataFramesMeta, you can alternatively write `@where(iris, in.(:Species, [["virginica", "setosa"]]))`. – Bogumił Kamiński Oct 02 '18 at 12:27
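A sketch of the `filter` approach from the comment, assuming DataFrames.jl 1.x and a small made-up table in place of the iris file. The second call uses the `column => predicate` form of `filter`, which passes only the Species value to the predicate rather than the whole row.

```julia
using DataFrames

# Small made-up stand-in for iris (values are illustrative only).
df = DataFrame(SepalLength = [5.1, 7.0, 6.3, 4.9],
               Species = ["setosa", "versicolor", "virginica", "setosa"])
wanted = ["virginica", "setosa"]

# Row-wise predicate as in the comment; row.Species is equivalent to row[:Species].
by_row = filter(row -> row.Species in wanted, df)

# Column => predicate form: the predicate receives only the Species value.
by_col = filter(:Species => in(wanted), df)
```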

A way to achieve this is to use findall:

iris[findall(in(["setosa", "virginica"]), iris.Species), :]
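A minimal sketch of what the `findall` line does, using a made-up vector instead of `iris.Species`. `in(wanted)` is a one-argument predicate, equivalent to `x -> x in wanted`, and `findall(pred, v)` returns the indices where the predicate is true. Those indices are the row numbers used in `iris[..., :]`.

```julia
species = ["setosa", "versicolor", "virginica", "setosa"]
wanted = ["setosa", "virginica"]

# in(wanted) returns a predicate equivalent to x -> x in wanted.
is_wanted = in(wanted)

# findall applies the predicate to each element and returns the matching indices.
rows = findall(is_wanted, species)
```

For a long search list, passing a `Set` instead, e.g. `in(Set(wanted))`, turns each membership test into a hash lookup rather than a linear scan of the vector.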

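You can use the findin function. (This works only up to Julia 0.6: findin was deprecated in Julia 0.7 in favor of findall(in(b), a) and is not available in Julia 1.0.)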

iris[findin(iris[:Species],["setosa","virginica"]),:]

Note that if you use findin to search for a single value, you still have to pass it as an array, like

iris[findin(iris[:Species],["setosa"]),:]
tpdsantos
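On Julia 1.x the same searches can be sketched with `findall`. The sample vector is made up, and the `==("setosa")` variant is an alternative that does not appear in the answer. A single value still works when it is wrapped in an array, but a curried `==` expresses the single-value case directly.

```julia
species = ["setosa", "versicolor", "virginica", "setosa"]

# Julia 1.x equivalent of findin(species, ["setosa", "virginica"]).
many = findall(in(["setosa", "virginica"]), species)

# A single value wrapped in an array, mirroring the findin usage.
one_wrapped = findall(in(["setosa"]), species)

# ==("setosa") is a predicate equivalent to x -> x == "setosa".
one_direct = findall(==("setosa"), species)
```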