
What's the canonical way of finding a row in a DataFrame in DataFrames.jl?

For instance, given this DataFrame:

│ Row │ uuid                                 │ name         │
│     │ String                               │ String       │
├─────┼──────────────────────────────────────┼──────────────┤
│ 1   │ 0efae8bf-39e6-5d65-b05d-c8947f4cee2a │ COSMA_jll    │
│ 2   │ 17ccb2e5-db19-44b3-b354-4fd16d92c74e │ CitableImage │

Given the name "CitableImage", what's the best way to retrieve the uuid?

GLee


I would typically use:

filter(:name => ==("CitableImage"), df)

This returns a data frame, because in general more than one row can match.
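A minimal self-contained sketch of the filter approach, assuming DataFrames.jl 1.x is installed. The DataFrame is rebuilt from the two rows shown in the question, and the uuid is read from the column of the filtered result:

```julia
using DataFrames

# Example data reconstructed from the question
df = DataFrame(uuid = ["0efae8bf-39e6-5d65-b05d-c8947f4cee2a",
                       "17ccb2e5-db19-44b3-b354-4fd16d92c74e"],
               name = ["COSMA_jll", "CitableImage"])

# Keep every row whose :name equals the target; the result is a DataFrame
matches = filter(:name => ==("CitableImage"), df)

# The uuid column of the result is a vector with one entry per matching row
println(matches.uuid)
```

Because the result is a data frame, an empty result (no match) or several rows (duplicate names) are both handled without an error.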

If you are sure that only one row will match then you can also write:

df[only(findall(==("CitableImage"), df.name)), :]

(the only function throws an error unless exactly one row matches, and the result is a single DataFrameRow rather than a data frame)
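A short sketch, assuming DataFrames.jl 1.x, showing that the only-based lookup yields a DataFrameRow whose uuid can be read directly, and that a missing name raises an ArgumentError instead of silently returning nothing. "NoSuchPackage" is a hypothetical name used only to trigger the error:

```julia
using DataFrames

df = DataFrame(uuid = ["0efae8bf-39e6-5d65-b05d-c8947f4cee2a",
                       "17ccb2e5-db19-44b3-b354-4fd16d92c74e"],
               name = ["COSMA_jll", "CitableImage"])

# findall gives the indices of matching rows; only insists there is exactly one
row = df[only(findall(==("CitableImage"), df.name)), :]  # a DataFrameRow
id = row.uuid
println(id)

# With zero matches, only throws an ArgumentError
errored = try
    df[only(findall(==("NoSuchPackage"), df.name)), :]
    false
catch err
    err isa ArgumentError
end
println(errored)
```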

If you want to get a data frame using indexing you can write:

df[df.name .== "CitableImage", :]

or

df[findall(==("CitableImage"), df.name), :]
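A sketch, assuming DataFrames.jl 1.x, comparing the two indexing forms: the broadcasted comparison builds a Bool mask over all rows, while findall turns the same condition into integer row indices. Both select the same rows and return a data frame:

```julia
using DataFrames

df = DataFrame(uuid = ["0efae8bf-39e6-5d65-b05d-c8947f4cee2a",
                       "17ccb2e5-db19-44b3-b354-4fd16d92c74e"],
               name = ["COSMA_jll", "CitableImage"])

# Bool mask: one true/false per row
by_mask = df[df.name .== "CitableImage", :]

# Integer indices of the rows where the condition holds
by_index = df[findall(==("CitableImage"), df.name), :]

println(by_mask == by_index)
println(by_mask.uuid)
```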

Finally, we also provide the subset function, but its normal use case is a bit different (by default it passes whole columns to the condition, so a per-element test needs ByRow), which makes it more verbose than filter here:

subset(df, :name => ByRow(==("CitableImage")))
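A sketch, assuming DataFrames.jl 1.x, contrasting the ByRow form with a whole-column condition. Without ByRow, subset hands the entire :name vector to the function, so the function itself must return a Bool vector (here via a broadcasted comparison):

```julia
using DataFrames

df = DataFrame(uuid = ["0efae8bf-39e6-5d65-b05d-c8947f4cee2a",
                       "17ccb2e5-db19-44b3-b354-4fd16d92c74e"],
               name = ["COSMA_jll", "CitableImage"])

# ByRow applies the predicate to each element of :name
sub = subset(df, :name => ByRow(==("CitableImage")))

# Whole-column form: the function receives the full column and returns a Bool vector
sub_cols = subset(df, :name => col -> col .== "CitableImage")

println(sub.uuid)
println(sub == sub_cols)
```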

If you need to do many lookups and want them to be efficient, it is better to group the data frame first:

gdf = groupby(df, :name)

and then do:

gdf[("CitableImage",)]

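which returns the matching rows as a SubDataFrame. Grouping is done once up front, and later lookups by key avoid scanning the whole :name column, so this is much faster if you do many such lookups.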
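A sketch of the repeated-lookup pattern, assuming DataFrames.jl 1.x. The list of names to look up is hypothetical; the data frame is grouped once and then each name is looked up by its key tuple, with only used to extract the single uuid per group:

```julia
using DataFrames

df = DataFrame(uuid = ["0efae8bf-39e6-5d65-b05d-c8947f4cee2a",
                       "17ccb2e5-db19-44b3-b354-4fd16d92c74e"],
               name = ["COSMA_jll", "CitableImage"])

# One-time cost: group rows by :name
gdf = groupby(df, :name)

# Hypothetical batch of names to resolve
names_to_find = ["CitableImage", "COSMA_jll"]

# Each gdf[(n,)] is a SubDataFrame for that name; only checks it has one row
found = Dict(n => only(gdf[(n,)].uuid) for n in names_to_find)
println(found)
```

Note that gdf[(n,)] throws a KeyError if n is not one of the group keys, so names that may be absent need to be checked or handled separately.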

Bogumił Kamiński