I've recently been using pandas for data analysis, and I'm trying to be properly Pythonic about things. The following code works just fine to find all of the unique values in a certain subset of columns:

import pandas as pd
dataframe = pd.read_csv("sourcefile.csv", na_values=[" ",""])
col_names = list(dataframe)
my_cols = [name for name in col_names if "STRING" in name]
unique_urls = set()
for col in my_cols:
    for url in list(dataframe[col]):
        unique_urls.add(url)

But I feel like there is a better way to do the last two nested for loops. Any advice appreciated!

EDIT: I may have found a better way based on some answers here: Find unique values in a Pandas dataframe, irrespective of row or column location

The following code works:

import pandas as pd
dataframe = pd.read_csv("sourcefile.csv", na_values=[" ",""])
col_names = list(dataframe)
my_cols = [name for name in col_names if "STRING" in name]
unique_urls = pd.unique(dataframe[my_cols].values.ravel())
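
As a quick sanity check that the two approaches agree, here is a minimal sketch on a small in-memory frame. The column names (`STRING_a`, `STRING_b`, `other`) and the URLs are hypothetical stand-ins for the contents of `sourcefile.csv`, and the sketch assumes there are no missing values, since NaN handling can differ between a Python `set` and `pd.unique`.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data; no missing values assumed.
dataframe = pd.DataFrame({
    "STRING_a": ["x.com", "y.com", "x.com"],
    "STRING_b": ["y.com", "z.com", "w.com"],
    "other": [1, 2, 3],
})
my_cols = [name for name in dataframe.columns if "STRING" in name]

# Approach 1: nested loops collecting values into a set.
loop_result = set()
for col in my_cols:
    for url in dataframe[col]:
        loop_result.add(url)

# Approach 2: flatten the selected columns and let pandas deduplicate.
# ravel() walks the 2-D array row by row, and pd.unique keeps first-seen order.
pandas_result = pd.unique(dataframe[my_cols].values.ravel())

print(sorted(loop_result))
print(list(pandas_result))
```

One practical difference: the set is unordered, while `pd.unique` returns a NumPy array in order of first appearance, which may matter if the result is written out or compared later.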

I did a time test:

In [8]: def unique_items_1():
    unique_urls = set()
    for col in my_cols:
        for item in list(dataframe[col]):
            unique_urls.add(item)

In [9]: %timeit unique_items_1()
1000 loops, best of 3: 436 µs per loop

In [10]: %timeit unique_items_2 = pd.unique(dataframe[my_cols].values.ravel())
1000 loops, best of 3: 462 µs per loop

Both take approximately the same amount of time, with the set() version slightly faster, so I'm still curious which approach the experts consider best. Thanks!
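
A timing like this can depend heavily on the size and dtype of the data, so it may be worth repeating on a larger frame. The following is only a sketch using the standard `timeit` module outside IPython; the synthetic data (row count, URL pool, column names) and the function names `unique_with_set` and `unique_with_pandas` are assumptions made for illustration, not the original dataset.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 10,000 rows drawn from a pool of 500 made-up URLs.
rng = np.random.default_rng(0)
n_rows = 10_000
pool = np.array([f"http://example.com/{i}" for i in range(500)], dtype=object)
dataframe = pd.DataFrame({
    "STRING_a": rng.choice(pool, size=n_rows),
    "STRING_b": rng.choice(pool, size=n_rows),
    "other": np.arange(n_rows),
})
my_cols = [name for name in dataframe.columns if "STRING" in name]


def unique_with_set():
    """Collect unique values column by column into a Python set."""
    unique_urls = set()
    for col in my_cols:
        unique_urls.update(dataframe[col])
    return unique_urls


def unique_with_pandas():
    """Flatten the selected columns and deduplicate with pd.unique."""
    return pd.unique(dataframe[my_cols].values.ravel())


# Total seconds for 20 calls of each function; numbers vary by machine.
t_set = timeit.timeit(unique_with_set, number=20)
t_pandas = timeit.timeit(unique_with_pandas, number=20)
print(f"set():     {t_set:.4f} s for 20 runs")
print(f"pd.unique: {t_pandas:.4f} s for 20 runs")
```

No expected timings are given here because they depend on the machine, the pandas version, and the data.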

tegan
  • This question and its answers might be relevant (possible duplicate): [pandas unique values multiple columns](http://stackoverflow.com/questions/26977076/pandas-unique-values-multiple-columns) – Alex Riley Dec 01 '14 at 21:49
  • Thanks ajcr - that thread seems to have the same two options, using `set()` and using `unique`. I guess my question is still not **how** to do this, but which is **"best practice"**, i.e. which looks most intuitive to people who use pandas for data analysis? Thanks! – tegan Dec 01 '14 at 22:07
