2

Yesterday I was wondering how to access dictionary keys in a DataFrame column (link). The solution was to use .str[<key>] on the pandas Series and call the tolist() method afterwards. However, this does not work with object attributes.

My goal is to get, as output, a list of a specific attribute for each object in a pandas Series.

Here is sample code with the solution I am currently using. I convert the whole object Series to a list and then iterate over it to get the specific attribute. Is there a way to access the attribute directly?

import pandas as pd


class User:
    def __init__(self, name):
        self.name = name


df = pd.DataFrame({
    'col1': [User("Juan"), User("Karen"), User("Vince")]
})


myObjects = df['col1'].tolist()
myNames = [u.name for u in myObjects]

# Desired output
['Juan', 'Karen', 'Vince']

When I try the dictionary solution:

myNames = df["col1"].str['name'].tolist()

# Output
[nan, nan, nan]
Titouan L

3 Answers

1

I would not recommend the .str approach, as it only works if you change the class. Instead, you can use apply() for this:

myNames = list(df['col1'].apply(lambda x: x.name))

Output:

['Juan', 'Karen', 'Vince']

The .str[] indexing works on dictionaries, but not on arbitrary objects. If you make your object convertible to a dictionary, it works. For example:

import pandas as pd


class User:
    def __init__(self, name):
        self.name = name

    def __iter__(self):
        yield 'name', self.name


df = pd.DataFrame({
    'col1': [User("Juan"), User("Karen"), User("Vince")]
})

result = list(df['col1'].map(dict).str['name'])
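To see why this works, here is a minimal sketch, assuming the same User class as above. dict() accepts any iterable of key/value pairs, so an __iter__ that yields ('name', self.name) lets dict(user) build a one-entry dictionary, which .str['name'] can then index. The variable names as_dict, s and names are only illustrative.

```python
import pandas as pd


class User:
    def __init__(self, name):
        self.name = name

    def __iter__(self):
        # Yield (key, value) pairs so that dict(user) can consume the object.
        yield 'name', self.name


# dict() iterates over the object and collects the yielded pairs.
as_dict = dict(User("Juan"))
print(as_dict)  # {'name': 'Juan'}

# The same conversion applied element-wise, followed by .str[] indexing.
s = pd.Series([User("Juan"), User("Karen")])
names = s.map(dict).str['name'].tolist()
print(names)  # ['Juan', 'Karen']
```

This only helps when you control the class definition. For a class that comes from a library, reading the attribute directly is the simpler route.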
JANO
  • Yeah, the class is from a library, so `.str[]` is definitely not an option. Please note that the only reason your answer is not the accepted one is that @AndrejH's answer is a bit faster on my large DataFrame (around 20% faster). – Titouan L Mar 17 '22 at 09:05
1

You can use attrgetter from the operator module in combination with pandas.Series.map. attrgetter("name") returns a function that, when called on each entry of col1, retrieves the attribute named name. It is equivalent to lambda x: x.name.

from operator import attrgetter
myNames = df["col1"].map(attrgetter("name")).tolist()

Output:

['Juan', 'Karen', 'Vince']
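As a self-contained sketch of the equivalence described above, the snippet below rebuilds the sample DataFrame from the question and compares attrgetter with the lambda form. The names get_name, via_attrgetter and via_lambda are only illustrative.

```python
from operator import attrgetter

import pandas as pd


class User:
    def __init__(self, name):
        self.name = name


df = pd.DataFrame({'col1': [User("Juan"), User("Karen"), User("Vince")]})

# attrgetter("name") builds a callable f such that f(obj) returns obj.name.
get_name = attrgetter("name")

via_attrgetter = df["col1"].map(get_name).tolist()
via_lambda = df["col1"].map(lambda x: x.name).tolist()

print(via_attrgetter)  # ['Juan', 'Karen', 'Vince']
print(via_attrgetter == via_lambda)  # True
```

Both forms return the same list. Any speed difference between them is best measured on your own data.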
AndrejH
1

You can also try:

df['col1'].map(lambda x: x.name).to_list()
Muhammad Hassan