
How can I count the total number of elements in each row of a DataFrame column, including those inside nested sequences, and put the result in a new column?

import pandas as pd
x = pd.Series([[1, (2,5,6)], [2, (3,4)], [3, 4], [(5,6), (7,8,9)]],
              index=range(1, 5))
df = pd.DataFrame({'A': x})

I tried the following code, but it gives 2 for every row:

df['Length'] = df['A'].apply(len)

print(df)

                         A  Length
    1       [1, (2, 5, 6)]       2
    2          [2, (3, 4)]       2
    3               [3, 4]       2
    4  [(5, 6), (7, 8, 9)]       2

However, what I want to get is as follows:

                         A  Length
    1       [1, (2, 5, 6)]       4
    2          [2, (3, 4)]       3
    3               [3, 4]       2
    4  [(5, 6), (7, 8, 9)]       5

thanks

Fadri

3 Answers


Given:

import pandas as pd
x = pd.Series([[1, (2,5,6)], [2, (3,4)], [3, 4], [(5,6), (7,8,9)]])
df = pd.DataFrame({'A': x}) 

You can write a recursive generator that will yield 1 for each nested element that is not iterable. Something along these lines:

try:
    from collections.abc import Iterable  # Python 3.3+
except ImportError:
    from collections import Iterable      # Python 2

def glen(LoS):
    def iselement(e):
        # Strings are iterable, but each one counts as a single element
        return not(isinstance(e, Iterable) and not isinstance(e, str))
    for el in LoS:
        if iselement(el):
            yield 1
        else:
            for sub in glen(el):
                yield sub

df['Length'] = df['A'].apply(lambda e: sum(glen(e)))

Yielding:

>>> df
                     A  Length
0       [1, (2, 5, 6)]       4
1          [2, (3, 4)]       3
2               [3, 4]       2
3  [(5, 6), (7, 8, 9)]       5

That will work in Python 2 or 3 (collections.Iterable was removed in Python 3.10, hence the guarded import). With Python 3.3 or later, you can use yield from to replace the inner loop:

def glen(LoS):
    def iselement(e):
        return not(isinstance(e, Iterable) and not isinstance(e, str))
    for el in LoS:
        if iselement(el):
            yield 1
        else:
            yield from glen(el)
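
As a quick check without pandas, the sketch below applies the same idea to the four rows from the question as plain lists. The name count_leaves is only an illustrative wrapper (it is not part of the answer above), and the code assumes Python 3.3+ for collections.abc and yield from.

```python
from collections.abc import Iterable


def glen(seq):
    """Yield 1 for every non-iterable leaf; strings count as single leaves."""
    for el in seq:
        if isinstance(el, Iterable) and not isinstance(el, str):
            yield from glen(el)  # descend into nested lists/tuples
        else:
            yield 1


def count_leaves(seq):
    # Hypothetical helper: total number of leaves in a nested sequence.
    return sum(glen(seq))


rows = [[1, (2, 5, 6)], [2, (3, 4)], [3, 4], [(5, 6), (7, 8, 9)]]
lengths = [count_leaves(r) for r in rows]
print(lengths)  # [4, 3, 2, 5]
```

Because strings are excluded from the iterable check, a row such as ['ab', ('cd', 1)] counts as 3 elements rather than 5.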
dawg
  • Nice general answer. I like this modification of what you've done. https://pastebin.com/TANRfyKr. Also, my link's imports assume Python 3 – piRSquared Mar 15 '18 at 22:39
  • @piRSquared: Thanks. The original code was from an [earlier answer](https://stackoverflow.com/a/16176969/298607). There is also an improvement for Python 3.3+ to use `yield from` shown. – dawg Mar 16 '18 at 00:53

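Use itertools. Note that chain(*x) requires every item in the row to be iterable, so it raises TypeError on rows that contain a bare scalar, such as [1, (2, 5, 6)]; it also flattens only one level of nesting.

import itertools

df['Length'] = df['A'].apply(lambda x: len(list(itertools.chain(*x))))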
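
Since itertools.chain needs each item it receives to be iterable, one possible workaround is to wrap bare scalars before chaining. The following is a minimal sketch under the assumption that nesting is at most one level deep and that only lists and tuples should be unpacked; flat_len is an illustrative name, not something from the answer.

```python
import itertools


def flat_len(row):
    # Wrap scalars in a one-item list so chain can consume every entry.
    wrapped = (i if isinstance(i, (list, tuple)) else [i] for i in row)
    return len(list(itertools.chain.from_iterable(wrapped)))


rows = [[1, (2, 5, 6)], [2, (3, 4)], [3, 4], [(5, 6), (7, 8, 9)]]
one_level = [flat_len(r) for r in rows]
print(one_level)  # [4, 3, 2, 5]

# Deeper nesting is not unpacked: the inner (3, 4) counts as one element.
print(flat_len([1, (2, (3, 4))]))  # 3
```

With the DataFrame from the question, this could be used as df['A'].apply(flat_len).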
Ray

You could try this function. It is recursive, but it works:

def recursive_len(item):
    try:
        iter(item)  # raises TypeError if item is not iterable
        return sum(recursive_len(subitem) for subitem in item)
    except TypeError:
        return 1  # a non-iterable item counts as one element

Then pass the function to apply:

df['Length'] = df['A'].apply(recursive_len)
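
One caveat with the iter() test: a string is iterable and each of its characters is again a string, so recursive_len('a') never reaches the TypeError branch and ends in a RecursionError. Below is a minimal sketch of a variant that treats str and bytes as single elements and uses an explicit stack instead of recursion; iterative_len is an illustrative name, not part of the answer above.

```python
def iterative_len(item):
    """Count leaves in an arbitrarily nested structure without recursion."""
    count = 0
    stack = [item]
    while stack:
        current = stack.pop()
        if isinstance(current, (str, bytes)):
            count += 1  # treat text as one element, not as characters
            continue
        try:
            stack.extend(current)  # iterable: push its items for later
        except TypeError:
            count += 1  # not iterable: a leaf
    return count


rows = [[1, (2, 5, 6)], [2, (3, 4)], [3, 4], [(5, 6), (7, 8, 9)]]
stack_lengths = [iterative_len(r) for r in rows]
print(stack_lengths)  # [4, 3, 2, 5]
print(iterative_len(["ab", ("cd", 1)]))  # 3
```

It can be passed to apply in the same way: df['A'].apply(iterative_len).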
frozencure