
I have a DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Binary_List': [[0, 0, 1, 0, 0, 0, 0],
                                   [0, 1, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 1, 0, 0, 0],
                                   [0, 0, 0, 0, 1, 1, 1]]})
df

    Binary_List
0   [0, 0, 1, 0, 0, 0, 0]
1   [0, 1, 0, 0, 0, 0, 0]
2   [0, 0, 1, 1, 0, 0, 0]
3   [0, 0, 0, 0, 1, 1, 1]

I want to apply a function to each list without using `apply`, because `apply` is very slow on a large dataset:

def count_one(lst):
    index = [i for i, e in enumerate(lst) if e != 0]
    # some more steps 
    return len(index)

df['Value'] = df['Binary_List'].apply(count_one)
df

    Binary_List             Value
0   [0, 0, 1, 0, 0, 0, 0]   1
1   [0, 1, 0, 0, 0, 0, 0]   1
2   [0, 0, 1, 1, 0, 0, 0]   2
3   [0, 0, 0, 0, 1, 1, 1]   3

I tried `np.vectorize`, but it gave no speed-up:

vfunc = np.vectorize(count_one)
df['Value'] = vfunc(df['Binary_List']) 
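The NumPy documentation for `np.vectorize` says it is provided primarily for convenience, not for performance, and that its implementation is essentially a for loop. The sketch below, assuming the toy `count_one` from the question, shows that it produces the same values as a plain list comprehension, which is why no speed-up should be expected:

```python
import numpy as np
import pandas as pd

def count_one(lst):
    # Toy version from the question: count the non-zero entries.
    index = [i for i, e in enumerate(lst) if e != 0]
    return len(index)

df = pd.DataFrame({'Binary_List': [[0, 0, 1, 0, 0, 0, 0],
                                   [0, 1, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 1, 0, 0, 0],
                                   [0, 0, 0, 0, 1, 1, 1]]})

vfunc = np.vectorize(count_one)
vectorized = vfunc(df['Binary_List'])               # still one Python call per row
looped = [count_one(x) for x in df['Binary_List']]  # explicit Python loop

print(list(vectorized) == looped)  # True
```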

Calling the function directly on the whole column gives me an error:

df['Value'] = count_one(df['Binary_List'])
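With the toy `count_one` shown above, passing the whole column does not apply the function row by row: `enumerate` walks over the rows of the Series, so each `e` is an entire list, and a list compared with `0` via `!=` is always `True`. The real function has more steps that are not shown, so this sketch only illustrates what happens with the toy version:

```python
import pandas as pd

def count_one(lst):
    index = [i for i, e in enumerate(lst) if e != 0]
    return len(index)

df = pd.DataFrame({'Binary_List': [[0, 0, 1, 0, 0, 0, 0],
                                   [0, 1, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 1, 0, 0, 0],
                                   [0, 0, 0, 0, 1, 1, 1]]})

# Iterating a Series yields its values, so each `e` here is a whole row (a list).
# A list compared with 0 via != is always True, so every row is "counted".
result = count_one(df['Binary_List'])
print(result)  # 4, the number of rows, not a per-row count
```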
Hardik Gupta
  • You can't, because you're storing an `object` in a DataFrame. If you instead stored each element in its own cell, this would be a trivial and extremely fast `df.sum(1)`. – ALollz Sep 27 '19 at 15:36
  • Assuming it's for your previous question - https://stackoverflow.com/q/58136267/. Use the intermediate output from the posted answers, where you had the binary array output, and sum along the columns with `.sum(axis=1)`. – Divakar Sep 27 '19 at 15:37
  • @Divakar - I cannot use sum directly; as I pointed out, there are more steps in that function. Counting the 1s is just an example. – Hardik Gupta Sep 27 '19 at 15:50
  • I guess you would need stacking and summing: `np.vstack(df['Binary_Month_List']).sum(1)`. – Divakar Sep 27 '19 at 15:51
  • As long as you have a list of lists, `apply` or any other action has to run at interpreted Python speed. To get fast `numpy` speed, the data has to be a numeric 2d array. – hpaulj Sep 27 '19 at 15:56
  • @Divakar, this is not just about counting 1s, but about applying a function without using `apply`. – Hardik Gupta Sep 27 '19 at 15:56
  • As I said in the previous Q&A, there's no magic function. – Divakar Sep 27 '19 at 15:57
  • Agreed, but at least don't downvote the question. – Hardik Gupta Sep 27 '19 at 15:58
  • That's not from me. – Divakar Sep 27 '19 at 16:00
  • As far as I know, downvotes (and upvotes) are anonymous. I suspect most downvoters don't stay around to follow the comments. – hpaulj Sep 27 '19 at 16:10
  • `df['Binary_List'].map(sum)`? – anky Sep 27 '19 at 16:17
  • What's the difference between DataFrame `apply` and `map`? I doubt either one converts the function or `lambda` to compiled code. Calling the function once per row is the real time consumer, especially if the function is complex. The iteration mechanism itself is usually a minor part of the time cost. – hpaulj Sep 27 '19 at 17:44
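Following the comments about stacking into a numeric 2d array, here is a minimal sketch that assumes every list has the same length. Once the lists are a 2d NumPy array, the count of 1s and many other per-row steps become whole-array operations. The `first_one` column is a hypothetical stand-in for the unspecified "more steps" in the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Binary_List': [[0, 0, 1, 0, 0, 0, 0],
                                   [0, 1, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 1, 0, 0, 0],
                                   [0, 0, 0, 0, 1, 1, 1]]})

# Stack the equal-length lists into a numeric 2d array (rows x positions).
arr = np.array(df['Binary_List'].tolist())

# Count of non-zero entries per row, computed in compiled NumPy code.
df['Value'] = np.count_nonzero(arr, axis=1)

# Hypothetical extra step: position of the first 1 in each row.
# argmax returns the first index of the maximum; a row with no 1 would give 0.
df['first_one'] = arr.argmax(axis=1)
print(df[['Value', 'first_one']])
```

If the real "more steps" can be written as array operations along `axis=1`, they run at NumPy speed; if they need arbitrary Python per row, a per-row loop is still required.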

2 Answers


To count the 1s in each list, you can convert the lists to strings and use the `.str` accessor:

import pandas as pd

df = pd.DataFrame({'Binary_List': [[0, 0, 1, 0, 0, 0, 0],
                                   [0, 1, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 1, 0, 0, 0],
                                   [0, 0, 0, 0, 1, 1, 1]]})

# np.str was removed in NumPy 1.24; the built-in str works the same way here
df["Binary_List"].astype(str).str.count("1")
Dev Khadka

You can try `DataFrame.explode`:

df.explode('Binary_List').reset_index().groupby('index').sum()

        Binary_List
index   
0        1
1        1
2        2
3        3
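After `explode`, the exploded column keeps `object` dtype. A variant of the same idea (a sketch, not the answerer's code, assuming no list is empty since empty lists explode to `NaN`) casts to `int` first and groups on the repeated original index, so the result stays numeric and an element-wise step can run before the per-row reduction:

```python
import pandas as pd

df = pd.DataFrame({'Binary_List': [[0, 0, 1, 0, 0, 0, 0],
                                   [0, 1, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 1, 0, 0, 0],
                                   [0, 0, 0, 0, 1, 1, 1]]})

# One row per list element; the original row label is repeated for each element.
exploded = df['Binary_List'].explode().astype(int)

# Element-wise step (is the element non-zero?), then reduce per original row.
df['Value'] = (exploded != 0).groupby(level=0).sum()
print(df['Value'].tolist())  # [1, 1, 2, 3]
```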

You can also do:

pd.Series([np.array(key).sum() for key in df['Binary_List']])
0    1
1    1
2    2
3    3
dtype: int64
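A list comprehension like the one above works with any Python function, so it also covers the question's `count_one` with its extra steps. As hpaulj's comment notes, it still calls the function once per row at Python speed, so any gain over `apply` is usually modest. A sketch assuming the toy `count_one` from the question:

```python
import pandas as pd

def count_one(lst):
    # Toy version from the question; the real function has more steps.
    index = [i for i, e in enumerate(lst) if e != 0]
    return len(index)

df = pd.DataFrame({'Binary_List': [[0, 0, 1, 0, 0, 0, 0],
                                   [0, 1, 0, 0, 0, 0, 0],
                                   [0, 0, 1, 1, 0, 0, 0],
                                   [0, 0, 0, 0, 1, 1, 1]]})

# Plain Python loop over the column values; no pandas apply machinery involved.
df['Value'] = [count_one(lst) for lst in df['Binary_List']]
print(df['Value'].tolist())  # [1, 1, 2, 3]
```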
ansev