Is there a function in pandas to help me count each string from each row list?

Question

I have a dataframe like this:

df1
   a  b  c
0  1  2  [bg10, ng45, fg56]
1  4  5  [cv10, fg56]
2  7  8  [bg10, ng45, fg56]
3  7  8  [fg56, fg56]
4  7  8  [bg10]

I would like to count the total occurences take place of each type in column 'c'. I would then like to return the value of column 'b' for the values in column 'c' that have a count total of '1'.

The expected output is soemthing like this:

           c    b   total_count
    0   bg10    2   2
    0   ng45    2   2
    0   fg56    2   5
    1   cv10    5   1
    1   fg56    5   5

I have tried the 'Collections' library, and a 'for' loop (I understand its not best practise in Pandas) but i think i'm missing some fundamental udnerstanding of lists within cells, and how to perform analysis like these.

Thank you for taking my question into consideration.

Hi, you can use `apply` function on the `c` column and combine it with `Counter` object. See this link for more help: [how can i count the occurrences of a list item](https://stackoverflow.com/questions/2600191/how-can-i-count-the-occurrences-of-a-list-item) — DavidDr90, Aug 06 '20 at 13:08
Hello, I have tried this but it doesnt return any data. Could you p[lease sahare a written example of applying this type of function? Thank you. — Ventoii, Aug 06 '20 at 13:32
@DavidDr90 I added the point that i'd liek to return the total occurences adding up each row also. — Ventoii, Aug 06 '20 at 13:36
Please specify your expected sample output so its clear for us to provide a proper solution. Is the `total` occurences to be found based on individual row lists or the entire column should be taken to consideration? — Anant Kumar, Aug 06 '20 at 14:21

Rachel Shalom · Accepted Answer · 2020-08-06T15:37:16.780

0

I would use apply the following way:

first I create the df:

df1=pd.DataFrame({"b":[2,5,8,8], "c":[['bg10', 'ng45', 'fg56'],['cv10', 'fg56'],['bg10', 'ng45', 'fg56'],['fg56', 'fg56']]})

next use apply to count the number of (non unique) items in a list and save it in a different column:

df1["count_c"]=df1.c.apply(lambda x: len(x))

you will get the following:

     b             c        count_c
  0  2  [bg10, ng45, fg56]        3
  1  5        [cv10, fg56]        2
  2  8  [bg10, ng45, fg56]        3
  3  8        [fg56, fg56]        2

to get the lines when c larger than threshold:`

df1[df1["count_c"]>2]["b"]

note: if you want to count only unique values in each list in column c you should use:

df1["count_c"]=df1.c.apply(lambda x: len(set(x)))

EDIT

in order to count the total number of each item I would try this: first let's "unpack all the lists into columns

new_df1=(df1.c.apply(lambda x: pd.Series(x))).stack().reset_index(level=1,drop=True).to_frame("c").join(df1[["b"]],how="left")

then get the total counts of each item in the list and add it to a new col:

  counts_dict=new_df1.c.value_counts().to_dict()
  new_df1["total_count_c"]=new_df1.c.map(counts_dict)
  new_df1.head()
  
           c    b   total_count_c
    0   bg10    2   2
    0   ng45    2   2
    0   fg56    2   5
    1   cv10    5   1
    1   fg56    5   5

edited Aug 06 '20 at 15:37

answered Aug 06 '20 at 13:40

Rachel Shalom

399
2
10

Hi, Thank you for your time, and really clear detailed answer! Perhaps I wasn't clear enough in my question. I' like to return the total occurences of each type. – Ventoii Aug 06 '20 at 13:46
ahh you mean of each type in column c in total? – Rachel Shalom Aug 06 '20 at 13:48
@Ventoii I edited this to total counts. let me know if this helped – Rachel Shalom Aug 06 '20 at 14:08
Thanks again. The only issue is that the 'map' function is throweing an error on the dataframe: 'AttributeError: 'DataFrame' object has no attribute 'map'' – Ventoii Aug 06 '20 at 15:34
you can try: new_df1["total_count_c"]=new_df1.c.map(counts_dict) I fixed it in the answer. it works for me:) – Rachel Shalom Aug 06 '20 at 15:36
@Ventoii btw if the answered helped you, it will be great if you accept it:) – Rachel Shalom Aug 06 '20 at 17:39
Sorry for the late reply. I finished work, and then it was a long weekedn here in France. I will test you proposal this morning :-) Thank you! – Ventoii Aug 10 '20 at 07:31

Is there a function in pandas to help me count each string from each row list?

1 Answers1