0

I want to count the number of times a value in Child column appears in Parent column then display this count in new column renamed child count. See previews df below.

I have this done via VBA (COUNTIFS) but now need dynamic visualization and animated display with data fed from a dir. So I resorted to Python and Pandas and tried below code after searching and reading answers like: Countif in pandas with multiple conditions | Determine if value is in pandas column | Iterate over rows in Pandas df | many others... but still can't get the expected preview as illustrated in image below.

Any help will be very much appreciated. Thanks in advance.

#import libraries
import pandas as pd
import numpy as np
import os

#get datasets
path_dataset = r'D:\Auto'
df_ns = pd.read_csv(os.path.join(path_dataset, 'Scripts', 'data.csv'), index_col = False, encoding = 'ISO-8859-1', engine = 'python')

#preview dataframe
df_ns

#tried
df_ns.groupby(['Child','Parent', 'Site Name']).size().reset_index(name='child count')

#preview output
df_ns.groupby(['Child','Parent', 'Site Name']).size().reset_index(name='child count')

preview dataframe

enter image description here

preview output

enter image description here

expected output

enter image description here

[Edited] My data

Child = ['Tkt01', 'Tkt02', 'Tkt03', 'Tkt04', 'Tkt05', 'Tkt06', 'Tkt07', 'Tkt08', 'Tkt09', 'Tkt10']

Parent = [' ', ' ', 'Tkt03',' ',' ', 'Tkt03',' ', 'Tkt03',' ',' ', 'Tkt06',' ',' ',' ',]

Site_Name =[Yaounde','Douala','Bamenda','Bafoussam','Kumba','Garoua','Maroua','Ngaoundere','Buea','Ebolowa']

Community
  • 1
  • 1
King Sufu
  • 31
  • 8

2 Answers2

2

I created a lookalike of your df.

Before

enter image description here

Try this code

df['Count'] = [len(df[df['parent'].str.contains(value)]) for index, value in enumerate(df['child'])]
#breaking it down as a line by line code

counts = []
for index, value in enumerate(df['child']):
    found = df[df['parent'].str.contains(value)]
    counts.append(len(found))
df['Count'] = counts

After

enter image description here

Hope this works for you.

Isaac Frank
  • 351
  • 3
  • 12
  • 1
    Thank you @Isaac Frank ``` df_ns['child_count'] = [len(df_ns[df_ns['Parent'].str.contains(value)]) for index, value in enumerate(df_ns['Child'])] ``` works just fine. You spiced my weekend! :) – King Sufu Jun 19 '20 at 16:12
  • I'm glad I could help. Happy Coding :D – Isaac Frank Jun 19 '20 at 16:23
1

Since I don't have access to your data, I cannot check the code I am giving you. I suggest you will have problems with nan values with this line but you can give it a try.:

df_ns['child_count'] = df_ns['Parent'].groupby(df_ns['Child']).value_counts()

I give a name to the new column and directly assign values to it through the groupby -> value_counts functions.

Yana
  • 785
  • 8
  • 23