1

I have 59 columns whose name is in the format: nn: xxxxxx (ttttttt), where tttttt is some name which is repeated for some particular columns. Now I want to calculate the sum of non-empty cells when tttttt='XXXXXX'. I know how to calculate the number of non-empty cells in a column, but how do I add the condition of ttttt being XXXXXX in the name of a column?

import pandas as pd
df = pd.read_csv("dane.csv", sep=';')
shape = list(df.shape)
nonempty=df.apply(lambda x: shape[0]-x.isnull().sum())

Input:

1: Brandenburg (Post-Panamax)               2: Acheron (Feeder)                        5: Fenton (Feeder)
ES-NL-10633096/1938/X1@hkzydbezon.dk/6749   DE-JP-20438082/2066/A2@qwinfhcaer.cu/68849 NL-LK-02275406/2136/A1@ozmmfdpfts.de/73198
BE-BR-61613986/3551/B1@oqk.bf/39927         NL-LK-02275406/2136/A1@ozmmfdpfts.de/73198
PH-SA-39552610/2436/A1@venagi.hr/80578
PA-AE-59814691/4881/X1@zhicvzvksl.cl/25247  OM-PH-31303222/3671/Z1@jtqy.ml/52408

So for instance for this input, lets say I want to calculate the number of non empty cells for the name in the column 'Feeder'

Umar.H
  • 22,559
  • 7
  • 39
  • 74
ryszard eggink
  • 315
  • 3
  • 14
  • Could you share some of the data from the CSV, so that we can see the headers and execute your code? – AMC Oct 29 '19 at 19:06
  • are you nulls properly defined as `NaN` or just a string with spaces? you might need to convert them first – Umar.H Oct 29 '19 at 19:07
  • @AlexanderCécile the headers are as in the input example I included. and the nulls are defined fine – ryszard eggink Oct 29 '19 at 19:20

1 Answers1

2

You can use filter:

df.filter(like='(Feeder)').isna().sum()

or a more precise version, which requires (Feeder) to appear at the end of the column:

df.filter(regex='.*(\(Feeder\))$').isna().sum()

Output:

2: Acheron (Feeder)    1
5: Fenton (Feeder)     3
dtype: int64
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74