Extract only non zero values from columns (with header name and value)

Question

I would like to extract all values from dataframe that are non zero and write these data into separate column (with header name and value).

I was able to find a solution via Excel:

=TRANSPOSE(INDEX(IF(MOD(SEQUENCE(2),2),FILTER(C$1:H$1,C2:H2<>0),FILTER(C2:H2,C2:H2<>0)),MOD(SEQUENCE(COUNTIF(C2:H2,"<>0")*2)-1,2)+1,ROUNDUP(SEQUENCE(COUNTIF(C2:H2,"<>0")*2)/2,0)))

But I would like to do it using Python.

Input data:

Desired output:

Please do not post data as images or even code snippets. Provide the data as text to your question to make it easy for others to reproduce the problem. — Rabinzel, Aug 26 '22 at 18:46
[How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391) — AlexK, Aug 26 '22 at 19:38
Are you sure, the excell solution works for any number of columns? — Александр М, Aug 26 '22 at 20:43

score 0 · Answer 1 · answered Aug 26 '22 at 20:43

Simple solution

Just walkthrough columns:

df = pd.DataFrame({
    "a": [0, 0, 2, 0, 3],
    "b": [0, 1, 1, 0, 1]
})
result = [df[col][df[col] != 0] for col in df.columns]
print(*result)

Output (non-zero series for each column):

2    2
4    3
Name: a, dtype: int64 1    1
2    1
4    1
Name: b, dtype: int64

It's a series, where value is value (2, 3), index is index of non-zero elements (2, 4) and name of series is name.

mmustafaicer · Answer 2 · 2022-08-29T15:02:13.093

0

You can use defaultdict to keep occurrences and values:

try:

d = defaultdict(list)
for row, val in df.iterrows():
    for col in range(len(df.columns)):
        if val[col] != 0:
            d[row+1].append((col+1, val[col]))

for i in range(max(len(x) for x in d.values())):
    df['occurrence ' + str(i+1)] = ['name_' + str(list(d.values())[j][i][0]) if i<len(d[j+1]) else None for j in range(len(d))]
    df['value ' + str(i+1)] = [str(list(d.values())[j][i][1]) if i<len(d[j+1]) else None for j in range(len(d))]

to replicate everything:

import pandas as pd
from collections import defaultdict

data = {'name_1': {0: 0, 1: 0, 2: 100, 3: 10, 4: 0, 5: 50},
 'name_2': {0: 100, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0},
 'name_3': {0: 0, 1: 50, 2: 0, 3: 50, 4: 0, 5: 0},
 'name_4': {0: 0, 1: 50, 2: 0, 3: 0, 4: 0, 5: 0},
 'name_5': {0: 0, 1: 0, 2: 0, 3: 40, 4: 100, 5: 0},
 'name_6': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 50}}

df = pd.DataFrame(data)

d = defaultdict(list)
for row, val in df.iterrows():
    for col in range(len(df.columns)):
        if val[col] != 0:
            d[row+1].append((col+1, val[col]))
            
for i in range(max(len(x) for x in d.values())):
    df['occurrence ' + str(i+1)] = ['name_' + str(list(d.values())[j][i][0]) if i<len(d[j+1]) else None for j in range(len(d))]
    df['value ' + str(i+1)] = [str(list(d.values())[j][i][1]) if i<len(d[j+1]) else None for j in range(len(d))]

updated replication based on OP's comments

import pandas as pd
from collections import defaultdict

data = {'doprava': {0: 0, 1: 0, 2: 100, 3: 10, 4: 0, 5: 50},
 'krajinna_produkce': {0: 100, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0},
 'krajinne': {0: 0, 1: 50, 2: 0, 3: 50, 4: 0, 5: 0},
 'obytne': {0: 0, 1: 50, 2: 0, 3: 0, 4: 0, 5: 0},
 'produkcni': {0: 0, 1: 0, 2: 0, 3: 40, 4: 100, 5: 0},
 'rekreacni': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 50}}

df = pd.DataFrame(data)

d = defaultdict(list)
for row, val in df[['doprava', 'krajinna_produkce', 'krajinne', 'obytne', \
                    'produkcni', 'rekreacni']].iterrows():
    for col in range(len(df.columns)):
        if val[col] != 0:
            d[row+1].append((val.index[col], val[col]))
            
for i in range(max(len(x) for x in d.values())):
    df['occurrence ' + str(i+1)] = [str(list(d.values())[j][i][0]) \
                               if i<len(d[j+1]) else None for j in range(len(d))]
    df['value ' + str(i+1)] = [str(list(d.values())[j][i][1]) \
                               if i<len(d[j+1]) else None for j in range(len(d))]

output:

doprava	krajinna_produkce	krajinne	obytne	produkcni	rekreacni	occurrence 1	value 1	occurrence 2	value 2	occurrence 3	value 3
0	100	0	0	0	0	krajinna_produkce	100
0	0	50	50	0	0	krajinne	50	obytne	50
100	0	0	0	0	0	doprava	100
10	0	50	0	40	0	doprava	10	krajinne	50	produkcni	40
0	0	0	0	100	0	produkcni	100
50	0	0	0	0	50	doprava	50	rekreacni	50

edited Aug 29 '22 at 15:02

answered Aug 26 '22 at 21:34

mmustafaicer

434
6
15

Thank you, my data looks a bit different. since I have string ID in the first collumn. – adamekcerv Aug 27 '22 at 20:52
Then do a pandas slicing. If this solves your problem please mark as answer, if does not let me know problems. `df[['name_1', 'name_2', ....]]'.iterrows()`. – mmustafaicer Aug 28 '22 at 03:46
I can show you my result: [link](https://imgur.com/a/yLj2dQf). @mmustafaicer – adamekcerv Aug 28 '22 at 17:50
What is `mappa_id`, why are you including index in `.iterrows()`? The original data that you shared above was just names `name_1`, `name_2` etc. – mmustafaicer Aug 29 '22 at 14:06
@adamekcerv please check this snippet [code](https://replit.com/@Mehmet-MustafaM/DismalCrushingProtocols#main.py) – mmustafaicer Aug 29 '22 at 14:41
TypeError: first argument must be callable or None. @mmustafaicer – adamekcerv Aug 29 '22 at 20:45
Which function is that? Screenshot? – mmustafaicer Aug 30 '22 at 04:29
It happens when I use your updated code above. - Even when I just copy it and not changing anything. – adamekcerv Aug 31 '22 at 17:21
I don't think I can help more. I tested again and it definitely works, here is the [proof](https://serving.photos.photobox.com/71820010fb3eccb722672cef5021f5a18d3c8ea7e2120b5d7eb16cb68e3ff4ebdb309bea.jpg) – mmustafaicer Aug 31 '22 at 17:46
That is strange. Probably something with Python version? (Python 3.9.12), or idk. [proof](https://imgur.com/a/fhWwvjc) – adamekcerv Aug 31 '22 at 18:15
probably your answer is [here](https://stackoverflow.com/questions/42137849/defaultdict-first-argument-must-be-callable-or-none) – mmustafaicer Sep 01 '22 at 14:26
I was able to solve it actually. I have one more issue with the code, since I have ID collumn with values in my data. - I can drop it after. But is there any other way how to not count with "mappa_id"? – adamekcerv Sep 01 '22 at 14:32
Again maybe `.loc` indexer would work for you when you are doing first for loop. `df.loc[:, df.columns != 'mappa_id'].iterrows():`. If you have more than 10-20 columns that you don't want to list all of them, you can use this one. `df.drop` is alternative because it does not take `inplace` as a default. So you would keep your `mappa_id` column because it is not `inplace=True`. It is all up to you how you choose it. – mmustafaicer Sep 01 '22 at 14:36
Thanks, I get it, how to do it with df.drop, but not sure how df.loc works. Sorry, I've just started with python about a week ago. – adamekcerv Sep 01 '22 at 20:36
`df.loc[row, column]` is an accessor based on labels. So `df.loc[:, df.columns != 'mappa_id']` means that give me every row (`:` means every row) and all columns except *mappa_id*. [documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html). If this all solved your problem, please check that checkmark near my answer to close this thread. Thanks. – mmustafaicer Sep 02 '22 at 14:34

Extract only non zero values from columns (with header name and value)

2 Answers2