-1

I would like to extract all values from dataframe that are non zero and write these data into separate column (with header name and value).

I was able to find a solution via Excel:

=TRANSPOSE(INDEX(IF(MOD(SEQUENCE(2),2),FILTER(C$1:H$1,C2:H2<>0),FILTER(C2:H2,C2:H2<>0)),MOD(SEQUENCE(COUNTIF(C2:H2,"<>0")*2)-1,2)+1,ROUNDUP(SEQUENCE(COUNTIF(C2:H2,"<>0")*2)/2,0)))

But I would like to do it using Python.

Input data:

enter image description here

Desired output:

enter image description here

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
adamekcerv
  • 11
  • 1
  • 4

2 Answers2

0

Simple solution

Just walkthrough columns:

df = pd.DataFrame({
    "a": [0, 0, 2, 0, 3],
    "b": [0, 1, 1, 0, 1]
})
result = [df[col][df[col] != 0] for col in df.columns]
print(*result)

Output (non-zero series for each column):

2    2
4    3
Name: a, dtype: int64 1    1
2    1
4    1
Name: b, dtype: int64

It's a series, where value is value (2, 3), index is index of non-zero elements (2, 4) and name of series is name.

0

You can use defaultdict to keep occurrences and values:

try:

d = defaultdict(list)
for row, val in df.iterrows():
    for col in range(len(df.columns)):
        if val[col] != 0:
            d[row+1].append((col+1, val[col]))

for i in range(max(len(x) for x in d.values())):
    df['occurrence ' + str(i+1)] = ['name_' + str(list(d.values())[j][i][0]) if i<len(d[j+1]) else None for j in range(len(d))]
    df['value ' + str(i+1)] = [str(list(d.values())[j][i][1]) if i<len(d[j+1]) else None for j in range(len(d))]

to replicate everything:

import pandas as pd
from collections import defaultdict

data = {'name_1': {0: 0, 1: 0, 2: 100, 3: 10, 4: 0, 5: 50},
 'name_2': {0: 100, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0},
 'name_3': {0: 0, 1: 50, 2: 0, 3: 50, 4: 0, 5: 0},
 'name_4': {0: 0, 1: 50, 2: 0, 3: 0, 4: 0, 5: 0},
 'name_5': {0: 0, 1: 0, 2: 0, 3: 40, 4: 100, 5: 0},
 'name_6': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 50}}

df = pd.DataFrame(data)

d = defaultdict(list)
for row, val in df.iterrows():
    for col in range(len(df.columns)):
        if val[col] != 0:
            d[row+1].append((col+1, val[col]))
            
for i in range(max(len(x) for x in d.values())):
    df['occurrence ' + str(i+1)] = ['name_' + str(list(d.values())[j][i][0]) if i<len(d[j+1]) else None for j in range(len(d))]
    df['value ' + str(i+1)] = [str(list(d.values())[j][i][1]) if i<len(d[j+1]) else None for j in range(len(d))]

updated replication based on OP's comments

import pandas as pd
from collections import defaultdict

data = {'doprava': {0: 0, 1: 0, 2: 100, 3: 10, 4: 0, 5: 50},
 'krajinna_produkce': {0: 100, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0},
 'krajinne': {0: 0, 1: 50, 2: 0, 3: 50, 4: 0, 5: 0},
 'obytne': {0: 0, 1: 50, 2: 0, 3: 0, 4: 0, 5: 0},
 'produkcni': {0: 0, 1: 0, 2: 0, 3: 40, 4: 100, 5: 0},
 'rekreacni': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 50}}

df = pd.DataFrame(data)

d = defaultdict(list)
for row, val in df[['doprava', 'krajinna_produkce', 'krajinne', 'obytne', \
                    'produkcni', 'rekreacni']].iterrows():
    for col in range(len(df.columns)):
        if val[col] != 0:
            d[row+1].append((val.index[col], val[col]))
            
for i in range(max(len(x) for x in d.values())):
    df['occurrence ' + str(i+1)] = [str(list(d.values())[j][i][0]) \
                               if i<len(d[j+1]) else None for j in range(len(d))]
    df['value ' + str(i+1)] = [str(list(d.values())[j][i][1]) \
                               if i<len(d[j+1]) else None for j in range(len(d))]

output:

doprava krajinna_produkce krajinne obytne produkcni rekreacni occurrence 1 value 1 occurrence 2 value 2 occurrence 3 value 3
0 100 0 0 0 0 krajinna_produkce 100
0 0 50 50 0 0 krajinne 50 obytne 50
100 0 0 0 0 0 doprava 100
10 0 50 0 40 0 doprava 10 krajinne 50 produkcni 40
0 0 0 0 100 0 produkcni 100
50 0 0 0 0 50 doprava 50 rekreacni 50
mmustafaicer
  • 434
  • 6
  • 15
  • Thank you, my data looks a bit different. since I have string ID in the first collumn. – adamekcerv Aug 27 '22 at 20:52
  • Then do a pandas slicing. If this solves your problem please mark as answer, if does not let me know problems. `df[['name_1', 'name_2', ....]]'.iterrows()`. – mmustafaicer Aug 28 '22 at 03:46
  • I can show you my result: [link](https://imgur.com/a/yLj2dQf). @mmustafaicer – adamekcerv Aug 28 '22 at 17:50
  • What is `mappa_id`, why are you including index in `.iterrows()`? The original data that you shared above was just names `name_1`, `name_2` etc. – mmustafaicer Aug 29 '22 at 14:06
  • @adamekcerv please check this snippet [code](https://replit.com/@Mehmet-MustafaM/DismalCrushingProtocols#main.py) – mmustafaicer Aug 29 '22 at 14:41
  • TypeError: first argument must be callable or None. @mmustafaicer – adamekcerv Aug 29 '22 at 20:45
  • Which function is that? Screenshot? – mmustafaicer Aug 30 '22 at 04:29
  • It happens when I use your updated code above. - Even when I just copy it and not changing anything. – adamekcerv Aug 31 '22 at 17:21
  • I don't think I can help more. I tested again and it definitely works, here is the [proof](https://serving.photos.photobox.com/71820010fb3eccb722672cef5021f5a18d3c8ea7e2120b5d7eb16cb68e3ff4ebdb309bea.jpg) – mmustafaicer Aug 31 '22 at 17:46
  • That is strange. Probably something with Python version? (Python 3.9.12), or idk. [proof](https://imgur.com/a/fhWwvjc) – adamekcerv Aug 31 '22 at 18:15
  • probably your answer is [here](https://stackoverflow.com/questions/42137849/defaultdict-first-argument-must-be-callable-or-none) – mmustafaicer Sep 01 '22 at 14:26
  • I was able to solve it actually. I have one more issue with the code, since I have ID collumn with values in my data. - I can drop it after. But is there any other way how to not count with "mappa_id"? – adamekcerv Sep 01 '22 at 14:32
  • Again maybe `.loc` indexer would work for you when you are doing first for loop. `df.loc[:, df.columns != 'mappa_id'].iterrows():`. If you have more than 10-20 columns that you don't want to list all of them, you can use this one. `df.drop` is alternative because it does not take `inplace` as a default. So you would keep your `mappa_id` column because it is not `inplace=True`. It is all up to you how you choose it. – mmustafaicer Sep 01 '22 at 14:36
  • Thanks, I get it, how to do it with df.drop, but not sure how df.loc works. Sorry, I've just started with python about a week ago. – adamekcerv Sep 01 '22 at 20:36
  • `df.loc[row, column]` is an accessor based on labels. So `df.loc[:, df.columns != 'mappa_id']` means that give me every row (`:` means every row) and all columns except *mappa_id*. [documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html). If this all solved your problem, please check that checkmark near my answer to close this thread. Thanks. – mmustafaicer Sep 02 '22 at 14:34