I have a pandas DataFrame df with values from different systems:
System Value1 Value2 Value3...
S1 x x x...
S2 x x x...
S3 x x x...
I want to find out which Value1 entries occur in all systems and collect them in a list.
This is what I have tried so far. First, I created a list (identvalue) of the Value1 entries that occur exactly as often as the number of systems, n:
from collections import defaultdict
identvalue = []
dic = defaultdict(int)
Input = df['Value1']
for i in Input:
    dic[i] += 1  # count how often each Value1 entry occurs
n = df['System'].nunique()  # number of distinct systems
for element in Input:
    if element in dic.keys() and dic[element] == n:
        identvalue.append(element)
identvalue = list(set(identvalue))  # remove duplicate entries
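The counting part of this step can probably be written more compactly with pandas' `Series.value_counts()`, which returns the number of occurrences of each distinct value. A minimal sketch, assuming a small made-up `df` with the column names from the layout above (the values are hypothetical):

```python
import pandas as pd

# Hypothetical toy data using the column names from the question
df = pd.DataFrame({
    "System": ["S1", "S1", "S1", "S1", "S1", "S2", "S2", "S2", "S2", "S3", "S3", "S3"],
    "Value1": ["a", "b", "b", "d", "d", "a", "b", "c", "d", "a", "c", "d"],
})

n = df["System"].nunique()  # number of distinct systems (3 here)

# value_counts() maps each distinct Value1 entry to its number of rows
counts = df["Value1"].value_counts()

# Entries that occur exactly n times; each entry appears only once in the list
identvalue = counts[counts == n].index.tolist()
print(sorted(identvalue))  # ['a', 'b']
```

Here "b" passes this first filter even though it never appears in S3, which is exactly the case the next step has to weed out.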
Next, I have to remove from identvalue those entries that occur n times but not once per system. I tried several things:
idv = identvalue
i = 0
while i < len(identvalue):
    tmp1 = df.loc[df['Value1'] == identvalue[i]]
    no_ids = len(set(tmp1['System']))
    if no_ids != n:
        idv.remove(identvalue[i])
    i += 1
But here, I get an IndexError: list index out of range.
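A sketch of one problem in this loop (it does not reproduce the IndexError itself), using plain strings and a hypothetical `keep` function in place of the DataFrame lookup: `idv = identvalue` does not copy the list, it binds a second name to the same list object, so `idv.remove(...)` shrinks `identvalue` while the loop is indexing into it. After a removal, the next element slides into position `i`, and `i += 1` then steps past it unchecked.

```python
# Hypothetical stand-in for the `no_ids == n` check on the DataFrame
def keep(value):
    return value in {"a", "d"}

identvalue = ["a", "b", "c", "d"]
idv = identvalue            # NOT a copy: both names refer to the same list
assert idv is identvalue

checked = []
i = 0
while i < len(identvalue):
    checked.append(identvalue[i])
    if not keep(identvalue[i]):
        idv.remove(identvalue[i])   # shrinks identvalue too
    i += 1                          # skips the element that moved into slot i

print(checked)     # ['a', 'b', 'd']  ("c" was never examined)
print(identvalue)  # ['a', 'c', 'd']  ("c" should have been removed)

# Filtering into a new list avoids mutating the list being scanned
original = ["a", "b", "c", "d"]
filtered = [v for v in original if keep(v)]
print(filtered)    # ['a', 'd']
```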
Then I tried:
idv = identvalue
for element in identvalue:
    tmp1 = df.loc[df['Value1'] == element]
    no_ids = len(set(tmp1['System']))
    if no_ids != n:
        idv.remove(element)
But here, the loop does not run through the full identvalue list; it stops (without an error message) after about half of the list. The same happens when I use enumerate. What am I doing wrong? And I guess there is a much easier way to achieve my goal anyway?
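The for loop likely has the same root cause: because `idv` and `identvalue` are the same list, each `remove` shifts the remaining items left while the loop's internal position keeps advancing, so the element after every removed one is skipped. That is consistent with the loop stopping after about half the list when many entries are removed. Iterating over a copy (`list(identvalue)`) or building a new list with a comprehension avoids this. A possibly simpler route to the original goal, sketched with `groupby` on a made-up `df` (column names from the question, data hypothetical), counts for each Value1 entry how many rows it has and in how many distinct systems it appears:

```python
import pandas as pd

# Hypothetical toy data: "a" is once in every system, "b" occurs 3 times but
# only in S1 and S2, "c" occurs twice, "d" is in every system but twice in S1
df = pd.DataFrame({
    "System": ["S1", "S1", "S1", "S1", "S1", "S2", "S2", "S2", "S2", "S3", "S3", "S3"],
    "Value1": ["a", "b", "b", "d", "d", "a", "b", "c", "d", "a", "c", "d"],
})

n = df["System"].nunique()  # number of distinct systems

# One row per Value1 entry: total number of rows and number of distinct systems
stats = df.groupby("Value1")["System"].agg(["size", "nunique"])

# Entries that appear in every system
in_all_systems = stats[stats["nunique"] == n].index.tolist()

# Stricter variant: in every system AND exactly n rows, i.e. once per system
once_per_system = stats[(stats["nunique"] == n) & (stats["size"] == n)].index.tolist()

print(in_all_systems)   # ['a', 'd']
print(once_per_system)  # ['a']
```

`in_all_systems` answers "occurs in all systems"; `once_per_system` additionally requires exactly n occurrences, which matches the two-step filtering described above.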