My data looks like this:

[image of the data table]

Suppose I name the rows a1, a2, b1, b2, c1, c2, d1, d2. The rule is that a set takes one value from each of the big rows A, B, C, D, and within each big row the two positions can be swapped. I need sets of 4 numbers, so I will have:

  • a1 b1 c1 d1
  • a2 b1 c1 d1
  • a1 b2 c2 d2
  • a2 b2 c2 d2
  • a1 b1 c2 d2
  • a2 b1 c2 d2
  • a1 b1 c1 d2
  • a2 b1 c1 d2
  • a1 b2 c1 d2
  • a2 b2 c1 d2
  • a1 b2 c1 d1
  • a2 b2 c1 d1
  • a1 b1 c2 d1
  • a2 b1 c2 d1
  • a1 b2 c2 d1
  • a2 b2 c2 d1

So when the numbers change, I will get many sets of data. How can I filter them to keep only the unique sets, and count how many times each unique set appears?

PDA
  • 71
  • 7

2 Answers

Mm, genetics, it's tasty...

To solve this problem in Python, you can do the following:

  1. Load the data from the spreadsheet (it looks like an xlsx file). For this, just use pandas: pd.read_excel()
  2. (OPTIONAL STEP) Prepare your data. I see one cell with no value, which might cause problems.
  3. Create index labels as you wish (a1, a2, etc.). You can generate them with a for loop that returns a list, and then use df.set_index()
  4. Main idea: create 2 for loops: one for the, let's say, static component (outer loop), and another for the dynamic component (inner loop).
  5. In your example:
    • a1 b1 c1 d1
    • a2 b1 c1 d1

The static part is "b1 c1 d1", and the dynamic part goes "a1" --> "a2".

After one iteration the static component must change: "b1 c1 d1" --> "b2 c2 d2". Every iteration must finish by appending the set to the list you created (list.append(set)).
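One possible reading of the two-loop idea is sketched below. It assumes the static part should run over every choice of b, c and d, which matches the 16 sets listed in the question. The `row` dict and its values are hypothetical stand-ins for one record of the real data.

```python
import itertools

# Hypothetical values for one record; the real numbers come from the spreadsheet.
row = {"a1": 1, "a2": 2, "b1": 3, "b2": 4, "c1": 5, "c2": 6, "d1": 7, "d2": 8}

list_of_sets = []
# Outer loop: the static part, one choice each from B, C and D.
for static in itertools.product(("b1", "b2"), ("c1", "c2"), ("d1", "d2")):
    # Inner loop: the dynamic part, a1 --> a2.
    for a in ("a1", "a2"):
        # Store each set as a tuple so it can be used as a dict key later.
        list_of_sets.append((row[a],) + tuple(row[name] for name in static))

print(len(list_of_sets))
```

Each pass of the outer loop fixes b, c and d, and the inner loop then swaps a1 for a2, so every static part contributes two sets.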

After the operations above, you need to filter the result. The steps are:

  1. Create an empty dict (called counts below) where each key represents a unique element and each value is the number of times it appears. Store each set as a tuple so that it can be used as a key.
  2. Write a for loop like:
    for combo in list_of_sets:
        if combo not in counts:
            counts[combo] = 1
        else:
            counts[combo] += 1

Or you can use collections.Counter or np.unique().
I hope this helps with your task.
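As a minimal sketch of the counting step, assuming each set is stored as a tuple, the snippet below tallies a small made-up list both by hand and with collections.Counter. The values in `list_of_sets` are hypothetical, not taken from the real data.

```python
from collections import Counter

# Hypothetical sets of 4 numbers; tuples are used because they are hashable.
list_of_sets = [(1, 3, 5, 7), (2, 3, 5, 7), (1, 3, 5, 7), (1, 4, 6, 8)]

# Manual counting with a plain dict.
manual_counts = {}
for combo in list_of_sets:
    manual_counts[combo] = manual_counts.get(combo, 0) + 1

# Counter performs the same tally in one call.
counts = Counter(list_of_sets)

for combo, n in counts.items():
    print(combo, n)
```

Both approaches give the same mapping from unique set to number of occurrences, and Counter is simply shorter to write.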

Roman_N
  • 183
  • 7

Thank you Roman_N, here is my code:

import itertools
from collections import Counter

import pandas as pd

df = pd.read_csv("BN.csv")

# Build every combination of one value per group (A, B, C, D) for each row.
result = []
for index, row in df.iterrows():
    s = [[row['a1'], row['a2']], [row['b1'], row['b2']], [row['c1'], row['c2']], [row['d1'], row['d2']]]
    result.extend(itertools.product(*s))

# Count how many times each unique set appears.
counts = Counter(result)

for element in counts:
    print(element, counts[element])

# Number of unique sets.
print('length is', len(counts))
PDA
  • 71
  • 7