How to get data set using python or C++

Question

I have data that looks like this:

Say I name the rows a1 a2 b1 b2 c1 c2 d1 d2. The rule is A B C D, where you can swap positions within each big row (a1 or a2 for A, b1 or b2 for B, and so on). I need to take sets of 4 numbers, so I will have:

a1 b1 c1 d1
a2 b1 c1 d1
a1 b2 c2 d2
a2 b2 c2 d2
a1 b1 c2 d2
a2 b1 c2 d2
a1 b1 c1 d2
a2 b1 c1 d2
a1 b2 c1 d2
a2 b2 c1 d2
a1 b1 c1 d2
a2 b1 c1 d2
a1 b1 c2 d1
a2 b1 c2 d1
a1 b2 c2 d1
a2 b2 c2 d1

So when I change the numbers I will have many sets of data. How can I filter them to keep only the unique sets, and count how many times each unique set appears?

score 1 · Accepted Answer · answered Jul 13 '20 at 11:33

Mm, genetics, it's tasty...

So, to solve this problem in Python you should:

Load the data from the Excel file (it looks like xlsx). For this, just use pandas: pd.read_excel()
(OPTIONAL STEP) Prepare your data. I see one cell with no value, which might cause problems.
Create the index labels you want (a1, a2, etc.). You can generate them with a for loop that builds a list, then apply them with DataFrame.set_index()
Main idea: you create 2 for loops: one for the, let's say, static component (outer loop), another for the dynamic component (inner loop).
In your example:
- a1 b1 c1 d1
- a1 b1 c1 d1
- a2 b1 c1 d1

The static part is "b1 c1 d1", and the dynamic part goes "a1" --> "a2".

After one iteration the static component must change, e.g. "b1 c1 d1" --> "b2 c2 d2". Every iteration must end by appending the set to the list you created (list.append(set)).
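The two loops described above can be sketched roughly as follows. This is only an illustration under assumptions: the `row` dictionary and its values are made up, and `itertools.product` is used for the outer "static" loop so that every b/c/d choice is visited, not only "b1 c1 d1" and "b2 c2 d2".

```python
from itertools import product

# Hypothetical sample row: one pair of values per group (a, b, c, d).
row = {"a": (1, 2), "b": (3, 4), "c": (5, 6), "d": (7, 8)}

list_of_sets = []
# Outer loop: the "static" part, one choice each from b, c and d.
for b, c, d in product(row["b"], row["c"], row["d"]):
    # Inner loop: the "dynamic" part, which switches a1 --> a2.
    for a in row["a"]:
        # Store each set as a tuple so it can be counted later.
        list_of_sets.append((a, b, c, d))

print(len(list_of_sets))
```

With two values per group this gives 2 * 2 * 2 * 2 = 16 tuples per row of data.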

After the operations above, you need to filter the result. The steps are:

Create an empty dict (say, counts) where each key represents a unique element and each value is the number of times it appears. Store each set as a tuple so it can be used as a dict key.
Make a for loop like:
    for s in list_of_sets:
        if s not in counts:
            counts[s] = 1
        else:
            counts[s] += 1

Or you can use collections.Counter or np.unique().
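A minimal sketch of the collections.Counter route, assuming the sets are stored as tuples. The sample list is invented for illustration. The second counter is only relevant if sets holding the same numbers in a different order should count as one, which is an assumption about the question rather than something it states.

```python
from collections import Counter

# Hypothetical sample; in practice this is the list built in the loops.
list_of_sets = [(1, 3, 5, 7), (2, 3, 5, 7), (1, 3, 5, 7), (7, 5, 3, 1)]

# Order-sensitive count: (1, 3, 5, 7) and (7, 5, 3, 1) are different keys.
counts = Counter(list_of_sets)
for combo, n in counts.items():
    print(combo, n)
print("unique sets:", len(counts))

# Assumption: if order should not matter, sort each tuple before counting.
unordered_counts = Counter(tuple(sorted(combo)) for combo in list_of_sets)
print("unique sets ignoring order:", len(unordered_counts))
```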
I hope it will help you with your task.

score 1 · Answer 2 · answered Jul 15 '20 at 03:35

Thank you Roman_N, here is my code:

import itertools
from collections import Counter

import pandas as pd

# BN.csv must contain the columns a1, a2, b1, b2, c1, c2, d1, d2.
df = pd.read_csv("BN.csv")

result = []
for index, row in df.iterrows():
    # One pair of candidate values per group (A, B, C, D).
    pairs = [[row['a1'], row['a2']], [row['b1'], row['b2']], [row['c1'], row['c2']], [row['d1'], row['d2']]]
    # Every way of taking one value from each group: 2**4 = 16 tuples per row.
    result.extend(itertools.product(*pairs))

# Count how many times each unique tuple appears.
counts = Counter(result)

for element, count in counts.items():
    print(element, count)

print('Number of unique sets:', len(counts))
