I have a huge number of codes (number of codes = 9898654986, and sometimes 10, 20, 40, ... even 200 digits) spread across multiple files. I want to generate all possible combinations of those codes, and the combinations should be written out to multiple files.

I want every possible combination. For example, if the codes are ab1, aa2, dc3, xx4, the combinations should be: ('ab1', 'ab1'), ('ab1', 'aa2'), ('ab1', 'dc3'), ('ab1', 'xx4'), ('aa2', 'ab1'), ('aa2', 'aa2'), ('aa2', 'dc3'), ('aa2', 'xx4'), ('dc3', 'ab1'), ('dc3', 'aa2'), ('dc3', 'dc3'), ('dc3', 'xx4'), ('xx4', 'ab1'), ('xx4', 'aa2'), ('xx4', 'dc3'), ('xx4', 'xx4').

For fewer than 1 million codes in a single file, I do it this way:

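import os
import itertools
import pandas as pd

os.chdir('C:/Users/Rashid/Desktop/Try')

# Read the codes as strings from the single column of the sheet
df = pd.read_excel("code1.xlsx", header=None, index_col=False, dtype=str)
df.columns = ['A']

i = 0
lines = []
for item in itertools.product(df['A'], repeat=4):
    lines.append(item)

    # Flush the buffer to a new numbered file once it is full
    if len(lines) > 50000:
        with open(f'4ta.{i}.txt', 'a') as f:
            f.write(str(lines))
        lines = []
        i += 1

# Write whatever is left in the buffer after the loop
if lines:
    with open(f'4ta.{i}.txt', 'a') as f:
        f.write(str(lines))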
  • And what's your problem? – AKX Feb 19 '21 at 18:46
  • You may want to look at parallel processing if you are talking about 2 billion codes and you want to do permutations for all of these. My recommendation would be to revisit the problem statement itself and see if there are alternate ways to solve this. Doing a permutation of 2 billion codes may not be the best option – Joe Ferndz Feb 19 '21 at 19:01
  • For such a huge dataset take a look at this [answer](https://stackoverflow.com/a/50569606). Python may not be suitable for your task. – Yulian Feb 19 '21 at 19:37
  • You will need 1TB of storage from every computer on the internet to store those files and multiple years to write them. Perhaps you should approach this differently. – Alain T. Feb 19 '21 at 21:25
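The scale these comments point to can be checked with a few lines of arithmetic. The following is a rough sketch only: `estimate_output` is a hypothetical helper, and the 3 bytes per code is an assumption taken from the short example codes such as 'ab1' (real codes of 10 to 200 digits would need far more space).

```python
def estimate_output(n_codes, repeat, bytes_per_code=3):
    """Return (number of tuples, total bytes) for itertools.product output.

    Assumes one tuple per line: `repeat` codes, single-character
    separators between them, and a trailing newline.
    """
    n_tuples = n_codes ** repeat
    bytes_per_line = repeat * bytes_per_code + repeat
    return n_tuples, n_tuples * bytes_per_line


n_codes = 9_898_654_986  # count quoted in the question
for repeat in (2, 4):
    n_tuples, total_bytes = estimate_output(n_codes, repeat)
    print(f"repeat={repeat}: {n_tuples:.3e} tuples, about {total_bytes / 1e12:.3e} TB")
```

Even for pairs (`repeat=2`) the tuple count is on the order of 10^20, so the output size, not the looping code, is the limiting factor.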

1 Answer

If all your data is in df column "A", then:

from itertools import permutations

for i, j in permutations(df["A"], r=2):
    print(i, j)

will generate all possible permutations.
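Note that `itertools.permutations` skips pairs such as ('ab1', 'ab1'), whereas the example in the question includes them, so `itertools.product` is probably the closer match. For codes spread over several input files, a minimal sketch might look like the following. It assumes the codes are stored one per line in plain-text files (not Excel) and that the full list of codes fits in memory. The names `read_codes`, `write_product_chunks`, and the `combos.{i}.txt` output pattern are hypothetical, chosen only for illustration.

```python
import itertools
from pathlib import Path


def read_codes(paths):
    """Yield codes one at a time from several text files (one code per line)."""
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                code = line.strip()
                if code:
                    yield code


def write_product_chunks(code_files, out_dir, repeat=2, chunk_size=50_000):
    """Write every tuple from itertools.product to numbered files.

    Each output file holds at most chunk_size lines, one tuple per line,
    with the codes separated by commas. Returns the list of files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # product() keeps its whole input pool in memory, so the codes must fit in RAM.
    codes = list(read_codes(code_files))
    tuples = itertools.product(codes, repeat=repeat)
    written = []
    for i in itertools.count():
        # Pull the next chunk lazily instead of building every tuple up front.
        chunk = list(itertools.islice(tuples, chunk_size))
        if not chunk:
            break
        out_path = out_dir / f"combos.{i}.txt"
        with open(out_path, "w", encoding="utf-8") as f:
            f.writelines(",".join(t) + "\n" for t in chunk)
        written.append(out_path)
    return written
```

Calling `write_product_chunks(["codes1.txt", "codes2.txt"], "out", repeat=2)` (file names are placeholders) would treat the codes from both files as one pool. This only addresses how to read from many files and write to many files; it does not make the total output any smaller.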

Yulian
  • Is this a recommendation for generating permutations of 2 billion rows of data? – Joe Ferndz Feb 19 '21 at 18:58
  • I mean, you're using a product(df['A'], repeat=4) in your provided code – Yulian Feb 19 '21 at 19:02
  • I kindly request that all of you read the question very carefully before answering. I already mentioned that I can do this with a single file; my question is how to do it with multiple files, like 1000 files. – mrashid yasin Feb 19 '21 at 20:22