2

Am trying to generate 3.3 million fake rows using python as below snippet. generating the file is very very slow. any help speedup this?

Python version - 3.9.7

import os, csv, time, sys
from datetime import datetime
from faker import Faker
from time import sleep
from progress.bar import Bar


os.system('clear')
sCount = "distID.in"

fake = Faker()
startTime = datetime.now()

count = sum(1 for line in open(sCount))
fakeFile = open('fakeFile.csv', 'w')
bar = Bar('Processing', max=count)
with open(sCount) as piiFile:

    i=666000000
    for oldID in piiFile:
        i=i+1
        fn = fake.first_name()
        ln = fake.last_name()
        dob = (f'{fake.date_of_birth()}')
    
        fakeFile.write(f'{i},{fn},{ln},{dob},{oldID}'+'\n')
        bar.next()

fakeFile.close()
bar.finish()
OneCricketeer
  • 179,855
  • 19
  • 132
  • 245
Yum.Vee
  • 41
  • 5
  • Did you try making one list with all the rows/lines then write the lines once to the file? Or create one looong string, each line separated by `\n` then write the string once to a file? – wwii Jan 05 '22 at 22:49
  • ^ That's definitely one improvement since every `.write()` is a system call – Brad Solomon Jan 05 '22 at 22:55
  • Does [Time performance in Generating very large text file in Python](https://stackoverflow.com/questions/49266939/time-performance-in-generating-very-large-text-file-in-python) answer your question? – wwii Jan 05 '22 at 22:55
  • While it may actually slow you down a bit, you are better off with the `csv` module to properly handle things like escaping, rather than writing the comma-separated raw string – Brad Solomon Jan 05 '22 at 22:57
  • Does this answer your question? [Using Python Faker generate different data for 5000 rows](https://stackoverflow.com/questions/45574191/using-python-faker-generate-different-data-for-5000-rows) – ggorlen Sep 10 '22 at 03:55

0 Answers0