
I have 50 CSV files, each with up to 2 million records.

Every day I need to pull 10,000 random records from each of the 50 files and write them all to a new CSV file (10,000 × 50 = 500,000 records).

I can't do this manually because it would take far too long. I tried Access, but the database is larger than 2 GB, so I can't use it. I also tried CSVed, which is good software, but it didn't help either.

Could someone suggest an approach or a tool for pulling random records from the files and writing them to a new CSV file?

ZygD
  • This is not related to Excel, as you would not get past its limit of ~1M rows. Also, this would be a perfect question for another site, dedicated to [software recommendations](http://softwarerecs.stackexchange.com/). – ZygD Mar 15 '15 at 14:47
  • Isn't this something that can be done pretty easily in some version of awk? – jia103 Mar 15 '15 at 19:49

1 Answer


There are many languages you could use; I would use C# and do the following.

1) Get the number of lines in a file.

Lines in text file

2) Generate 10,000 random line numbers (unique, if you need that), using the line count from step 1 as the upper bound.

Random without duplicates

3) Read the records at the line numbers from step 2 and write them to the new file.

4) Repeat for each file.
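The four steps above could be sketched roughly as follows. This is in Python rather than C#, purely as an illustration, and it makes a few assumptions that are not in the question: every record sits on exactly one line (no quoted fields containing line breaks), the files have no header row, and the function names `count_lines`, `sample_lines` and `sample_files` are made up for this example.

```python
import random


def count_lines(path):
    """Step 1: count the lines in a file without loading it into memory."""
    with open(path, "r") as f:
        return sum(1 for _ in f)


def sample_lines(path, k, rng=random):
    """Steps 2 and 3: pick k unique random line numbers, then pull those lines."""
    total = count_lines(path)
    # random.sample never repeats a value; cap k in case the file is short.
    wanted = set(rng.sample(range(total), min(k, total)))
    with open(path, "r") as f:
        # Second pass over the file: keep only the chosen line numbers.
        return [line for i, line in enumerate(f) if i in wanted]


def sample_files(paths, out_path, k=10_000, rng=random):
    """Step 4: repeat for each file, writing every sample to one output file."""
    written = 0
    with open(out_path, "w") as out:
        for path in paths:
            for line in sample_lines(path, k, rng):
                # Guard against a final line that lacks a trailing newline.
                out.write(line if line.endswith("\n") else line + "\n")
                written += 1
    return written
```

A call such as `sample_files(["file01.csv", "file02.csv"], "sample.csv")` (hypothetical file names) would write up to 10,000 lines per input file. Each input is read twice, once to count and once to extract, so memory use stays small even for files with 2 million records. If the files do have a header row, it would need to be skipped and written once separately.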

If you want to consider a database other than Access, MySQL and SQL Server Express are two options.

Kevin