0

I'm pretty proficient with VBA, but I know almost nothing about Access! I'm running a complex simulation using Arrrays in VBA, and I want to store the results somewhere. Since the results of the simulation will be quite large (~1GB in memory), I'd like to store this in Access rather than Excel.

I currently have a large number of Arrays populated with my data, but I'm not sure how to write these to a database, or even how to create one with VBA. Here's what I need to do, in a nutshell, with VBA:

  • Create a new Access Database
  • Create a new Access Table (the db will be only a single table)
  • Create ~1200 fields programmatically
  • Copy the results from my arrays to the new Access table.

I've looked at a number of answers on here, but none of them seem to answer my question fully. For instance, Adding field to MS Access Table using VBA talks about adding fields to a database. But I don't see doubles listed here. Most of my arrays are doubles. Will this be a problem?

EDIT:

Here are a few more details about the project:

I am running a network design simulation. Thus, I start by generating ~150,000 unique networks. Then, I run a lot of calculations (no, these can't be simplified to queries unfortunately!) of characteristics for the network. There end up being ~1200 for each possible network (unique record). Thus, I would like to store these in an Access database. Each record will be a unique network, and each field will be a specific characteristic associated with that network.

Virtually all of the fields (arrays at this point!) are doubles.

Community
  • 1
  • 1
Austin Wismer
  • 281
  • 1
  • 4
  • 16
  • Why are you looking to create the database via VBA? Seems to me it would me _much_ easier to create a properly designed data structure in Access, then simply push your results into an existing table set. – FreeMan Apr 28 '15 at 12:36
  • Forgive me for sounding dumb, how might I do that? I will need to generate all of the field names programmatically, so I assumed VBA would be the best way to do this. Additionally, all of my arrays are already computed using VBA. – Austin Wismer Apr 28 '15 at 12:37
  • See [the MSDN ADO reference](https://msdn.microsoft.com/en-us/library/windows/desktop/ms677497%28v=vs.85%29.aspx). There are also some examples here of creating tables, for example [Create a SQL Table from Excel VBA](http://stackoverflow.com/questions/927349/create-a-sql-table-from-excel-vba). – Comintern Apr 28 '15 at 12:40
  • That part wasn't completely clear from the description of your issue. Maybe provide more details (understanding some things may be proprietary) to help us understand the situation better. To be honest, at this point this sounds like it might be more of a design issue and might get better results being moved to [Programmers](http://programmers.stackexchange.com/). – FreeMan Apr 28 '15 at 12:41
  • What does each array contain? Columns,rows or data that needs to be sorted? – Sam Apr 28 '15 at 12:41
  • An Access table can include no more than 255 fields. – HansUp Apr 28 '15 at 12:56
  • I will add more details in an edit to the question. Thanks for your help guys! – Austin Wismer Apr 28 '15 at 12:57
  • @HansUp, thanks for the information. That's unfortunate! What would be the performance penalty of trying to manipulate this much data in Excel? I'm looking for the information to be in a portable format -- ie, can't be run on a SQL server. – Austin Wismer Apr 28 '15 at 13:04
  • Sorry, I don't know Excel well enough to answer that question. – HansUp Apr 28 '15 at 13:08
  • You can treat an Excel worksheet as a database table, writing SQL `SELECT`, `INSERT`, `UPDATE` and `DELETE` queries against it. I'm not sure about the speed penalties of using Excel vs Access, but it's probably pretty similar, since they both use the Jet (or its replacement - the name escapes me at the moment) DB engine. – FreeMan Apr 28 '15 at 13:08
  • Treat each worksheet as a separate table, and you can make your own little database. Not sure, however if it has the same 255 column limitation. Might be worth a quick sample program to find out. – FreeMan Apr 28 '15 at 13:09
  • I'm not familiar with database design, but I think I really only need a single table, as all of my data refers to the same unique records. I suppose I could group them together based on characteristics. What would the the advantage? – Austin Wismer Apr 28 '15 at 13:11
  • I guess I just assumed excel would be much slower as all cells contain information about formatting, sizing etc. whereas all I need is a place to store data. – Austin Wismer Apr 28 '15 at 13:12

1 Answers1

1

You (almost?) never want a database with one table. You might as well store it in a text file. One of the main benefits of databases is relating data in different tables, and with one table you don't need it.

Fortunately for you, you need more than one table and a database might be the way to go. You (almost) never need to create permanent tables in code (temp tables, sure, but not permanent ones). If your field names are variable, you need to change your design. When data is variable, it goes in the data part of a database. When it's fixed, it can be a table or a field. Based on what you've said, I think you need this:

In Access create a tabled called tblNetworks with these fields

NetworkID    AutoNumber
NetworkName  Short Text

Then create another tabled called tblCalculations with these fields

CalcID       Autonumber
NetworkID    Long           (Relates to tblNetworks, one to many)
CalcDesc     Short Text
Result       Number (Double)

What you were going to name your fields in your Access table will be the CalcDesc data. You'll use ADODB to execute INSERT INTO sql statements that put the data in the tables.

You'll end with tblNetworks with 150k records and tblCalculations with 1,200 x 150k records or so. When you're tables grow longer and not wider as things change, that a good indication you designed it right.

If you're really unfamiliar with Access, I recommend learning how to create Tables, setting up Relationships, and Referential Integrity. If you don't know SQL, search for INSERT INTO. And if you haven't used ADO before in Excel, search for ADODB Connections and the Execute method.

Update

You can definitely get away with a CSV for this. Like you said, it's pretty low overhead. Whether a text file or a database is the right answer probably depends more on how you're going to use the data and how often.

If you're going to pull this into Excel a small number of times, do a few sorts or filters, maybe a pivot table, then any performance hit you get from a CSV isn't going to be that bad. And if you only need to deal with a subset of the data at a time, you can use ADO to read a text file and only pull in the data you want at that time, further mitigating the slowness of sorting and filtering 150k rows. Not to mention if you have a few gigs of RAM, 150k x 1,200 probably won't be bad at all.

If you find that the performance of a CSV stinks because your hardware isn't up to the task, you have to access it often, or you doing a ton of different queries against the data, it may be to your benefit to use the database. If you fields are structured as you say, you may benefit from even more tables. You'd still have the network table and the calc table, but you'd also have Market, Slot, and Characteristic tables. Then your Calc table would look like:

CalcID
CalcDesc
NetworkID
MarketID
SlotID
CharacteristicID
Result

If you looking for data a lot of times and you need it quickly, you're not going to do better than a bunch of INNER JOINs on those tables and a WHERE clause that limits what you want.

But only you can decide if it's worth all the setup and overhead of using a database. And because of that, I would start down the CSV path until the reason to change presented itself. I would design my code in a way that switching from CSV to database only touched a few procedure (like by using class modules) so that the change didn't affect any already-tested business logic.

Dick Kusleika
  • 32,673
  • 4
  • 52
  • 73
  • Yup, this is where I was heading, just didn't know enough about the data structures to give the guidance yet. – FreeMan Apr 28 '15 at 13:31
  • Thanks for the input. I think the best way to go may be using a CSV. That way, I will avoid any overhead associated with formatting in an Excel sheet. I'd have to discuss it further, but I'm still not sure separate tables are appropriate for this information. The field names themselves are variable. They are simply ... patterned, if that makes sense. For example, There are slots for up to 10 possible distribution centers in the model. Each distribution center will have 15 possible characteristics that needs to be computed across 13 different market segments. (continued) – Austin Wismer Apr 28 '15 at 13:37
  • From the little I understand about queries, I can't simply write a query for Slot 1_Market1_Characteristic 1. There is more complicated logic involved in the simulation. However, when the model has finished calculating, I will have a value (possibly (null)) for each of these "fields". Thus, it's programmatic, but not really variable, as they won't ever grow. More network configurations (record) could be added, but no more characteristics. As such, I end up with around 1200 unique calculations for each record. Please let me know if that makes sense, and if that changes your recommendation. – Austin Wismer Apr 28 '15 at 13:44
  • Also (sorry to comment multiple times), I meant above "The field names themselves ***AREN'T*** variable." – Austin Wismer Apr 28 '15 at 13:51