-1

I want to make a macro that will open all excel books in a folder, read the filled information and store them in a sheet which will represent my database. I need to know your suggestions and what is the best way to do that, to get a fast and flexible result.

To help you understand my question, let us suppose that I have 3 excel templates containing First Name, Last Name and Country, but at different positions like these pictures

Template 1 Template1

Template 2 Template2

Template 3 Template3

Based on that, the final result that I would like to get is : Result

The exemple that I am giving by these pictures is really very simple, but it was just to help you understand what I want. Now I will detail about the real need. In fact, I have 3 templates, but each of them contains about 80 fields of data to collect (not only first name, last name and country). And i don't have to read only 3 files, but I have to read about 200 files placed in a folder and each of them is either template1, or 2 or 3. In the future we may have a template 4 that's why I need something flexible.

I thought about named ranges, but the template 1,2,3 already exists, and I can't collect from the 200 users the 200 existing excels files, and before launching my macro, giving a named range to the 80 field at each file. I can use named range if in the future they will be a template 4, so before sending the files to the final user who will fill the excel we name the ranges and send it to him, but before the template 4,i have to fix the problem of the current 3 existing templates.

I also thought about reading data based on columns and row indexes,for exemple I check the type of file and if I am reading a file template one, I get first name from the cell (2,3), and If it's a template 2, i get the information from cell (5,6) and if it's a template 3, i get the information from Cel (9,4), but the problem is that my code will not be at all flexible.

I also said, I may do like a sheet called reference, in which I define the positions of each field based on the template model, for example I say that the first name is for template 1 at the position 2,3 for the template 2, first name is at 5,6 and for template3 it's at 9,4. Like the following picture, and when I loop through my 200 files, I check, if it's template 1 i read the sheet of reference and I know that the first name will be at this position, same for template 2 and so on....this solution looks like previous one, but more flexible, because all we have to change is the reference table if something changes, but I am wondering if it will be fast or slow if for each field i have to come read 2 cells in the reference sheet to know the position. Sheet Reference

I am really lost because I have to choose the best way to do what I want before start coding to avoid time wasting. If any expert can help by telling me what is best or giving me more ideas than what I thought about I will really appreciate.

Thanks in advance to any helper

EDIT: @PEH, what do you think about if I make my lookup table like that ? enter image description here

EDIT2: @PEH, that's what is suggested in last comment enter image description here

JustGreat
  • 551
  • 1
  • 11
  • 26
  • Peh, did you provide any answer ? I see that you wrote something but can't see what you wrote. – JustGreat Jan 23 '19 at 14:31
  • 2
    Look into the [Range.Find method](https://learn.microsoft.com/en-us/office/vba/api/excel.range.find). You can use this to search for "First name" and use the found cell without actually knowing its position beforehand. Then you'll just need to learn how to [Loop though files in a folder](https://stackoverflow.com/questions/10380312/loop-through-files-in-a-folder-using-vba). That should give you a place to start. – tigeravatar Jan 23 '19 at 14:37
  • 1
    @JustGreat no, just corrected your tags. But I was about to write the same comment as tigeravatar. • Your question is too broad to provide a answer because you did nothing yet. Give the method in the comment above a try. – Pᴇʜ Jan 23 '19 at 14:39
  • thanks to both...Looping through a folder I already know how to do so this is not the problem... As for the Range.Find, you mean that I check the headers of each data to get the values ? is it faster than telling I need the data at the position 1,2 let us say ? Also I am thinking about something. the range.find will give me the position of the header, so I should make an offset to get the value right ? and if yes, this mean that if one day, we add a column or line or a small cell betweek the headers and the values, I should review the whole code right ? – JustGreat Jan 23 '19 at 14:50
  • Is the name `Model 1` etc always in `Cells(1, 6)`? Then you could read this cell to decide which model it is and where the other data is expected. That would be faster than using `find()`. – Pᴇʜ Jan 23 '19 at 14:59
  • Thanks PEH, so you mean that if I know that for model 1, the first name is at position 2,3, for model 2, at position 5,6 etc.it's better that I make a kind of select case, case model 1, i read my 80 cells from where I know they are etc.so this is what I called in my ideas above (reading data based on columns and row indexes) can you check my last paragraph which is another idea of that kind but a little bit more flexible, instead of specifying in my code that the first name is at this or that position,I create a sheet where I say the position of each fields based on model.is it gonna be slow? – JustGreat Jan 23 '19 at 15:30
  • @JustGreat Have a look at my answer below. I think the `match` will still be faster than a `find`. To be even more faster you could cache the rows/columns data of the lookup table in an array to look it up there (which makes it a bit more complicated). Start with the idea below and if you need to improve speed go further caching it into an array. – Pᴇʜ Jan 23 '19 at 15:38
  • @Pᴇʜ can you please tell me more about your idea of caching the lookup into an array ? should I make an array of user defined type, something like Type LookupTable Field As String lRow As Integer lCol As Boolean End Type Dim myArray (LastRow) as LookupTable or a normal array like Dim myArray (LastRow, 3) as string and to fill the array, it's better to do something like MyArray =Range("A1", Range("C1048576").End(xlUp)) or to loop and fill it row by row ? and can I later apply the function match to the array ? Or i create a normal loop to find my items ? – JustGreat Jan 25 '19 at 10:30
  • @Pᴇʜ I am sorry in my above comment my code is on the same line, the site doesn't allow multi lines a it seems in comments, so if you can copy paste it to notepad, you will be able to read better my question about arrays – JustGreat Jan 25 '19 at 10:35
  • @JustGreat I made an edit to my answer. Please also read the comments below my answer. – Pᴇʜ Jan 25 '19 at 10:36
  • @Pᴇʜ thanks a lot, I saw that after I wrote the comment....By the way there is something that I can't understand on this site, someone put me -1 to my question ! what does mean a -1 on a question ? you ask a question, we note the question itself ? That makes no sens. There is nothing called a bad question, and the one who voted -1 could have try to help like you did instead of voting without beeing helpfull.thank you Peh again for your support and help. – JustGreat Jan 25 '19 at 11:08

1 Answers1

1

The basic idea (beside looping through your files):

  1. Change your lookup data into the following:

    enter image description here

  2. Then read Cells(1, 6) to get your model.

    Dim Model As String
    Model = Worksheets("MyTemplate").Cells(1, 6).Value
    
  3. Use the WorksheetFunction.Match method to find your field in the lookup table.

    Dim FieldRow As Long
    FieldRow = Application.WorksheetFunction.Match(Model & "-First name", Worksheets("LookupTable").Range("A:A"), 0)
    
  4. Use …

    fRow = Worksheets("LookupTable").Cells(FieldRow, 2)
    fColumn = Worksheets("LookupTable").Cells(FieldRow, 3)
    

    To get row and column where to look for that field in your template.

If you put the field lookup stuff into a handy function, the code would get easier to maintain. For example put the following into a module:

Option Explicit

Public LookupCache As Variant
Public LookupResults As Variant

Public Function ReadField(Ws As Worksheet, FieldName As String) As Variant
    'Here we cache the lookup table. It reads the sheet LookupTable into an 
    'array if the array does not exist yet. If the function runs a second time,
    'the array exists already and is used directly (saves time).
    'Lookup in arrays is much faster than in cells.
    'Caching makes this function about 2 times faster than without.
    If IsEmpty(LookupCache) Or IsEmpty(LookupResults) Then
        With ThisWorkbook.Worksheets("LookupTable")
            Dim LastLookupRow As Long
            LastLookupRow = .Cells(.Rows.Count, "A").End(xlUp).Row
            LookupCache = .Range("A2", "A" & LastLookupRow).Value
            LookupResults = .Range("B2", "C" & LastLookupRow).Value
        End With
    End If

    Dim ModelName As String
    ModelName = Ws.Cells(1, 6).Value

    Dim LookupRow As Long
    On Error Resume Next
    LookupRow = Application.WorksheetFunction.Match(ModelName & "-" & FieldName, LookupCache, 0)
    On Error GoTo 0

    If LookupRow = 0 Then
        'field not found
        ReadField = CVErr(xlErrNA)
        Exit Function
    End If

    Dim fRow As Long, fColumn As Long
    fRow = LookupResults(LookupRow, 1)
    fColumn = LookupResults(LookupRow, 2)

    ReadField = Ws.Cells(fRow, fColumn).Value
End Function

So you could read a field like

Debug.Print ReadField(MyLoopWorkbook.Worksheets("MyTemplate"), "First name")
'MyLoopWorkbook should be the current workbook in your files loop

Edit according to the comment …

If we added a new field Company to a new model4,a user must go to the sheet lookuptable and add Model4-Company at line 11 with the row and col, but also in the code he has to go and add ReadField(MyLoopWorkbook.Worksheets("MyNewTemplate"), "Company"), right? That's why I am not understanding how I can count only on persons who don't code to add that? Can you clarify please because what you said is really important.

If you make the ReadField part dynamic you don't need to code here too. For example if you want to end up with a table like that:

enter image description here

You would just add a new header in column 4 called like the field eg Company. And write a loop that loops throug the columns of that header row to collect all fields.

Sub ReadAllFields()
    Dim wsData As Worksheet
    Set wsData = Worksheets("CollectedData")

    Dim FreeRow As Long 'find next free row in table
    FreeRow = wsData.Cells(wsData.Rows.Count, "A").End(xlUp).Row + 1

    Dim Fields() As Variant 'read headers into array
    Fields = wsData.Range("A1", wsData.Cells(1, wsData.Columns.Count).End(xlToLeft)).Value

    Dim iCol As Long
    For iCol = 1 To UBound(Fields, 2) 'loop through header columns
        wsData.Cells(FreeRow, iCol).Value = ReadField(MyLoopWorkbook.Worksheets("MyNewTemplate"), Fields(1, iCol)) 
        'reads fields dynamically depending on which headers exist in data sheet
    Next iCol
End Sub
Pᴇʜ
  • 56,719
  • 10
  • 49
  • 73
  • Thanks a lot Peh, that's exactly what i was thinking about in my last idea when talking about making it something flexible. To make the match function faster, should I use something like .Range("A" & .Rows.Count).End(xlUp).Row instead of Range("A:A") ? Anyway, do you think that the code will not be very slow if i have to read 200 workbook and in each workbook there is between 60-80 fields to read with the ReadField function (number of fields depends on the template). what do you think ? – JustGreat Jan 25 '19 at 09:57
  • Of course you can try to limit your range instead of using the full column (but I never tested if there is a speed benefit at all, because I hope the lookup will do that limitation automatically too.) • I improved the function using an array to cache the table, so the lookups will be faster (see my edit). The previous issue isn't in count anymore, the cache only contains the data. • Of course 200 workbooks will still need their time. But I see no other option to get this even faster. Caching made that function about 2 times faster. – Pᴇʜ Jan 25 '19 at 10:56
  • Thanks a lot.So in your example, LookupCache will be an array representing the field names and LookupResults the positions. I am thinking about making 3 lookuptable, one for each model Simply because I have 80 fields,so to avoid scrolling a lot, and if I do so, my field column, will not be called model1-FirstName, it will be directly FirstName on each model. Anyway I am still wating users to send me the 200 files to try all that, once again thanks a lot Peh you are very helpfull and if you got more ideas don't hesitate to update the answer.I will tag it as answered once I finish work on it. – JustGreat Jan 25 '19 at 11:04
  • I didn't undestand Why do you suggest me not to split it into a lookupsheet for each model ? what I was imagining is making 3 lookup sheet one by model, and in my code, I do something like for each sheet in my wb, if sheetname = model X, then I read the corresponding model sheet, and if i want to add a new model one day, I add a new sheet called model 4 etc... so can you tell me please the disadvantage ? – JustGreat Jan 25 '19 at 16:02
  • @JustGreat But if you read the model sheet each time the model changes then there is no caching anymore. So you will need to read all models at once and keep them in cache anyway – Pᴇʜ Jan 26 '19 at 12:55
  • @Peh I see what you mean, in case I want still to do in separate sheets, I can keep the main lookup sheet as you did, and create a lookup sheet for each model, at the beginning of my code, I copy all lookup sheets into the main lookup sheet, and I work normally on the main lookup sheet so that I can use the caching the way you did – JustGreat Jan 26 '19 at 14:30
  • @JustGreat Well you can but I see no benefit in that. – Pᴇʜ Jan 27 '19 at 12:19
  • 2
    @JustGreat You forgot to mark PEH's answer as "Selected". – Variatus Jan 28 '19 at 01:22
  • @Pᴇʜ I will tag your solution as answered thanks a lot, on the same time Variatus provided me an answer when I was asking about arrays to pass to your function. Based on ur code, variatus proposed to use enum which is good to avoid 'find' and strings's concatanation.Any idea which one is more performant ?if I know that the info in excel are strings, I keep the array as variant? or i make it strings ? and CVErr(xlErrNA) will return N/A in case of error ?right?https://stackoverflow.com/questions/54383897/best-way-to-copy-ranges-from-different-sheets-and-store-them-into-an-array/54384878#54384878 – JustGreat Jan 29 '19 at 21:27
  • If you read a range into an array it has to be variant. And yes `CVErr(xlErrNA)` will return the error `N/A`. – Pᴇʜ Jan 30 '19 at 07:06
  • @Pᴇʜ thanks again for your efforts, you are a very good helper and I learned many things from your answer above. by the way, did you check the solution proposed by variatus ? what do you think about using the enum, isn't it better than FIND methode ? do you suggest me to use the find as in your example or to go for the enum which seems good as well ? – JustGreat Jan 30 '19 at 09:24
  • Using enum has one big disadvantage: If you need to add fields or templates you need to start coding again. That means someone who is able to code (=expensive person) needs to work for this. If you seperate your data (like fields name etc) into a sheet *everyone* (who reads the hopefully existing explanation about the sheet) can add fields (without knowledge of coding). I recommend always to seperate code logic and data, and never hardcode data into code (that's best practice). By the way I didn't use `Find` I used `Match`. – Pᴇʜ Jan 30 '19 at 09:39
  • @Pᴇʜ I totally agree, and it's exactly what I want, after I finish the project that I am working on, someone who is not programmer can maintains it to add new model and fields. The problem is that I am not understanding honestly why with enum I will need someone to code and how in your proposed solution I will not need someone to code later on. Maybe I missunderstand something but in the coming comment I will explain – JustGreat Jan 30 '19 at 10:59
  • @Pᴇʜ to complete my idea, and why I don't understand, In the solution right above, I should call the function that way: ReadField(MyLoopWorkbook.Worksheets("MyTemplate"),"First name"). If we added a new field Company to a new model4,a user must go to the sheet lookuptable and add Model4-Company at line 11 with the row and col, but also in the code he has to go and add ReadField(MyLoopWorkbook.Worksheets("MyNewTemplate"), "Company"), right?That's why I am not understanding how I can count only on persons who don't code to add that?Can you clarify please because what you said is really important – JustGreat Jan 30 '19 at 11:07
  • @JustGreat just make reading fields dynamic too. See my edit. – Pᴇʜ Jan 30 '19 at 11:22
  • @Pᴇʜ that's a good addition.I understand now, that with enum, If i want to add something I should go to the code and add it to the enum,while here any user can just have to add it to Excel(->Flexible). I am thinking about changing the lookup table, can you check my edited pic above? what do you think ? I said that this way i will have row, col and range, in case I need in my code I don't have to calculate,all will be in my lookup, also visually it's easiar for a user to fill it,instead of filling template-field and to avoid string concatenation, what do you think ? – JustGreat Jan 30 '19 at 14:43
  • @JustGreat No good idea. As I already told you this will mess up your code a lot. Keep it in one place as in my example. Same data has to be in the same position (thats how you do it). • For better usability you could auto generate the the lookup-value out of a `Model` and `Field` column by using the formula `=A:A & "-" & B:B` (where A is the Model and B the Field column) so the user can enter it more easily and it concatenates automatically. Note that you don't need both `rng` and `row/column` decide for one of it and adjust the code accordingly. Don't use both at the same time! – Pᴇʜ Jan 30 '19 at 14:54
  • @Pᴇʜ Thanks. Auto generate the lookup-value, is a good idea. I will do that. What about if I autogenerate the lookup that you suggested, based on the lookup I suggested ? I mean the user will fill a sheet that looks like the pic I proposed and with formulas excel will autofill the lookup that you suggested and the code will work on that ? any inconvinience if I do it that way ? Regarding that I have to choose if I want to work on ranges or ROW/COL I think that ROW/COL is better except if you have opinion. Thanks again those are my last questions about this subject – JustGreat Jan 30 '19 at 15:38
  • 1
    It will be the same mess to convert that table from your format into mine and there will be no great advantage just disadvantages of converting and non-consistent data. Just in case if you ask again I will still tell you "don't do that". • In `Rng` vs `row/col` the range might be a slightly faster because you only need to handle 1 variable instead of 2. And ranges might be more convenient for the user. Anyway both will work and there will be no big difference. – Pᴇʜ Jan 30 '19 at 15:48
  • @Pᴇʜ can you check last Edit ? is that what do you suggest ?I can invert the place of col and row to fit your code, but I wanted to make sure that I got it well this time..By the way, If Ranges are faster in this case I will use Ranges, I thought that Row Col is better when in code you wanna reference a cell. if range are faster I will directly do ranges – JustGreat Jan 30 '19 at 16:04
  • 1
    @JustGreat almost it should be `Model - Field` to work with my code. And Excel formulas calculate faster from left to right so the best column order would be `Template`, `Field` and then `Autofill`. – Pᴇʜ Jan 30 '19 at 16:24
  • @Pᴇʜ I never knew that the order may affect the performance! is it normal ? anyway I changed the order as suggested but in this case I should make Your lookupcache start at C, and your lookup LookupResults at D. I can also replace the R C by ranges like "A1" etc...to make it faster (plz check pic back now) – JustGreat Jan 30 '19 at 16:40
  • I just discovered something that Will cause me problem...i may have in some of the templates comboboxes....i should thibk about solving that :( – JustGreat Jan 30 '19 at 17:52
  • 1
    @JustGreat the performance difference between ranges and row/column is minimal, because you read it only once into the array cache and so you can ignore this 1/10 second you could win here. Use the one that suits better to you and your users. • Calculation order in Excel is from left/top to right/bottom, so if you calculate in `A1 = B1 + C1` (and B1 and C1 are also formulas) then Excel starts with A1 then calculates B1 which triggers a change in A1 again, then calculates C1 which triggers again a change in A1. So it has to calculate A1 multiple times. So it's better to calc `C1 = A1 + B1`. – Pᴇʜ Jan 31 '19 at 07:03
  • @Pᴇʜ thanks a lot, you teached me lots of things and now I understand why the order is important. I thought about how to solve the presence of controls checkbox or combobox in some sheets template and I will do something like a new procedure that execute after yours to treat only controls : For Each obj In Sheets(1).OLEObjects ... check the type using if TypeName(obj.Object) = "CheckBox" and get the values and store them . The only thing I never understood why we need to do Obj.Object and not directly Obj to get the type of active x controls :) – JustGreat Jan 31 '19 at 10:49
  • 1
    @JustGreat That's just how it is designed `Obj.Object` ask Microsoft why they did it that way ;) – Pᴇʜ Jan 31 '19 at 11:09
  • YES :), but they design it as it seems like if an oleobject itself doesn't have too much properties, but one of the few properties is an OBJECT and inside this OBJECT we find our real properties. like of oleobject is a canvas and real object is inside the OBJECT; Anyway I will do that to get the controls – JustGreat Jan 31 '19 at 11:28
  • 1
    @Pᴇʜ Just faced an error with the ReadField function, I added Byval to make it work, if you call it from **ReadAllFields** without adding byval you will get an error that you can't pass ByRef value to it. so i maded it like that `ReadField(Ws As Worksheet, ByVal FieldName As String) ` – JustGreat Feb 05 '19 at 15:31