I have a 13-column CSV file that I am trying to expand in a particular way. My columns are:
firstName, firstName2, lastName, lastName2, location1, location2, location3, location4, email, email2, phone, phone2, phone3
The data is not perfectly clean (as one can imagine when it comes to people's names), and I want to expand each entry into a large number of possible combinations of its values: not exactly every possible combination, but close to it. Not every entry has all the data (in fact, I don't think any row contains data for every column).
Is the best way to do this really a super-nested multi-branch structure? Right now I start with col1 and test for a value, then test col2, and so on, adding each combination to a list of dictionaries and then appending that list to the master list of dictionaries.
For example, dictionary1 would expand to:
- firstName, lastName, location1, email, phone
- firstName, lastName, location1, email, phone2
- ...
- firstName, lastName, location2, email, phone
- ...
- firstNamefirstName2 (combined), lastName, location, email, phone
Each row will become something like 36 rows (honestly, I don't know; I've never been very good at combinatorial math, especially with conditionals).
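As a rough sanity check on that number, the upper bound is just the product of the number of choices per field. A minimal sketch, assuming three variants per name (first, second, and the two combined, as in the update further down) and assuming every column is populated, which no real row is:

```python
from math import prod

# Choices per field for a hypothetical fully populated row.
# Real rows have blanks, so their counts will be smaller.
choices = {
    "first name": 3,  # firstName, firstName2, or the two combined
    "last name": 3,   # lastName, lastName2, or the two combined
    "location": 4,    # location1..location4
    "email": 2,       # email, email2
    "phone": 3,       # phone, phone2, phone3
}

# Each field is chosen independently, so the counts multiply.
max_rows = prod(choices.values())
print(max_rows)  # 3 * 3 * 4 * 2 * 3 = 216
```

So 216 is the ceiling per input row; a row with blanks in some columns produces fewer.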
Is there anything I can use to make this more straightforward? A library or something?
Update: The actual combinatorial algorithm is this:
fn ln loc email phones
fn lnln2 loc email phones
fn ln2 loc email phones
fnfn2 ln loc email phones
fnfn2 lnln2 loc email phones
fnfn2 ln2 loc email phones
fn2 ln loc email phones
fn2 lnln2 loc email phones
fn2 ln2 loc email phones
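Here fn/fn2 are the two first names, ln/ln2 the two last names, and fnfn2/lnln2 their combined forms; the 4 locations, 2 emails, and 3 phones each expand as well.
I also don't want redundant rows caused by empty values. I figure it would be easier to just delete the duplicates after the CSV file is made (that's simple in Excel).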
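One way the expansion above could be written without nested branches is with `itertools.product` from the standard library. This is only a sketch under a few assumptions: the header names match the column list at the top, combined names are joined with no separator (the `sep` parameter is there to change that), and the helper names (`unique_non_empty`, `name_variants`, `expand_row`, `expand_file`) and the output column names are made up for illustration.

```python
import csv
from itertools import product

# Output columns (hypothetical names; one value per field per output row).
OUT_FIELDS = ["firstName", "lastName", "location", "email", "phone"]

def unique_non_empty(values):
    """Drop blanks and repeats, keeping order; keep one blank if nothing is left."""
    out = []
    for v in values:
        v = (v or "").strip()
        if v and v not in out:
            out.append(v)
    return out or [""]  # a single blank keeps the product from being empty

def name_variants(first, second, sep=""):
    """Return first, first+second combined, second (only those that exist)."""
    combined = first + sep + second if first and second else ""
    return unique_non_empty([first, combined, second])

def expand_row(row):
    """Expand one input row (a dict keyed by the 13 column names) into many."""
    def get(key):
        return (row.get(key) or "").strip()

    firsts = name_variants(get("firstName"), get("firstName2"))
    lasts = name_variants(get("lastName"), get("lastName2"))
    locations = unique_non_empty(get(f"location{i}") for i in range(1, 5))
    emails = unique_non_empty([get("email"), get("email2")])
    phones = unique_non_empty([get("phone"), get("phone2"), get("phone3")])

    # product() yields every combination; the rightmost field varies fastest.
    return [dict(zip(OUT_FIELDS, combo))
            for combo in product(firsts, lasts, locations, emails, phones)]

def expand_file(in_path, out_path):
    """Read the 13-column CSV at in_path and write the expanded rows to out_path."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=OUT_FIELDS)
        writer.writeheader()
        for row in csv.DictReader(src):
            writer.writerows(expand_row(row))
```

Because blanks and repeated values are filtered out before the product is taken, a single input row should not produce duplicate output rows, which may make the Excel clean-up step unnecessary. Duplicates across different input rows would still need a separate pass.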