2

Current Issue:

Hey everyone, appreciate any help here as I'm still beginning my journey in coding.

I'm trying to see if I can make a script that will:

  1. Look for duplicates (in column D), and
  2. delete any data from the following duplicates after the 1st match in columns E-L (see desired outcome if that doesn't make sense verbally).
  3. The script would need to use the column header names (ex. "snacks") instead of hard-coded column references

*So for example, the script finds ABC001, deletes only the duplicates for ABC001 in the corresponding columns then moves on to ABC004 and performs the same action.

I'm not sure how to write a script that would do this, and keep going to find duplicates after the 1st set is found. I think I know how to do a for loop now, but it's not clear to me how to make it do a search loop and stop after it find the first match and keep going.

Current Data: enter image description here

Desired Outcome: enter image description here

Code so far below. I think I would need to incorporate something like JSmith showed in this example? Or would I need to incorporate some form of .length with the duplicate range in a for statement so that it can find the duplicates, get the # of them, and then only perform the action on everything past the 1st instance?

function duplicateRemoval() {
  ss = SpreadsheetApp.getActive().getSheetByName('Sheet1');//gets sheet by name
  const [aB,...cd] = ss.getDataRange().getValues();//literal assignment that assigns aB to the header array and the rest of the data to 'cd'

  let column = {}
  let iData = {};//index into the row array for each column header
  aB.forEach((a,i)=>{column[a] = i+1;iData[a]=i});//building column and iData so that headers can move anywhere
  
}//let & forEach & const const [aB,...cd] derived from (https://stackoverflow.com/questions/70101896/search-column-for-text-and-use-array-list-to-insert-text-in-another-cell) @Cooper

Raw Data:

Name Owner Snack Transaction # # of snacks requested #2 #3 #4 #5 #6 #7 #8
Bill Example Snacktown celery ABC001 4 1 2 3 4 5 6 4
Bill Example Snacktown celery ABC001 4 1 2 3 4 5 6 4
Bill Example Snacktown celery ABC001 4 1 2 3 4 5 6 4
Jane Doe Snacktown chips ABC002 1 1 1 1 1 1 1 1
Jane Doe Chipworld chips ABC003 1 1 1 1 1 1 1 1
Jane Doe Chipworld chips ABC004 5 5 1 1 1 1 1 5
Jane Doe Chipworld chips ABC004 5 5 1 1 1 1 1 5
Jane Doe Chipworld chips ABC004 5 5 1 1 1 1 1 5
Jane Doe Chipworld chips ABC004 5 5 1 1 1 1 1 5

Sources:

google app script array delete duplicate value from top

Google Script App Delete Duplicate Rows with a Specific Value in Specific Column in Google Sheet

How do I find and delete duplicate values in a range of cells while keeping the first occurrence of a duplicated value in Google Sheets?

User1938985
  • 127
  • 8
  • Wow, that's the very good formatted question! A really rare thing these days. "_Cooper showed in this example_" not Cooper, or not that example. As for the question, it look like it's need just a couple of basic tricks. I believe you will get the answer in no time. – Yuri Khristich Dec 03 '21 at 18:33
  • Thanks! And yes you were right I had the wrong reference name, updated it – User1938985 Dec 03 '21 at 18:50

3 Answers3

2

It is unclear why you want to use a script here, as this seems doable with a plain vanilla spreadsheet formula. It is also unclear whether you really need to repeat the values in A2:D many times with nothing in columns E2:L.

To remove duplicate rows, and get just one copy of each unique transaction, choose Insert > Sheet and put this spreadsheet formula in cell A1:

=unique(Sheet1!A2:L)

To get the expected result you show, including rows that are mostly blank, use this:

=arrayformula( 
  { 
    Sheet1!A2:D, 
    array_constrain( 
      if( 
        Sheet1!D2:D <> Sheet1!D1:D, 
        Sheet1!E2:L, 
        iferror(1/0) 
      ), 
      rows(Sheet1!E2:L), columns(Sheet1!E2:L) 
    ) 
  } 
)

To determine row uniqueness based on all columns A2:D instead of just the transaction ID in column D2:D, replace the if() condition with A2:A & B2:B & C2:C & D2:D <> A1:A & B1:B & C1:C & D1:D, inserting the proper sheet reference.

doubleunary
  • 13,842
  • 3
  • 18
  • 51
  • 1
    I'd like to use a script as I'd have a lot of rows (hundreds+) to review, and it would help to get some of the data cleaned while avoiding any human error. In terms of having some data remain (A2:D) I'd want this for review down the road, but the data in E2:L would impact charts or pivot tables I'd need to make. Appreciate the input/help – User1938985 Dec 03 '21 at 18:55
  • The formulas I gave you are array formulas that will process _all_ the rows automatically. They will easily manage thousands of rows. – doubleunary Dec 04 '21 at 06:09
2

If you need a script you can try this:

function main() {
  var ss = SpreadsheetApp.getActiveSpreadsheet()
  var sheet = ss.getActiveSheet();
  var range = sheet.getDataRange();

  // get all data from the sheet
  var data = range.getValues();

  // get column headers
  var headers = data.shift();

  // get the list of transactions
  var transactions = data.map(x => x[headers.indexOf('Transaction #')]);

  // loop through all the transactions
  for (let transaction of transactions) {

    // get indexes of rows to process
    var rows = transactions.map((t, row) => t === transaction ? row : '' ).filter(String).slice(1);

    // process the rows
    for (let r of rows) {
      data[r][headers.indexOf('# of snacks requested')] = '';
      data[r][headers.indexOf('#2')] = '';
      data[r][headers.indexOf('#3')] = '';
      data[r][headers.indexOf('#4')] = '';
      data[r][headers.indexOf('#5')] = '';
      data[r][headers.indexOf('#6')] = '';
      data[r][headers.indexOf('#7')] = '';
      data[r][headers.indexOf('#8')] = '';
    }
  }
  
  // put the updated data back to the sheet
  range.setValues([headers, ...data]);
}

Update

Here is the improved variant of the same code. It still loops through all the rows, but it skips already processed transactions:

function main() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getActiveSheet();
  var range = sheet.getDataRange();
  var [headers, ...data] = range.getValues();
  
  var transactions = data.map(x => x[headers.indexOf('Transaction #')]);
  
  var cols_to_clean = ['# of snacks requested','#2','#3','#4','#5','#6','#7','#8'];

  var processed_transactions = [];

  for (let transaction of transactions) {

    // skip already processed transactions
    if (processed_transactions.includes(transaction)) continue; 

    var rows_to_clean = transactions.map((t, row) => t === transaction ? row : '' )
      .filter(String).slice(1);

    for (let r of rows_to_clean) {
      cols_to_clean.forEach(c => data[r][headers.indexOf(c)] = '');
    }

    processed_transactions.push(transaction);
  }
  
  range.setValues([headers, ...data]);
}

Thanks to @TheMaster for the noted deficiencies.

Update 2

Sorry for spamming, just figured out the final solution that has no redundant iterations (I hope):

function main() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getActiveSheet();
  var range = sheet.getDataRange();
  var [headers, ...data] = range.getValues();
  var cols_to_clean = ['# of snacks requested','#2','#3','#4','#5','#6','#7','#8'];
  
  // get all transactions (9 items for this data)
  var all_transactions = data.map(x => x[headers.indexOf('Transaction #')]);

  // get the short list of unique transaction (4 items for this data)
  var uniq_transactions = [... new Set(all_transactions)];

  for (let transaction of uniq_transactions) {

    // get all indexes of rows with given transaction
    var rows_to_clean = all_transactions.map((t, row) => t === transaction ? row : '')
      .filter(String).slice(1);

    // clean the rows
    for (let r of rows_to_clean) {
      cols_to_clean.forEach(c => data[r][headers.indexOf(c)] = '');
    }

  }
  range.setValues([headers, ...data]);
}

I didn't remove my first update, I think this can be useful for educational purposes.

Yuri Khristich
  • 13,448
  • 2
  • 8
  • 23
  • wow this is great, it worked like a charm, thank you Yuri. Out of curirousity, what does `.filter(String)` do? And `.slice()` I don't think I've seen `slice` before! – User1938985 Dec 03 '21 at 19:11
  • 1
    `.filter(String)` is the trick to remove all empty elements from an array. `.slice(1)` the trick to get all elements of an array after the element[0]: `[0,1,2,3].slice(1)` --> `[1,2,3]`. In this case we're getting all indexes and then remove first index, since we don't need to process the first row. Sorry for my sloppy English. – Yuri Khristich Dec 03 '21 at 19:21
  • 1
    `transactions` is a list of all transactions including duplicates. Right? So, the outer loop would iterate 9 times instead of 4 times. – TheMaster Dec 03 '21 at 22:17
  • 1
    @TheMaster yes, you're right, it can be improved. I just added the updated variant. The outer loop still do 9 steps, but now it skips already processed transactions. But your solution is more efficient, I think. Except my code handles unsorted transactions fine (not sure, if it makes sense, though) – Yuri Khristich Dec 04 '21 at 06:45
  • 1
    @TheMaster and I've added the final solution with no redundant iterations. – Yuri Khristich Dec 04 '21 at 07:33
2

Assuming transaction ids are always grouped, iterate through rows and delete all specified columns where previous transactionId is equal to current transactionId.

function duplicateRemovalOfColsToRemove() {
  const transactionsHeader = 'Transaction #',
    colsToRemoveHeaders = ['# of snacks requested', '#2'],//add column headers as necessary
    ss = SpreadsheetApp.getActive().getSheetByName('Sheet1'), //gets sheet by name
    range = ss.getDataRange(),
    [headers, ...values] = range.getValues(),
    colsToRemove = colsToRemoveHeaders.map((h) => headers.indexOf(h)),
    transactionsIdx = headers.indexOf(transactionsHeader);
  let currTransaction = '';
  values.forEach((row) =>
    row[transactionsIdx] === currTransaction
      ? colsToRemove.forEach((idx) => (row[idx] = ''))
      : (currTransaction = row[transactionsIdx])
  );
  range.setValues([headers, ...values]);
}
TheMaster
  • 45,448
  • 6
  • 62
  • 85