I have an Excel macro code to extract unique mutations from GISAID metadata that involves:
- Trimming the "(" in the very beginning and the ")" in the very end of each value and auto-filling the trim formula down until the last row.
- Pasting (values only the trimmed data into a new sheet) and splitting the comma-delimited values.
- Stacking all the multi-columned rows into one column.
- Deleting all blank cells and shifting the subsequent cells up (if any blank cells are present).
- Removing duplicates.
This is the code that I've managed to build (I'm really really new in VBA, I've only started automating Excel processes because I'm working with GISAID data almost every day.) Users can paste the data from GISAID's .tsv metadata to A1 and just run the macro.
Sub MUTATIONS_MACRO()
'
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
' MUTATIONS_MACRO_EXCEL_1 Macro
'
'
Range("B1").Select
Dim Lr As Long
Lr = Cells(Rows.Count, "A").End(xlUp).Row
Range("B1:B" & Lr).Formula = "=RIGHT((LEFT(RC[-1], LEN(RC[-1])-1)), LEN(LEFT(RC[-1], LEN(RC[-1])-1))-1)"
Range("B1").Select
Range(Selection, Selection.End(xlDown)).Select
Selection.Copy
Sheets.Add After:=ActiveSheet
Range("A1").PasteSpecial Paste:=xlPasteValues
Range("A1").Select
Range(Selection, Selection.End(xlDown)).Select
Selection.TextToColumns _
Destination:=Range("A1"), DataType:=xlDelimited, _
ConsecutiveDelimiter:=False, Comma:=True
ActiveCell.Rows("1:1").EntireRow.Select
Range(Selection, Selection.End(xlDown)).Select
Selection.SpecialCells(xlCellTypeBlanks).Select
Selection.Delete Shift:=xlUp
Range("A1").Select
Range(Selection, Selection.End(xlToRight)).Select
Range(Selection, Selection.End(xlDown)).Select
Dim vaCells As Variant
Dim vOutput() As Variant
Dim i As Long, j As Long
Dim lRow As Long
If TypeName(Selection) = "Range" Then
If Selection.Count > 1 Then
If Selection.Count <= Selection.Parent.Rows.Count Then
vaCells = Selection.Value
ReDim vOutput(1 To UBound(vaCells, 1) * UBound(vaCells, 2), 1 To 1)
For j = LBound(vaCells, 2) To UBound(vaCells, 2)
For i = LBound(vaCells, 1) To UBound(vaCells, 1)
If Len(vaCells(i, j)) > 0 Then
lRow = lRow + 1
vOutput(lRow, 1) = vaCells(i, j)
End If
Next i
Next j
Selection.ClearContents
Selection.Cells(1).Resize(lRow).Value = vOutput
End If
End If
End If
Range("A1").Select
Range(Selection, Selection.End(xlDown)).Select
ActiveSheet.Range("A:A").RemoveDuplicates Columns:=1, Header:=xlNo
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub
This works perfectly for up to 1000 rows of data but if I start pasting more than 1100-ish rows onto column A, it starts to run weird and gives me results that are not in a single column. I'm not sure why it's running differently if the processes are exactly the same. Can anyone help? Thank you so much!