The most optimal algorithm for processing such sequences is the division of the task into two passes: during the first pass, sequences are extracted from the entire original data set and information about them is stored in an auxiliary array; during the second pass, actions are performed for each series.
The auxiliary array can be of any structure. For example, it can store information about the beginning of the sequence (the row number in which the sequence began) and its length (or the row number where the sequence was interrupted). This storage method can greatly save memory, which is important when processing very large sets of source data, but greatly complicates the first pass algorithm.
If the available resources allow, then an array of the same dimension as the original dataset is used as an auxiliary array.
Depending on the type of sequence and the problem to be solved, the type of the auxiliary array is selected. For example, if you just need to select (highlight, colorize) sequences in the original dataset using conditional formatting, then a boolean array is enough, in which each element simply shows "belongs / does not belong to the series".
Integer arrays are used for tasks like the one described in the question. Each element will contain either 0 (does not belong to the series) or the ordinal number of the element in the series. For the example from the question, the code
aTemp(<first element>) = IIf(<condition aData(<first element>) NOT in sequence>, 0, 1) For i = <second element of aData> To <last element of aData> If <condition aData(<element i>) NOT in sequence> Then aTemp(i) = 0 Else aTemp(i) = aTemp(i - 1) + 1 End If Next i
will form such an array of "indices"
Data Index 0 0 0 0 0 0 2 1 5 2 9 3 6 4 0 0 1 1 0 0 5 1 8 2 0 0 0 0
For the described task,
<condition aData(<element i>) NOT in sequence> is just a comparison with zero. In other cases, the condition can be more complex, including the analysis of the values of adjacent cells. But the principle remains the same - integer values are accumulated in the Index array.
The second pass is done in reverse order, from bottom to top. If we carefully study the example of indexes, it becomes clear that only those source values for which there is an index greater than one are subject to processing - for index 0, the result value will be the same as in the original array (that is, also 0), for index 1 (one-element sequence) is exactly the same.
When moving from bottom to top, we will first stumble upon the maximum values of the indices (2 for the last sequence, 4 for the first sequence). This allows you to shorten the lookup by simply jumping over the next sequence, skipping the number of items found in the Index.
To organize such a pass, we cannot use the FOR loop, we will use the WHILE loop
i = <last index> While i >= <first index> If aTemp(i) < 2 Then i = i - 1 ' Jusm move up to one element Else ' We know i (number of row with end of sequence) ' and aTemp(i) (count of elements in this sequence). ' So, we can calculate start of sequence as i - aTemp(i) + 1 <your code to process one sequence, aData array values from i - aTemp(i) + 1 to i> i = i - aTemp(i) - 1 End If Wend
Pay attention to the line
i = i - aTemp(i) - 1 - we jump up the entire length of the sequence and one more line higher. In other words, since we know that the element before the current series is not a series (this is one of the series separators), then we skip it without processing.
The final decision can be formalized in the form of a UDF. Accept the source data as a worksheet range rather than an array of cell values. This technique allows you to analyze not only the data, but also their design, for example, the color of the background or font. This is convenient when the sequences in the original range have already been marked up manually (or by some program) and information about the sequences is contained in the design of the cells. In this case, this is not the case, so one of the first operators of the function will be to get values from a range into an array.
Putting it all together, the code for solving the sample problem from the question might look like this:
Function getMaxInSeries(aDataRange As Range) As Variant Dim aData As Variant Dim lB As Long, uB As Long Dim aTemp() As Integer Dim i As Long, j As Long Dim vMax As Variant aData = aDataRange.Resize(, 1).Value lB = LBound(aData) uB = UBound(aData) ReDim aTemp(lB To uB) aTemp(lB) = IIf(aData(lB, 1) = 0, 0, 1) For i = lB + 1 To uB If aData(i, 1) = 0 Then aTemp(i) = 0 Else aTemp(i) = aTemp(i - 1) + 1 End If Next i i = uB While i >= lB If aTemp(i) < 2 Then i = i - 1 Else vMax = aData(i, 1) For j = i - aTemp(i) + 1 To i - 1 ' No need compare with last element, we start with this value If vMax < aData(j, 1) Then vMax = aData(j, 1) Next j For j = i - aTemp(i) + 1 To i aData(j, 1) = vMax Next j i = i - aTemp(i) - 1 End If Wend getMaxInSeries = aData End Function