Optimization of an Algo

Question

I have code that reads a text file and creates an input array[] of boolean type. Its size is about 100,000-300,000 items. Now the problem I'm facing is to create all those subsets of size N, 3>=N>=9, that have contiguous true values.

E.g. for N=3, [true][true][true] is the required subset if all the 3 trues are in the continuous indexes.

Although I have an algorithm created, it's very slow. I need a better solution, which is fast and efficient.

Please suggest some ideas.

 public static void createConsecutivePassingDays()
    {       
        for (String siteName  : sitesToBeTestedList.keySet())
        {
            System.out.println("\n*****************Processing for Site--->"+siteName+" ***********************");

            LinkedHashMap<String,ArrayList<String>> cellsWithPassedTripletsDates=new LinkedHashMap<String, ArrayList<String>>();

            for (String cellName : sitesToBeTestedList.get(siteName))
            {

                System.out.println("\n*****************Processing for Cell--->"+cellName+" ***********************");

                boolean failed=false;

                ArrayList<String> passedDatesTriplets=new ArrayList<String>();
                int consecutiveDays=0;
                String tripletDate="";
                String prevDate_day="";
                String today_Date="";

                for (String date : cellDateKpiMetOrNotMap.get(cellName).keySet())
                {
                    System.out.println("\nprocessing for Date-->"+date);
                    if(!(prevDate_day.trim().equals("")))
                        today_Date=getNextDay(prevDate_day.substring(0, prevDate_day.lastIndexOf('_')));

                    if(Connection.props.getProperty("INCLUDE_WEEKENDS").equalsIgnoreCase("FALSE"))
                    {
                        if(date.endsWith("SAT") || date.endsWith("SUN") || (!(date.substring(0, date.lastIndexOf('_')).equalsIgnoreCase(today_Date))))
                        {
                            if(consecutiveDays >= Reader.days)
                            {
                                passedDatesTriplets.add(tripletDate);
                            }

                            tripletDate="";
                            consecutiveDays=0;
                            prevDate_day=date;
                            continue;
                        }
                    }


                    if(cellDateKpiMetOrNotMap.get(cellName).get(date).equalsIgnoreCase("TRUE"))
                    {

                        if(tripletDate.equals(""))
                            tripletDate=date;
                        else
                            tripletDate+="#"+date;

                        consecutiveDays++;

                    }
                    else
                    {
                        failed=true;
                        if(consecutiveDays >= Reader.days)//kd
                        {
                            System.out.println("Triplet to be added-->"+tripletDate);
                            passedDatesTriplets.add(tripletDate);
                        }
                        tripletDate="";
                        consecutiveDays=0;
                    }

                    prevDate_day=date;
                }

                if(!failed)
                    passedDatesTriplets.add(tripletDate);
                else
                {
                    if(tripletDate.trim().split("#").length >= Reader.days)
                    {
                        passedDatesTriplets.add(tripletDate);
                    }
                }

                cellsWithPassedTripletsDates.put(cellName, passedDatesTriplets);

            }

            siteItsCellsWithPassedDates.put(siteName, cellsWithPassedTripletsDates);

        }

        System.out.println("\n************************************************SITES***************************************");
        for (String site : siteItsCellsWithPassedDates.keySet())
        {
            System.out.println("\n********************Site="+site+" ***********************");
            for (String cellName : siteItsCellsWithPassedDates.get(site).keySet())
            {
                System.out.println("\nCellName="+cellName);
                System.out.println(siteItsCellsWithPassedDates.get(site).get(cellName));
            }
            System.out.println("***********************************************************");
        }
        System.out.println("********************************************************************************************");
    }

hey, the code is up..,, But the question I have asked is actually very basic idea that is working behind the code, but the code is too complex to be shown, as many other functionality is embedded into it. — KDjava, Oct 10 '12 at 10:32
How does the Algorithm and its datastructures map to the code? You talked about an array of booleans, but see tons of Strings and Lists in the code ... — Jens Schauder, Oct 10 '12 at 10:35
Do the subsets overlap, ie if you have t t t t, is that two subsets of 3 trues, or one subset of 3 trues and one extra true which is not part of any subset? — hyde, Oct 10 '12 at 10:39
@JensSchauder, don't try n map the code to problem, the problem I have told you is self-sufficient, It was just given as MyNameIsTooCommon, was asking for it.. Also man ArrayLists too are same as arrays only(self-growing array).. pls suggest me a solution I don't need any code for it, pls — KDjava, Oct 10 '12 at 10:42
@hyde , Yes absolutely the subSets overlap, t t t t, is two subsets of 3 trues as--> 0,1,2 and 1,2,3 indexes ..!! — KDjava, Oct 10 '12 at 10:44
@KDjava You have a performance problem. This might be due to the code you are using, or due to the algorithm. In the code you posted you do tons of things not related to the algorithm you ask for. It's rather likely that these cause your performance problem. — Jens Schauder, Oct 10 '12 at 11:03

score 4 · Accepted Answer · edited May 23 '17 at 11:43

First I would stay away from the array[boolean] a BitSet is more memory efficient in I'd expect it to be faster as well in your case. Since it will utilize the caches better. See boolean[] vs. BitSet: Which is more efficient?

For the algorithm:

Iterate through the datastructure. When you come across the first true, remember its position (start) until you reach a false. This is the position end At that point you have the start and end of a contiuous interval of true values, which is basically your result. You get your subsets as starting from start to end - n.

Repeat until the end of you datastructure

You can even parallize this by starting n-processes, each handling a different part of the array, starting with the first false value after the start of the segement and continuing over the end of the segement until the first false.

score 1 · Answer 2 · answered Oct 10 '12 at 11:13

The simplest algo would be to check the N values starting at index x. if there is at least one false, then you can go directly to the index x+N. otherwise you can check index x+1; if there is no valid sequence, then you will check size/N cells.

in pseudo-code :

int max = array.length - N;
int index = 0;
boolean valid = true;
while (index < max) {
   valid = true;
   for (check = index; check<index+N; check++){
      valid = valid && array[check];
   }
   if (valid) {
      // you got a continous sequence of true of size N
      ;
      index++;
   } else {
      index = index + N;
   }      
}

also, with a BitSet instead of an array you could use the nextClearByte to get the index of the next false. the difference with the previous false minus N indicate the nomber of sequences of N true (with the previous false initially valued at -1).

score 0 · Answer 3 · answered Oct 10 '12 at 10:33

0

I will sugggest you to create a stringbuilder and append 1 for every "true" value added to the boolean array and a 0 for every "false" added. Thus your stringbuilder will have a sequence of 1s and 0s. Then just use indexOf("111") to get the starting index of the three contiguous "true" values, it will be the starting index in the stringbuilder and in your boolean array as well.

answered Oct 10 '12 at 10:33

Victor Mukherjee

10,487
16
54
97

ya that sounds great but I don't think that would give me the overlapping trues subsets correctly, For eg, [true][true][true][true], should actually give you 2 subsets of true with indexes as 0,1,2 and 1,2,3. – KDjava Oct 10 '12 at 10:47
You can solve this if you start the next search with the last found startindex +1 – JackTools.Net Oct 10 '12 at 11:06

Optimization of an Algo

3 Answers3