
Each column represents a different variable in a large data set. I am trying to extract each number and place it in an array, one array per row.

Underscores represent spacing

2___2___2_______3___1___19

1___3___2_______3___3___19

1___3___4_______3___1___19

6___3___6_______5_______13

5___2___5_______5_______13

5___4___4___7___4_______13

spaceForNew represents how many characters remain until the next variable is found. This depends on the width of the current variable.

I am using the following code:

    public static int[] remaining(String Line) throws IOException
    {
        int[] data = new int[7];
        int pointer = 0;
        int spaceForNew = 0;
        for (int i = 0; i <= Line.length() - 1; i++)
        {
            if (i < Line.length() - 1)
            {
                if ((i == spaceForNew) && (pointer < 6))
                {
                    //two digit
                    if ((Line.charAt(i) == '1') && (Line.charAt(i + 1) == '0'))
                    {
                        data[pointer] = 10;
                        spaceForNew += 3;
                        pointer++;
                    //one digit
                    } else if ((Line.charAt(i) != '\t') && (Line.charAt(i + 1) != '0')) {
                        data[pointer] = Integer.parseInt(Character.toString(Line.charAt(i)));
                        spaceForNew += 2;
                        pointer++;
                    } else if ((Line.charAt(i) == ' ') && (data[pointer] == 0)) {
                        data[pointer] = -1;
                        spaceForNew++;
                        pointer++;
                    }

                }
            } else {
                if (pointer == 6)
                {
                    data[pointer] = Integer.parseInt(Character.toString(Line.charAt(i)));
                }
            }
        }
        return data;
    }

The code above is hideous and not very intuitive. It seems to work for a lot of the data, but it fails in a way that appears almost random. Any suggestions at all would be much appreciated.

MC Emperor
  • Can you give an example of a line of input that doesn't work? – azurefrog Jun 05 '19 at 21:46
  • //one digit: `}else if((Line.charAt(i)!= ' ')&&` does not compile. Should that be a test for a space character? (Edit: seems the auto format just wipes the multiple spaces for me) – second Jun 05 '19 at 21:52
  • @second It's supposed to be a tabstop, but the copy/paste on SO doesn't pick it up properly unless you edit the question to look at the raw markdown. – azurefrog Jun 05 '19 at 21:59
  • Does that imply that your spacing between columns is variable? Or is it supposed to be fixed at either 3 spaces or a tab? – second Jun 05 '19 at 22:01
  • @Daniel You could just delete the comment in this case. Anyway, please update the question to explain how tabs or numbers > 10 relate to your problem; the code shouldn't be able to process them. Edit: update your question instead of putting things into the comments. – second Jun 05 '19 at 22:11
  • Sorry, my comment was posted by accident. "10___6___10___3___8______1" and "2___10___0___8___4___0___1" both result in data[0] = -1, while "9___10___10______6___0___8" results in data[0] = 9, as it should. Blank spaces are "null" or empty, so I use -1 to represent them. – Daniel Gonzalez Jun 05 '19 at 22:15
  • You should follow the Java Naming Conventions: variable names are written in camelCase, i.e. `Line` should be `line`. – MC Emperor Jun 05 '19 at 22:41

4 Answers


Update: try this:

    String line = "10   8   10           1   8";
    String[] split = line.split("   ");
    int[] array = new int[7];
    for (int i = 0; i < split.length; i++) {
        array[i] = split[i].trim().isEmpty() ? -1 : Integer.parseInt(split[i].trim());
    }
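For reference, here is the snippet above wrapped in a small runnable class. The class name SplitParser, the parseLine method, its columns parameter and the Arrays.fill(array, -1) pre-fill are hypothetical additions for this sketch, not part of the answer; the pre-fill makes columns missing from the end of a line come out as -1 instead of 0, and the loop bound keeps an over-long line from overflowing the array. Like the original, it assumes values are separated by exactly three spaces.

```java
import java.util.Arrays;

public class SplitParser {

    // Hypothetical helper: splits a line on runs of exactly three spaces and
    // maps every empty (or whitespace-only) token to -1.
    static int[] parseLine(String line, int columns) {
        int[] array = new int[columns];
        Arrays.fill(array, -1); // columns absent from the end of the line stay -1
        String[] split = line.split("   ");
        for (int i = 0; i < split.length && i < columns; i++) {
            String token = split[i].trim();
            array[i] = token.isEmpty() ? -1 : Integer.parseInt(token);
        }
        return array;
    }

    public static void main(String[] args) {
        // Build a line with one missing (blank) cell between 2 and 3.
        String line = String.join("   ", "2", "2", "2", " ", "3", "1", "19");
        System.out.println(Arrays.toString(parseLine(line, 7)));
    }
}
```

Note that a leading blank cell still works here: Java's String.split keeps the empty leading token produced by a separator match at position 0.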
Egor

You can use a regex to parse the lines: `(\d+| )(?:   )?`
This matches either a run of digits or a single space, optionally followed by three spaces. Each match gives you a string that is either a number you can parse or a single space, which you can treat as missing data; because it still occupies a slot, it acts as a placeholder that keeps your columns aligned.

    Integer[] parsed = new Integer[7];
    String thing = "2   2   2       3   1   19";
    Pattern pattern = Pattern.compile("(\\d+| )(?:   )?");
    Matcher m = pattern.matcher(thing);
    int index = 0;
    while (m.find()) {
        if (!" ".equals(m.group(1)))
            parsed[index] = Integer.parseInt(m.group(1));
        else
            parsed[index] = -1; //or what ever your missing data value should be.
        index++;
    }
    Arrays.asList(parsed).forEach(System.out::println);

Edit: fixed. group(0) is the whole match, followed by any capturing groups, so group(1) returns the first capturing group, which is either the digits or a single space.
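A runnable sketch of the same regex approach, under the same assumptions (three-space separators, a single space standing in for each empty cell). The RegexParser class and parseLine helper are hypothetical names; the helper also stops after columns cells and pre-fills the array with -1, so an over-long line cannot overflow the array and trailing missing columns are reported as -1 rather than left null.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexParser {

    // Group 1 captures either a run of digits or a single space (an empty cell);
    // the optional non-capturing group consumes the three-space separator.
    private static final Pattern CELL = Pattern.compile("(\\d+| )(?:   )?");

    // Hypothetical helper wrapping the answer's loop.
    static int[] parseLine(String line, int columns) {
        int[] parsed = new int[columns];
        Arrays.fill(parsed, -1); // cells that never matched stay -1
        Matcher m = CELL.matcher(line);
        int index = 0;
        while (index < columns && m.find()) {
            String cell = m.group(1);
            parsed[index++] = " ".equals(cell) ? -1 : Integer.parseInt(cell);
        }
        return parsed;
    }

    public static void main(String[] args) {
        String line = String.join("   ", "2", "2", "2", " ", "3", "1", "19");
        System.out.println(Arrays.toString(parseLine(line, 7)));
    }
}
```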

mavriksc

You need to know exactly what the pattern is for each line. I assume that each 'column' has a fixed width; otherwise, the numbers would not be aligned like this.

For example, suppose each column is three characters wide (digits and/or spaces), and the column separator is 1 space wide, your pattern could look like this:

[ \d]{3} |[ \d]{1,3}

Now with Pattern::compile, Pattern::matcher and Matcher::find you could search for all numbers present in the current line. Assuming that lines is a List<String> with each element being a line:

// Precompile pattern. This matches either a cell followed by a space, or,
// if we are at the end of the line, a variable number of spaces and/or
// digits.
Pattern pattern = Pattern.compile("[ \\d]{3} |[ \\d]{1,3}");

List<List<Integer>> matrix = lines.stream()
    .map(pattern::matcher)
    .map(matcher -> {
        List<Integer> ints = new ArrayList<>();
        while (matcher.find()) {
            String element = matcher.group().trim();
            ints.add(!element.isEmpty() ? Integer.valueOf(element) : -1);
        }
        return ints;
    })
    .collect(Collectors.toList());

Using MatcherStream provided by dimo414:

Pattern pattern = Pattern.compile("[ \\d]{3} |[ \\d]{1,3}");
List<List<Integer>> matrix = lines.stream()
    .map(line -> MatcherStream.find(pattern, line)
        .map(String::trim)
        .map(element -> !element.isEmpty() ? Integer.valueOf(element) : -1)
        .collect(Collectors.toList()))
    .collect(Collectors.toList());
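If the columns really are fixed width, the regex can be dropped entirely and each cell cut out by position. The sketch below assumes every column occupies a slot of exactly four characters (in the sample lines every value appears to fall inside its own four-character slot); FixedWidthParser and parseFixedWidth are hypothetical names. A cell that is blank, or that lies past the end of the line, becomes -1. If a value ever straddled two slots, this approach would split it incorrectly, so it is only as safe as the fixed-width assumption.

```java
import java.util.Arrays;

public class FixedWidthParser {

    // Assumption: column k spans the characters [k * width, (k + 1) * width).
    static int[] parseFixedWidth(String line, int columns, int width) {
        int[] values = new int[columns];
        for (int k = 0; k < columns; k++) {
            int start = k * width;
            if (start >= line.length()) {
                values[k] = -1; // line ends before this column: treat as missing
                continue;
            }
            int end = Math.min(start + width, line.length());
            String cell = line.substring(start, end).trim();
            values[k] = cell.isEmpty() ? -1 : Integer.parseInt(cell);
        }
        return values;
    }

    public static void main(String[] args) {
        String line = String.join("   ", "2", "10", "0", "8", "4", " ", " ");
        System.out.println(Arrays.toString(parseFixedWidth(line, 7, 4)));
    }
}
```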
MC Emperor

In theory, a value could be missing anywhere within a line of space-delimited data, including several consecutive values. This includes:

  • at the beginning of a data line;
  • at the end of a data line;
  • anywhere between the start and end of a data line.

Examples might be (as in your example, underscores represent whitespace):

2___2___2_______3___1___19

1___3___2_______3___3___19

____3___4_______3___1___19

____5___7___4___3___8____

6___3___6_______5_______13

5___2___5_______________13

5___4___4___7___4_______16

10___6___10___3___8_______1

2___10___0___8___4___0___1

2___10___0___8___4________

4___12___0___9___6

The saving grace here is that the data within the file appears to be formatted in a fixed-space pattern. Knowing this, it is possible to replace missing values with a specific integer that stands apart from the values actually contained in each data line. I think -1 (what you're already using) would work well for this, provided the file never contains other signed values, or -1 is never a meaningful value in further processing because its possible presence is already accounted for. That, of course, is something you have to decide.

Once the missing values in a data line are replaced with -1, the line can be split on whitespace, and the resulting elements converted to integers and placed into an integer array.

If you want to place each row (file line) of data into an integer array, allow me to suggest a two-dimensional integer array (int[][]). I think you would find it much easier to work with, since the entire file's data can be held in that one array. A Java method can then create that array, for example:

Read the entire file line by line into a String[] Array:

List<String> list = new ArrayList<>();
try (Scanner reader = new Scanner(new File("FileExample.txt"))) {
    while (reader.hasNextLine()) {
        String line = reader.nextLine();
        if (line.equals("")) { continue; }
        list.add(line);
    }
}
catch (FileNotFoundException ex) {
    Logger.getLogger("FILE NOT FOUND!").log(Level.SEVERE, null, ex);
}

// Convert list to String Array
String[] stringData = list.toArray(new String[0]);
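On Java 8 or later, the same array can also be produced with java.nio.file.Files.readAllLines, which reads the whole file (decoded as UTF-8) in one call. This is a swapped-in alternative to the Scanner loop above, not part of this answer; ReadLines and readNonEmptyLines are hypothetical names, and the IOException is passed to the caller rather than logged.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class ReadLines {

    // Reads every non-empty line of the file into a String[], skipping blank
    // lines just like the `continue` in the Scanner loop.
    static String[] readNonEmptyLines(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file);
        return lines.stream()
                .filter(line -> !line.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) throws IOException {
        String[] stringData = readNonEmptyLines(Paths.get("FileExample.txt"));
        System.out.println(stringData.length + " data lines read");
    }
}
```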

The FileExample.txt file contains the same data as provided above; within the file, however, the underscores are whitespace. Once the code above has run, the String[] array named stringData contains all of the file's data lines. We now pass this array to our next method, named stringDataToIntArray() (for lack of a better name), to create a 2D integer array (data[][]):

/**
 * Creates a 2D Integer (int[][]) Array from data lines contained within the 
 * supplied String Array.<br><br>
 * 
 * @param stringData (1D String[] Array) The String array where each element 
 * contains lines of fixed space delimited numerical values, for example each 
 * line would look something like:<pre>
 * 
 *     "2   1   3   4   5   6   7" </pre>
 * 
 * @param replaceMissingWith (String) One or more numerical values could be 
 * missing from any elemental line within the supplied stringData array. What 
 * you supply as an argument to this parameter will be used in place of that 
 * missing value. <br>
 * 
 * @param desiredNumberOfColumns (Integer (int)) The number of columns desired 
 * in each row of the returned 2D Integer Array. Make sure desiredNumberOfColumns 
 * contains a value greater than 0 and less than (Integer.MAX_VALUE - 4). Be 
 * reasonable: although almost any positive integer value can be supplied, you 
 * will most likely run out of JVM memory if you go that big. The largest number 
 * of data columns contained within the data file should suffice.<br>
 * 
 * @return (2D Integer (int[][]) Array) A two dimensional Integer Array derived 
 * from the supplied String Array of fixed space delimited line data.
 */
public int[][] stringDataToIntArray(final String[] stringData, 
        final String replaceMissingWith, final int desiredNumberOfColumns) {
    int requiredArrayLength = desiredNumberOfColumns;

    // Make sure the replaceMissingWith parameter actually contains something.
    if (replaceMissingWith == null || replaceMissingWith.trim().equals("")) {
        System.err.println("stringDataToIntArray() Method Error! The "
                + "replaceMissingWith parameter requires a valid argument!");
        return null;  
    }

    /* Make sure desiredNumberOfColumns contains a value greater than 0 and
       less than (Integer.MAX_VALUE - 4).   */
    if (desiredNumberOfColumns < 1 || desiredNumberOfColumns > (Integer.MAX_VALUE - 4)) {
        System.err.println("stringDataToIntArray() Method Error! The "
                + "desiredNumberOfColumns parameter requires any value "
                + "from 1 to " + (Integer.MAX_VALUE - 4) + "!");
        return null;
    }

    // The 2D Array to return.
    int[][] data = new int[stringData.length][requiredArrayLength];

    /* Iterate through each elemental data line contained within 
       the supplied String Array. Process each line and replace 
       any missing values...   */
    for (int i = 0; i < stringData.length; i++) {
        String line = stringData[i];
        // Replace the first numerical value with replaceMissingWith if missing:
        if (line.startsWith(" ")) {
            line = replaceMissingWith + line.substring(1);
        }

        // Replace remaining missing numerical values if missing:
        line = line.replaceAll("\\s{4}", " " + replaceMissingWith);

        // Split the string of numerical values based on whitespace:
        String[] lineParts = line.split("\\s+");

        /* Ensure we have the correct Required Array Length (ie: 7):
           If we don't then at this point we were missing values at
           the end of the input string (line). Append replaceMissingWith
           to the end of line until a split satisfies the requiredArrayLength:  */
        while (lineParts.length < requiredArrayLength) {
            line+= " " + replaceMissingWith;
            lineParts = line.split("\\s+");
        }

        /* Fill the data[][] integer array. Convert each string numerical
           value to an Integer (int) value for current line:   */
        for (int  j = 0; j < requiredArrayLength; j++) {
            data[i][j] = Integer.parseInt(lineParts[j]);
        }
    } 
    return data;
}

And to use this method (once you've read the data file and placed its contents into a String array):

int[][] data = stringDataToIntArray(stringData, "-1", 7);

// Display the 2D data Array in Console...
for (int i = 0; i < data.length; i++) {
    System.out.println(Arrays.toString(data[i]));
}

If you've processed the example file data provided above then your console output window should contain:

[2, 2, 2, -1, 3, 1, 19]
[1, 3, 2, -1, 3, 3, 19]
[-1, 3, 4, -1, 3, 1, 19]
[-1, 5, 7, 4, 3, 8, -1]
[6, 3, 6, -1, 5, -1, 13]
[5, 2, 5, -1, -1, -1, 13]
[5, 4, 4, 7, 4, -1, 16]
[10, 6, 10, 3, 8, -1, 1]
[2, 10, 0, 8, 4, 0, 1]
[2, 10, 0, 8, 4, -1, -1]
[4, 12, 0, 9, 6, -1, -1]

If you want only the first three columns from each file line then your call would be:

int[][] data = stringDataToIntArray(stringData, "-1", 3);

and the output would look like:

[2, 2, 2]
[1, 3, 2]
[-1, 3, 4]
[-1, 5, 7]
[6, 3, 6]
[5, 2, 5]
[5, 4, 4]
[10, 6, 10]
[2, 10, 0]
[2, 10, 0]
[4, 12, 0]

and if you want 12 data columns for each file line your call would be:

int[][] data = stringDataToIntArray(stringData, "-1", 12);

and the output would look like:

[2, 2, 2, -1, 3, 1, 19, -1, -1, -1, -1, -1]
[1, 3, 2, -1, 3, 3, 19, -1, -1, -1, -1, -1]
[-1, 3, 4, -1, 3, 1, 19, -1, -1, -1, -1, -1]
[-1, 5, 7, 4, 3, 8, -1, -1, -1, -1, -1, -1]
[6, 3, 6, -1, 5, -1, 13, -1, -1, -1, -1, -1]
[5, 2, 5, -1, -1, -1, 13, -1, -1, -1, -1, -1]
[5, 4, 4, 7, 4, -1, 16, -1, -1, -1, -1, -1]
[10, 6, 10, 3, 8, -1, 1, -1, -1, -1, -1, -1]
[2, 10, 0, 8, 4, 0, 1, -1, -1, -1, -1, -1]
[2, 10, 0, 8, 4, -1, -1, -1, -1, -1, -1, -1]
[4, 12, 0, 9, 6, -1, -1, -1, -1, -1, -1, -1]

The additional -1 values at the end of each array appear because those columns do not exist within the data lines; since 12 columns were requested, the method appended the replacement value to fill them.

DevilsHnd - 退職した