How can I extract the data below and write it to another file?

Question

L=40
i:Classifier_name=meka.classifiers.multilabel.BR
i:Classifier_ops=[-W, weka.classifiers.rules.ZeroR]
i:Classifier_info=
i:Dataset_name=PlainAbstractsBehavioralDomainLabels
i:Type=ML
i:Threshold=0.2289156626506024
i:Verbosity=1
v:N_train=247.0
v:N_test=3.0
v:LCard_train=1.8461538461538463
v:LCard_test=0.0
v:Build_time=2.79
v:Test_time=0.005
v:Total_time=2.795
[0, 0, 0]:[0.05622489959839357, 0.012048192771084338, 0.08433734939759036]
[0, 0, 0]:[0.05622489959839357, 0.012048192771084338, 0.08433734939759036]
[0, 0, 0]:[0.05622489959839357, 0.012048192771084338, 0.08433734939759036]

I got this result in a text file from a machine-learning run. How can I write only the following data to another text file (or any other kind of file)?

1. 0.05622489959839357, 0.012048192771084338, 0.08433734939759036
2. 0.05622489959839357, 0.012048192771084338, 0.08433734939759036
3. 0.05622489959839357, 0.012048192771084338, 0.08433734939759036

`display in another file` Do you want to write this text into a file? Where did you get stuck? – donfuxx, Apr 08 '14 at 18:23
Maybe this can help: http://stackoverflow.com/questions/4716503/best-way-to-read-a-text-file. Otherwise, you should specify what processing you need or what you are trying to achieve, with a code sample. SO is not a homework site. – Will Marcouiller, Apr 08 '14 at 18:24
@donfuxx, yes, I want these lines written to a text file. I am stuck on extracting a few lines and specific columns. – Mike, Apr 08 '14 at 18:46

nikis · Accepted Answer · 2014-04-08T21:10:09.757


I'm not a regex expert, so corrections to my pattern are welcome, but the following code works:

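import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Matches a trailing ":[d.ddd, d.ddd, ...]" holding 0 to 4 numbers and captures the numbers
Pattern pattern = Pattern.compile(":\\[((\\d\\.\\d+(,\\s)?){0,4})\\]$");
int count = 1;
String line;

// try-with-resources closes the reader and the writer even if an exception is thrown
try (BufferedReader reader = new BufferedReader(
         new InputStreamReader(new FileInputStream("data.txt"), StandardCharsets.UTF_8));
     PrintWriter writer = new PrintWriter(new File("out.txt"), "UTF-8")) {
    while ((line = reader.readLine()) != null) {
        Matcher matcher = pattern.matcher(line);
        if (matcher.find()) {
            // Prefix each captured list with its running number, e.g. "1. 0.05, 0.01, 0.08"
            String capture = count + ". " + matcher.group(1);
            writer.println(capture);
            System.out.println(capture);
            count++;
        }
    }
}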

So even if I process lines whose arrays have different lengths:

[0, 0, 0]:[0.05622489959839357, 0.012048192771084338, 0.08433734939759036, 0.55]
[0, 0, 0]:[0.05622489959839357, 0.012048192771084338, 0.08433734939759036]
[0, 0, 0]:[0.05622489959839357, 0.012048192771084338, 0.08433734939759036]

The output will be:

1. 0.05622489959839357, 0.012048192771084338, 0.08433734939759036, 0.55
2. 0.05622489959839357, 0.012048192771084338, 0.08433734939759036
3. 0.05622489959839357, 0.012048192771084338, 0.08433734939759036
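If the number of values per line is not known in advance, a regex-free variant may be easier to maintain. The following is only a sketch and not part of the code above: it assumes every prediction line has the shape `[labels]:[values]` and that no header line contains `:[` while also ending in `]`. The class name `ExtractValues` and the method `extractValues` are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class ExtractValues {
    // Returns the text inside the trailing "[...]" of a "[labels]:[values]" line,
    // or null when the line does not have that shape (for example "v:N_train=247.0").
    public static String extractValues(String line) {
        String trimmed = line.trim();
        int start = trimmed.lastIndexOf(":[");
        if (start < 0 || !trimmed.endsWith("]")) {
            return null;
        }
        // Skip the two characters of ":[" and drop the closing "]"
        return trimmed.substring(start + 2, trimmed.length() - 1);
    }

    public static void main(String[] args) {
        // Sample lines taken from the question; in practice they would be read from the file
        List<String> lines = Arrays.asList(
            "i:Classifier_ops=[-W, weka.classifiers.rules.ZeroR]",
            "v:N_train=247.0",
            "[0, 0, 0]:[0.05622489959839357, 0.012048192771084338, 0.08433734939759036, 0.55]",
            "[0, 0, 0]:[0.05622489959839357, 0.012048192771084338, 0.08433734939759036]");

        int count = 1;
        for (String line : lines) {
            String values = extractValues(line);
            if (values != null) {
                System.out.println(count + ". " + values);
                count++;
            }
        }
    }
}
```

Because the method only looks for the last `:[` and the closing `]`, it accepts any number of values, and also values that the `\\d\\.\\d+` pattern would reject, such as `10.5` or `1.0E-4`. Whether that leniency is desirable depends on how strictly the input should be validated.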

edited Apr 08 '14 at 21:10

answered Apr 08 '14 at 19:02


Thank you, nikis. It works. I was stuck on getting the column; I have to learn regular expressions. – Mike Apr 08 '14 at 19:10
nikis, this method is good, but if I want to change [0, 0, 0]:[0.05622489959839357, 0.012048192771084338, 0.08433734939759036] to [0, 0, 0]:[0.05622489959839357, 0.012048192771084338, 0.08433734939759036, 0.21212121212121], I have to change this line of code: Pattern pattern = Pattern.compile(":\\[(\\d\\.\\d+, \\d\\.\\d+, \\d\\.\\d+)\\]$"); That is not very efficient, right? – Mike Apr 08 '14 at 19:49
@Logon In such a case the pattern will be `:\\[(\\d\\.\\d+, \\d\\.\\d+, \\d\\.\\d+, \\d\\.\\d+)\\]$`, reflecting 4 numbers. – nikis Apr 08 '14 at 19:51
@Logon I've updated my answer with a more elegant pattern. It now handles arrays of floating-point numbers with lengths from 0 to 4. – nikis Apr 08 '14 at 20:11
Nikis, I checked your updated answer, there is a minor error in that string ("":\\[((\\d\\.\\d+(,\\s)?){0,4})\\]$""), you should delete the pair of "" – Mike Apr 08 '14 at 21:05
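The difference between the fixed three-number pattern and the `{0,4}` pattern quoted in these comments can be checked with a small sketch. The pattern strings are copied from the thread; the class name `PatternCheck`, the constant names, and the helper `capture` are hypothetical, and the shortened sample values are made up for readability.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternCheck {
    // Pattern from the first version of the answer: exactly three numbers
    public static final Pattern FIXED_THREE =
        Pattern.compile(":\\[(\\d\\.\\d+, \\d\\.\\d+, \\d\\.\\d+)\\]$");
    // Pattern from the updated answer: zero to four numbers
    public static final Pattern UP_TO_FOUR =
        Pattern.compile(":\\[((\\d\\.\\d+(,\\s)?){0,4})\\]$");

    // Returns capture group 1 if the pattern is found in the line, otherwise null
    public static String capture(Pattern pattern, String line) {
        Matcher matcher = pattern.matcher(line);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String three = "[0, 0, 0]:[0.05, 0.01, 0.08]";
        String four = "[0, 0, 0]:[0.05, 0.01, 0.08, 0.21]";

        System.out.println(capture(FIXED_THREE, three)); // prints "0.05, 0.01, 0.08"
        System.out.println(capture(FIXED_THREE, four));  // prints "null"
        System.out.println(capture(UP_TO_FOUR, three));  // prints "0.05, 0.01, 0.08"
        System.out.println(capture(UP_TO_FOUR, four));   // prints "0.05, 0.01, 0.08, 0.21"
    }
}
```

Note that the `{0,4}` quantifier still caps the list at four numbers, so a line with five values would not match; raising the upper bound or using `*` would lift that limit.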
