Okay, I realise there are a lot of regex questions out there, but thank you for taking the time.
Edit: the code below is now the solved version.
https://stackoverflow.com/a/25791942/8926366 held the answer.
I have a text file with quotes in it that I want to put into an ArrayList<String>. To do this I am using the Scanner and File classes, and I wanted to familiarise myself with regex because it seems like a really efficient way of doing it. Except that I can't seem to get it to work, of course!
I managed to cobble together the following regex, courtesy of guides and other people's solutions, of which I understand about 85%:
(?<=(["']\b))(?:(?=(\\?))\2.)*?(?=\1)
Now, I understand it this way:
(?<=        # positive lookbehind
  (         # capture group 1
    ["']    # the characters I am looking for
    \b      # word boundary anchor
  )         # end capture group 1
)           # end lookbehind
(?:         # non-capturing group
  (?=       # lookahead
    (\\?)   # capture group 2: I still have no idea what this means exactly
  )         # end lookahead
  \2        # matching the contents of the 2nd capture group
  .         # any single character
)           # end non-capturing group
*?          # lazy
(?=\1)      # look ahead for the contents of capture group 1
I will now confirm it does not work, haha.
This, however, works (sort of; I removed ' from [\"] because of my French keyboard: it would take too long to separate commas from French quotation marks, and it's not that big a deal in this case):
([\"])((?:(?=(\\?))\3.)*?)\1
with input:
"Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.”
"He who thinks great thoughts, often makes great errors” – Martin Heidegger
it gives:
Two things are infinite: the universe and human stupidity; and I'm not sure about the universe.
He who thinks great thoughts, often makes great errors
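For the `(?=(\\?))\3.` part that I marked as "no idea" above, here is a minimal sketch of what it appears to do, not a definitive explanation. As far as I can tell, the lookahead captures an optional backslash into group 3, `\3` then consumes that backslash if it was there, and `.` consumes the character after it, so an escaped quote is swallowed as a pair instead of ending the match. The class name `EscapedQuoteDemo`, the helper `firstQuoted` and the sample strings are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedQuoteDemo {
    // The working pattern from this post, written as a Java string literal
    // (every regex backslash is doubled).
    static final Pattern QUOTED =
            Pattern.compile("([\"])((?:(?=(\\\\?))\\3.)*?)\\1");

    // Hypothetical helper: returns the text between the first pair of
    // double quotes, or null when the input contains no such pair.
    static String firstQuoted(String input) {
        Matcher m = QUOTED.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Plain quoted text: group 3 stays empty on every step.
        System.out.println(firstQuoted("\"He who thinks great thoughts\" - Heidegger"));
        // Escaped quotes inside: the backslash and the quote after it are
        // consumed together, so the match runs to the last unescaped quote.
        System.out.println(firstQuoted("she said \"a \\\"nested\\\" word\" today"));
    }
}
```

Note that the pattern only knows about the straight `"` character, which is why a file with curly closing quotes gives no match.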
For anyone confused about why their regex isn't working on a .txt file: try using Notepad++ or a similar editor to replace all the various possible quote characters (make sure to check both the opening and the closing characters!) with one kind of quote.
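The same clean-up could probably be done in code instead of in an editor. This is only a sketch of that alternative, not what I actually did: the class `QuoteNormalizer`, the method `normalizeQuotes` and the list of characters it replaces (curly double quotes and French guillemets) are assumptions, so adjust them to whatever your file really contains.

```java
class QuoteNormalizer {
    // Replaces a few typographic double-quote characters with the plain
    // ASCII double quote so that a pattern like ([\"])...\1 can match.
    static String normalizeQuotes(String text) {
        return text
                .replace('\u201C', '"')  // left curly double quote
                .replace('\u201D', '"')  // right curly double quote
                .replace('\u00AB', '"')  // left guillemet
                .replace('\u00BB', '"'); // right guillemet
    }

    public static void main(String[] args) {
        String raw = "\"He who thinks great thoughts, often makes great errors\u201D";
        System.out.println(normalizeQuotes(raw));
    }
}
```

Calling something like this on each line before handing it to the matcher would make the opening and closing characters identical, which is what the backreference `\1` needs.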
Here is the method (which works wonderfully now):
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WitticismFileParser {
    ArrayList<String> witticisms;
    Scanner scan;
    // Group 1: opening quote; group 2: quoted text; group 3: optional escaping backslash.
    // Earlier attempt: "(?s)([\"])((?<quotedText>(?=(\\\\?))\\3.)*?)(?<[\"])"
    String regex = "([\"])((?:(?=(\\\\?))\\3.)*?)\\1";

    public ArrayList<String> parse(String filePath) {
        witticisms = new ArrayList<>();
        Pattern pattern = Pattern.compile(regex);
        try {
            File txt = new File(filePath);
            scan = new Scanner(txt);
            String line = "";
            // Create the matcher once and reset it for every line.
            Matcher matcher = pattern.matcher(line);
            while (scan.hasNextLine()) {
                line = scan.nextLine();
                matcher = matcher.reset(line);
                // Only the first quoted string on each line is kept.
                if (matcher.find()) {
                    line = matcher.group(2);
                    witticisms.add(line);
                    System.out.println(line);
                }
            }
        } catch (IOException e) {
            System.err.println("IO Exception- " + e.getMessage());
            e.printStackTrace();
        } catch (Exception e) {
            System.err.println("Exception- " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (scan != null) {
                scan.close();
            }
        }
        return witticisms;
    }
}
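The parser above uses `if (matcher.find())`, so it keeps only the first quote on each line. If a line could hold several quotes, looping on `find()` should pick them all up. This is a rough sketch under that assumption; `QuoteExtractor` and `extractAll` are hypothetical names, and it works on a list of strings rather than a file to keep it short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteExtractor {
    // Same pattern as in the parser: group 2 is the text between the quotes.
    private static final Pattern QUOTED =
            Pattern.compile("([\"])((?:(?=(\\\\?))\\3.)*?)\\1");

    // Collects every double-quoted string from every line, in order.
    static List<String> extractAll(List<String> lines) {
        List<String> found = new ArrayList<>();
        for (String line : lines) {
            Matcher m = QUOTED.matcher(line);
            // find() resumes after the previous match, so this loop
            // visits each quoted string on the line once.
            while (m.find()) {
                found.add(m.group(2));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "\"first\" and \"second\" on one line",
                "no quotes on this one",
                "\"third\"");
        System.out.println(extractAll(lines));
    }
}
```

Whether you want this depends on the file: with one quote per line, the `if` version in the parser is enough.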
Leaving my troubleshooting notes here:
When I make it print each line directly as the Scanner reads it, the input text is as expected. I also made sure to reformat the .txt so that all the quotation marks were the same.
Anyway, thank you for any help with this; I am getting a horrible headache from reading regex documentation.
Thanks to everyone who answered!