I always start writing an awk script thinking it will be simple, then find myself baffled by a strange result. This time is no exception.

I have a file list.txt with the lines below (reduced for testing). The line following 'Preview Abstract' is the abstract, and the line following 'View details' is the paper title.

Google Scholar
Copy BibTex
Preview Abstract
This is the first paper's abstract.
View details
From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following
Justin Fu, Anoop Korattikara, Sergey Levine, Sergio Guadarrama  International Conference on Learning Representations (ICLR) (2019)

Google Scholar
Copy BibTex
Preview Abstract
This is the second paper's abstract.
View details
A Study on Overfitting in Deep Reinforcement Learning
Chiyuan Zhang, Oriol Vinyals, Remi Munos, Samy Bengio  arXiv (2018)

Google Scholar
Copy BibTex
Preview Abstract
This is the third papers's abstract.
View details
Ask the Right Questions: Active Question Reformulation with Reinforcement Learning
Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Wojciech Paweł Gajewski, Andrea Gesmundo, Neil Houlsby, Wei Wang  Sixth International Conference on Learning Representations (2018)

and I want to get a file like this (in "title ## abstract" format).

From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following ## This is the first paper's abstract.
A Study on Overfitting in Deep Reinforcement Learning ## This is the second paper's abstract.
Ask the Right Questions: Active Question Reformulation with Reinforcement Learning ## This is the third papers's abstract.

So I wrote this awk script (paper.awk). It uses the keyword lines to capture the abstract and title lines that follow them, then prints both in the wanted format once the paper title has been read.

print_res == 1 {print title " ## " abs; print_res = 0}
next_abs == 1 {abs = $0; next_abs = 0}
next_title == 1 {title = $0; next_title = 0; print_res = 1}
/Preview Abstract/{next_abs = 1}
/View details/{next_title = 1}

When I run awk -f paper.awk list.txt, I get

 ## This is the first paper's abstract.cement Learning for Vision-Based Instruction Following
 ## This is the second paper's abstract.ment Learning
 ## This is the third papers's abstract. Reformulation with Reinforcement Learning

Somehow the paper titles are overwritten by the " ## abstract" string. What's wrong?

Chan Kim
  • I suspect your list.txt file has DOS/Windows line endings, and the carriage returns from that are messing things up. See [here](https://stackoverflow.com/questions/31885409/why-would-a-correct-shell-script-give-a-wrapped-truncated-corrupted-error-messag) for more info. – Gordon Davisson Apr 17 '19 at 03:12
  • I was able to clearly reproduce the problem if the input you ran had DOS line endings. Run `dos2unix` on your input file before running `awk` and it should work fine – Inian Apr 17 '19 at 03:22
  • Oh, thank you very much! I will check when I'm at work. Yes, the input file came from Windows. – Chan Kim Apr 17 '19 at 04:40
  • @GordonDavisson That was correct. After dos2unix it works OK. (If you can write it up as an answer, I will accept it.) – Chan Kim Apr 17 '19 at 05:13
  • @ChanKim The question is pretty much a duplicate, so I can't answer it, but I will make another suggestion: add `/\r$/ {sub("\r$", "")}` at the beginning of your script, and it'll correctly handle DOS/Windows files without having to convert them first. – Gordon Davisson Apr 17 '19 at 05:53
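
To illustrate the comments above, here is a minimal sketch that reproduces the overwritten-title symptom and then applies the suggested carriage-return fix. The five-line sample input, the `sample_input` helper, and the `broken`/`fixed` variable names are made up for this illustration; the awk rules are the ones from the question, with one stripping rule added at the top. It assumes a POSIX shell and an awk that understands `\r` in a regular expression (gawk, mawk, and BWK awk do).

```shell
#!/bin/sh
# Hypothetical sample: one record from list.txt, written with DOS (CRLF) line endings.
sample_input() {
    printf 'Preview Abstract\r\nFirst abstract.\r\nView details\r\nFirst title\r\nSome authors (2019)\r\n'
}

# The rules from the question, unchanged. Each saved line still ends in \r, so the
# terminal returns to column 0 after the title and " ## abstract" is drawn over it.
broken=$(sample_input | awk '
print_res == 1 {print title " ## " abs; print_res = 0}
next_abs == 1 {abs = $0; next_abs = 0}
next_title == 1 {title = $0; next_title = 0; print_res = 1}
/Preview Abstract/{next_abs = 1}
/View details/{next_title = 1}
')

# Same rules, plus a first rule that strips a trailing carriage return from every line.
fixed=$(sample_input | awk '
{ sub(/\r$/, "") }
print_res == 1 {print title " ## " abs; print_res = 0}
next_abs == 1 {abs = $0; next_abs = 0}
next_title == 1 {title = $0; next_title = 0; print_res = 1}
/Preview Abstract/{next_abs = 1}
/View details/{next_title = 1}
')

# od -c makes the stray \r characters visible instead of letting the terminal act on them.
printf '%s\n' "$broken" | od -c

printf '%s\n' "$fixed"    # prints: First title ## First abstract.
```

Running `dos2unix list.txt` once, as suggested above, should give the same result by fixing the file rather than the script.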
