
I have a list that looks like this:

>aaa(+)
AAAAAAAAAA
>bbb(+)
BBBBBBBBBBBBBBBB
>ccc(-)
CCCCCCC

And I want to use awk to join each line ending in '(+)' or '(-)' with the line that follows it, using a comma as the delimiter, so that it looks like this:

>aaa(+),AAAAAAAAAA
>bbb(+),BBBBBBBBBBBBBBBB
>ccc(-),CCCCCCC

I have already tried the following (in bash):

cat $file | awk '/(-)/||/(+)/{if (x)print x;x"";}{x=(!x)?$0:x","$0;}END{print x;}' > $new_file

but this appears to give a result like this:

>aaa(+),AAAAAAAAAA
>aaa(+),AAAAAAAAAA,>bbb(+),BBBBBBBBBBBBBBBB
>aaa(+),AAAAAAAAAA,>bbb(+),BBBBBBBBBBBBBBBB,>ccc(-),CCCCCCC

which is obviously not what I am trying to do.

Any help would be much appreciated!

Thanks

arielle
  • http://stackoverflow.com/questions/15857088/remove-line-breaks-in-a-fasta-file is very close. You might want to include "fasta" in the question if that's actually what you are operating on. – tripleee Aug 16 '16 at 08:22
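The attempt in the question fails mainly because `x"";` only concatenates `x` with an empty string and discards the result. It never resets `x`, so every header prints everything accumulated so far. A minimal corrected sketch of the same approach, assuming the input is in a file named `file` (the heredoc just recreates the sample), resets with `x = ""` and uses `/\([+-]\)$/` so the parentheses are matched literally (in the original, `(-)` and `(+)` are regex groups, not literal text). The `cat $file |` is also unnecessary, since awk can read the file directly:

```shell
# Recreate the sample input from the question (file name is hypothetical)
cat > file <<'EOF'
>aaa(+)
AAAAAAAAAA
>bbb(+)
BBBBBBBBBBBBBBBB
>ccc(-)
CCCCCCC
EOF

# Same structure as the attempt, but x is reset by assignment (x = "")
# instead of the no-op concatenation x"";
out=$(awk '
  /\([+-]\)$/ { if (x != "") print x; x = "" }   # header: flush previous record
  { x = (x == "") ? $0 : x "," $0 }              # append line, comma-separated
  END { print x }                                # flush the last record
' file)
printf '%s\n' "$out"
```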

5 Answers


This awk one-liner should work for your example:

awk '/^>/{printf "%s,",$0;next}7' file

It joins each line that begins with > to the line below it. If the (+)/(-) suffix is what identifies a header, change the pattern to match that instead.
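For example, a sketch that keys on the (+)/(-) suffix rather than the leading > might look like this. The pattern `/\([+-]\)$/` is an illustrative choice that assumes the marker is always the last thing on a header line, and `file` is a hypothetical sample file:

```shell
# Recreate the sample input from the question (file name is hypothetical)
cat > file <<'EOF'
>aaa(+)
AAAAAAAAAA
>bbb(+)
BBBBBBBBBBBBBBBB
>ccc(-)
CCCCCCC
EOF

# Header lines end in (+) or (-): print them followed by a comma and no
# newline, then skip to the next line. The trailing 1 is an always-true
# pattern that prints every other line unchanged (the answer uses 7 the same way).
out=$(awk '/\([+-]\)$/ { printf "%s,", $0; next } 1' file)
printf '%s\n' "$out"
```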

Kent
  • awk '/^>/{printf "%s,",$0;next}7' file (comma missing after %s), to get exactly the requested result. I can't edit the answer myself, as it's a matter of just 1 character. – Alberto Aug 15 '16 at 15:18
  • It appears to be working, thanks! Can you explain why you add '^' before the '>'? (I am a beginner in awk. I have used commands like this before: awk '/pattern/{ print $0 }' file, but I never had to add '^' before the pattern.) – arielle Aug 15 '16 at 15:22
  • `^>` is regex, it matches a line starting with `>` @arielle – Kent Aug 15 '16 at 15:25
  • I am also curious about the '7' at the end of the command; I tried looking that up, but I am not sure what it is for. – arielle Aug 15 '16 at 15:25
  • @arielle a non-zero number used as a pattern is always true, so awk performs the default action: `print` – Kent Aug 15 '16 at 15:26
  • @Alberto thanks for pointing it out, I didn't notice there was a comma... added it to the answer. – Kent Aug 15 '16 at 15:26
  • Alright that answers my questions, thanks to all of you for your help! – arielle Aug 15 '16 at 15:27
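A quick way to see both points from these comments: `/^>/` only matches a > at the start of a line, while `/>/` matches one anywhere, and a bare non-zero number such as `7` is an always-true pattern whose default action is `print` (a `0` pattern never matches). The two input lines below are made up for illustration:

```shell
# Two hypothetical lines: only the first one starts with ">"
input='>header
AAA>BBB'

anchored=$(printf '%s\n' "$input" | awk '/^>/')  # only lines starting with >
anywhere=$(printf '%s\n' "$input" | awk '/>/')   # any line containing >
seven=$(printf '%s\n' "$input" | awk '7')        # non-zero pattern: prints every line
zero=$(printf '%s\n' "$input" | awk '0')         # zero pattern: prints nothing

printf 'anchored:\n%s\n' "$anchored"
printf 'anywhere:\n%s\n' "$anywhere"
```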

Another minimalist awk:

$ awk 'ORS=/^>/?",":RS' file 
karakfa
  • I wonder how you come up with such ideas. ++ – Мона_Сах Aug 15 '16 at 21:03
  • I have thought about your answer and I understand how it works except for the last part (:RS). I know RS is the record separator, but I have only seen it in the form RS= something. What is the purpose of using it at the end of the code and why is there a ':' before it? Thank you! – arielle Aug 17 '16 at 17:33
  • The ternary operator `?:` chooses either a comma or the default value (which is RS), and that choice is assigned to the output record separator, ORS. You could also set it to a newline, `"\n"`, but RS is shorter. – karakfa Aug 17 '16 at 17:36
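Spelled out, the one-liner relies on the assignment itself being the pattern: its value is either "," or a newline (the default RS), both non-empty strings and therefore true, so every line is printed, followed by whichever separator was just chosen. A longer sketch of the same logic, written only for readability and assuming the default RS (`file` is a hypothetical sample file):

```shell
# Recreate the sample input from the question (file name is hypothetical)
cat > file <<'EOF'
>aaa(+)
AAAAAAAAAA
>bbb(+)
BBBBBBBBBBBBBBBB
>ccc(-)
CCCCCCC
EOF

out=$(awk '{
  # Header lines get a comma as their output record separator; every other
  # line gets a newline (the default value of RS)
  ORS = /^>/ ? "," : "\n"
  print
}' file)
printf '%s\n' "$out"
```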

With GNU awk you can also do it like this:

$ awk -v RS=">"  '$0 != ""{ printf ">%s",gensub(/\)\n/,"),","g")}' file
>aaa(+),AAAAAAAAAA
>bbb(+),BBBBBBBBBBBBBBBB
>ccc(-),CCCCCCC
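`gensub()` is a GNU awk extension. Below is a sketch of the same record-splitting idea for other awks, swapping in POSIX `gsub()` (which edits `$0` in place) and keeping the `$0 != ""` test to skip the empty record before the first >. `file` is a hypothetical sample file:

```shell
# Recreate the sample input from the question (file name is hypothetical)
cat > file <<'EOF'
>aaa(+)
AAAAAAAAAA
>bbb(+)
BBBBBBBBBBBBBBBB
>ccc(-)
CCCCCCC
EOF

# Records are split on ">", so each record still contains its own newlines.
# Replace the ")" + newline after the header with ")," and print the record
# with its ">" restored (the record's trailing newline is kept as-is).
out=$(awk -v RS='>' '$0 != "" { gsub(/\)\n/, "),"); printf ">%s", $0 }' file)
printf '%s\n' "$out"
```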
sjsam
awk '{printf "%s%s", $0, (NR%2 ? "," : ORS)}' file
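A commented, multi-line equivalent of this one-liner, assuming (as in the question's sample) that the file strictly alternates header and sequence lines; `file` is a hypothetical sample file:

```shell
# Recreate the sample input from the question (file name is hypothetical)
cat > file <<'EOF'
>aaa(+)
AAAAAAAAAA
>bbb(+)
BBBBBBBBBBBBBBBB
>ccc(-)
CCCCCCC
EOF

out=$(awk '{
  # NR % 2 is 1 (true) on odd lines (headers) and 0 (false) on even lines
  # (sequences): follow a header with a comma, a sequence with ORS (newline)
  sep = (NR % 2) ? "," : ORS
  printf "%s%s", $0, sep
}' file)
printf '%s\n' "$out"
```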
Ed Morton
  • This is a low-quality answer. You can do better! – jpaugh Aug 15 '16 at 23:04
  • @jpaugh in what way is it low quality? It's a trivial solution to a trivial problem using basic awk constructs and no redundancy of operators. It's exactly what I'd use in my own code if faced with this problem. What more do you want? – Ed Morton Aug 15 '16 at 23:28
  • Well, it doesn't explain what it does, which is always important. FYI, I found this in the low-quality review queue, so I wasn't the only one to think so. – jpaugh Aug 16 '16 at 12:00
  • No, an explanation is NOT always important. It's a tool that automatically adds brief answers to the low quality review queue for a person who knows the domain to decide if that answer is low quality or not. If you don't know the domain, leave it for someone else who does to make the determination. Look at all the other answers on this page that also have no explanations because the solutions are just THAT trivial. Adding an explanation for this would be like adding an `incrementing i` comment to the code `i++`. – Ed Morton Aug 16 '16 at 13:31
  • Think of your audience. Someone who knows `awk` well enough to understand such a 'trivial' answer would have little need to ask. I realize you're probably busy, and answering trivial questions is not the highlight of your day. Nevertheless, every small effort put into an expert answer can reduce ten-fold the effort of a novice trying to understand. – jpaugh Aug 16 '16 at 19:28
  • You don't need to know awk at all to understand what `printf` does, and if you don't understand this answer then there's a hell of a lot more you also don't understand, so you need to pick up a book, and getting an explanation of this answer won't help with your much bigger problem. This discussion is an absurd waste of our time - if you want to review answers, by all means go ahead but PLEASE stick to reviewing answers you have a clue about and so can judge if they're low priority or not - if every brief answer was low quality then the tool would delete them right away instead of flagging them. – Ed Morton Aug 16 '16 at 22:36
  • Knows code-only answers are generally considered low-quality. Posts one anyway and posts a Meta rant when the expected happens. – Alexander O'Mara Aug 16 '16 at 23:12
  • @AlexanderO'Mara Read your own comment - the important word is **generally**, which is not the same as **always**. Read every other answer on this page - they're all perfectly fine and none of them have an explanation because **in this case** no explanation is needed/useful. The question I posted on meta was far from a rant, I was hoping to get some understanding of why people who have no idea about a given area are allowed to review answers in that area. It's every bit as useful as food critics reviewing movies. – Ed Morton Aug 16 '16 at 23:19
  • Generally here means that people will generally consider it low-quality, a sentiment I would tend to agree with. Also, your argument about the other answers equates to "everyone else is doing it, so it must be ok". – Alexander O'Mara Aug 16 '16 at 23:23
  • If that was the purpose of your Meta post, it wasn't very clear. Perhaps you would have more luck focusing on that? – Alexander O'Mara Aug 16 '16 at 23:24
  • Again, generally != always. I agree with it too - **generally** that is the case and it makes sense for a tool to flag such for review **by a _person_ who understands the domain** and can make an educated judgement on the individual case. No, that is not what my statement about the other answers equates to. If no-one comments `increment i` next to `i++` and neither do I, that's because it's **the right thing to do**. wrt my meta post, no I just deleted it because it was clear very quickly that it wasn't going to get me anywhere and I've wasted enough time on this topic. – Ed Morton Aug 16 '16 at 23:25
  • I'm afraid you've thus far shown no evidence that people with no knowledge of the domain are the ones flagging as such. And you're using the other answers as evidence that code-only answers are fine, which is poor evidence. – Alexander O'Mara Aug 16 '16 at 23:33
  • AFAIK I'm not on trial, so I don't need to produce evidence. Having said that, go click on the name of the person who made the original comment if you like and I guarantee you won't find an `awk` answer posted by them. Again, I am not using the other answers as evidence that code-only answers are fine. What Comp Sci professor ever said that you should comment trivial code to explain what it does? This conversation has wasted even more of my time. – Ed Morton Aug 16 '16 at 23:46
  • No you don't have to produce evidence, but Meta also doesn't have to take you seriously, and without any evidence, that should be expected. But whatever, it seems you already know the solution to your problem. – Alexander O'Mara Aug 16 '16 at 23:50
  • We aren't on Meta. Absolutely I know the solution to my problem - ignore the few people posting comments about things they know nothing about. – Ed Morton Aug 17 '16 at 00:11
paste -d, - - < file

`paste` will do the job if your file consists entirely of pairs of lines, as in your example.
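Each `-` operand makes `paste` read one line from standard input in turn, so `- -` consumes two lines per output line and `-d,` joins them with a comma. A quick check against the question's sample, using a hypothetical file named `file`:

```shell
# Recreate the sample input from the question (file name is hypothetical)
cat > file <<'EOF'
>aaa(+)
AAAAAAAAAA
>bbb(+)
BBBBBBBBBBBBBBBB
>ccc(-)
CCCCCCC
EOF

# Two "-" operands: each output line is two consecutive input lines joined by ","
out=$(paste -d, - - < file)
printf '%s\n' "$out"
```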

user2138595