I am trying to extract the names of all my repositories on GitHub and build a script file to clone all of them, using this bash script:

for i in {1..10}
do
curl -u USERNAME:PASS -s https://api.github.com/user/repos?page=$i | grep -oP '"clone_url": "\K(.*)"' > output$i.txt
done

This script outputs each repository URL on its own line, but I need to insert git clone at the start of each line, so I wrote this (added | xargs -L1 git clone), which doesn't work:

for i in {1..10}
do
curl -u USERNAME:PASS -s https://api.github.com/user/repos?page=$i | grep -oP '"clone_url": "\K(.*)"' | xargs -L1 git clone > output$i.txt
done
AVEbrahimi

4 Answers

Using jq is always the best option to parse JSON data:

#!/usr/bin/env bash

for i in {1..10}
do
  curl \
    --user USERNAME:PASS \
    --silent \
    "https://api.github.com/user/repos?page=${i}" \
  | jq \
    --raw-output '.[] | "git clone \(.clone_url)"' \
    > "output${i}.txt"
done
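
If you want to actually run the generated commands afterwards, a minimal follow-up sketch (clone-all.sh is just an example name) is to concatenate the per-page files and execute them:

cat output{1..10}.txt > clone-all.sh   # merge the per-page lists into one script
bash clone-all.sh                      # run the generated git clone commands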

Or, to handle an arbitrary number of pages, you can tell jq to return a non-zero exit status in $? by providing it with the --exit-status option.

Then, if the JSON selector returns no result (which happens when the GitHub API returns an empty result page), the jq exit status can be tested to continue or terminate the while loop:

#!/usr/bin/env bash

typeset -i page=1 # GitHub API paging starts at page 1

while clone_cmds="$(
  curl \
    --user USERNAME:PASS \
    --silent \
    "https://api.github.com/user/repos?page=${page}" \
    | jq \
      --exit-status \
      --raw-output \
      '.[] | "git clone \(.clone_url)"'
)"; do
  # The queried page result length is > 0
  # Output to the paged file
  # and increase page number
  echo >"output$((page++)).txt" "${clone_cmds}"
done
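
The --exit-status behaviour can be checked in isolation; a quick sketch (the exact non-zero code may vary between jq versions):

echo '[]' | jq --exit-status --raw-output '.[]'; echo "exit: $?"
# no output for an empty page, non-zero exit status
echo '[{"clone_url":"x"}]' | jq --exit-status --raw-output '.[]'; echo "exit: $?"
# prints the element and exits 0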

If you want the same as above, but with all repositories in a single file:

The following example uses the GitHub API's own page handling, rather than relying on an extra empty request to mark the end of the pages.

It also handles pages of up to 100 entries, and negotiates a compressed transport stream if supported.

Here is the full-featured version of your repository-cloning list:

#!/usr/bin/env bash

# Set either one to authenticate with the GitHub API.
# GitHub 'Oauth2 token':
OAUTH_TOKEN=''
# GitHub 'username:password':
USER_PASS=''

# The GitHub API Base URL:
typeset -r GITHUB_API='https://api.github.com'

# The array of Curl options to authenticate with GitHub:
typeset -a curl_auth

# Populates the authentication options from what is available.
if [[ -n ${OAUTH_TOKEN} ]]; then
  curl_auth=(--header "Authorization: token ${OAUTH_TOKEN}")
elif [[ -n ${USER_PASS} ]]; then
  curl_auth=(--user "${USER_PASS}")
else
  # These $"string" are bash --dump-po-strings ready.
  printf >&2 $"GitHub API need an authentication with either set variable:"$'\n'
  printf >&2 "OAUTH_TOKEN='%s'\\n" $"GitHub API's Oauth2 token"
  printf >&2 $"or"" USER_PASS='%s:%s'.\\n" $"username" $"password"
  printf >&2 $"See: %s"$'\n' 'https://developer.github.com/v3/#authentication'
  exit 1
fi

# Query the GitHub API for user repositories.
# The default results count per page is 30.
# It can be raised up to 100, to limit the number
# of requests needed to retrieve all the results.
# Response headers contain a Link: <url>; rel="next" as
# long as there is a next page.
# See: https://developer.github.com/v3/#pagination
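# For reference, the Link header parsed further below looks like this
# (illustrative value, not taken from a real response):
#   Link: <https://api.github.com/user/repos?per_page=100&page=2>; rel="next",
#     <https://api.github.com/user/repos?per_page=100&page=5>; rel="last"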

# Compose the API URL for the first page.
next_page_url="${GITHUB_API}/user/repos?per_page=100&page=1"

# While there is a next page URL to query...
while [[ -n ${next_page_url} ]]; do

  # Send the API request with curl, and get back a complete
  # http_response which --include response headers, and
  # if supported, handle a --compressed data stream,
  # keeping stderr &2 --silent.
  http_response="$(
    curl \
      --silent \
      --include \
      --compressed \
      "${curl_auth[@]}" \
      "${next_page_url}"
  )"

  # Get the next page URL from the Link: header.
  # Reaching the last page causes the next_page_url
  # variable to be empty.
  next_page_url="$(
    sed \
      --silent \
      '/^[[:space:]]*$/,$d;s/Link:.*<\(.*\)>;[[:space:]]*rel="next".*$/\1/p' \
      <<<"${http_response}"
  )"

  # Get the http_body part from the http_response.
  http_body="$(sed '1,/^[[:space:]]*$/d' <<<"${http_response}")"

  # Query the http_body JSON content with jq.
  jq --raw-output '.[] | "git clone \(.clone_url)"' <<<"${http_body}"

done >"output.txt" # Redirect the whole while loop output to the file.
Léa Gris

grep can't substitute strings, but sed can easily replace grep and also perform substitutions:

for i in {1..10}
do
    curl -u USERNAME:PASS -s "https://api.github.com/user/repos?page=$i" |
    sed -n 's/.*"clone_url": "\([^"]*\)".*/git clone "\1"/p' > "output$i.txt"
done

Notice also the question When to wrap quotes around a shell variable? and the use of [^"] in the regex to say specifically that the extracted text must not contain a double quote.
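
To see why [^"]* matters, here is a quick comparison on an illustrative JSON fragment; a greedy .* runs on to the last double quote of the line:

echo '"clone_url": "https://github.com/u/r.git", "x": "y"' |
sed -n 's/.*"clone_url": "\([^"]*\)".*/git clone "\1"/p'
# git clone "https://github.com/u/r.git"

echo '"clone_url": "https://github.com/u/r.git", "x": "y"' |
sed -n 's/.*"clone_url": "\(.*\)".*/git clone "\1"/p'
# git clone "https://github.com/u/r.git", "x": "y"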

As such, I agree with, and have upvoted, the answer which suggests using jq instead whenever your input is JSON.

tripleee
  • Just for my learning, can you please explain what is wrong in my answer? Thanks – Adiii Jul 21 '19 at 16:18
  • 1
    If you are asking for my opinion, the main objection I would raise is that you pile on processes organically at the end instead of thinking it through. Perl can do everything `grep` and `tr` can do, and then some. The `tr -d '"'` looks like a weird compromise; removing random characters in the middle of an extracted URL is just going to break it, and if you extracted more than just the URL, merely deleting a double quote is absolutely not going to fix it. – tripleee Jul 21 '19 at 17:23
  • Many thanks, I got your point and will try to improve my bash skills. It means the output alone is not the important thing; one should keep the above in mind. Thanks a lot :) – Adiii Jul 21 '19 at 17:40

You can prepend a string with xargs using echo:

for i in {1..10}
do
curl -u use_name:pass -s https://api.github.com/user/repos?page=$i | grep -oP '"clone_url": "\K(.*)"' |  tr -d '"' | xargs -n 1 echo 'git clone'
done
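
The xargs part can be checked on its own with plain input (the URLs here are placeholders):

printf '%s\n' https://github.com/u/a.git https://github.com/u/b.git | xargs -n 1 echo 'git clone'
# git clone https://github.com/u/a.git
# git clone https://github.com/u/b.git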

Also, you can do this using Perl.

for i in {1..10}
do
curl -u user_name:pass -s https://api.github.com/user/repos?page=$i | grep -oP '"clone_url": "\K(.*)"' |  tr -d '"' | perl -ne 'print "git clone $_"' > output$i.txt
done
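
Likewise, the Perl part on its own (placeholder URLs again):

printf '%s\n' https://github.com/u/a.git https://github.com/u/b.git | perl -ne 'print "git clone $_"'
# git clone https://github.com/u/a.git
# git clone https://github.com/u/b.git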


Adiii

Your second script works; you just need to clean up the grep search pattern so it does not include an unmatched trailing quote:

 grep  -oP '"clone_url": \K(.*)\"' | xargs -L1 echo git clone
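
To illustrate on a single sample line (illustrative URL): with the original pattern the match keeps an unmatched trailing quote, which makes xargs fail, while the cleaned-up pattern produces matched quotes that xargs then strips:

line='  "clone_url": "https://github.com/u/r.git",'
echo "$line" | grep -oP '"clone_url": "\K(.*)"'
# https://github.com/u/r.git"   (trailing quote; xargs rejects it as an unmatched double quote)
echo "$line" | grep -oP '"clone_url": \K(.*)\"' | xargs -L1 echo git clone
# git clone https://github.com/u/r.git
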
gregory