
I have an input file, ./requirements/local.txt:


Werkzeug==2.0.2 # https://github.com/pallets/werkzeug
ipdb==0.13.9  # https://github.com/gotcha/ipdb
psycopg2==2.9.1  # https://github.com/psycopg/psycopg2
watchgod==0.7  # https://github.com/samuelcolvin/watchgod

# Testing
# ------------------------------------------------------------------------------
mypy==0.910  # https://github.com/python/mypy
django-stubs==1.8.0  # https://github.com/typeddjango/django-stubs
pytest==6.2.5  # https://github.com/pytest-dev/pytest
pytest-sugar==0.9.4  # https://github.com/Frozenball/pytest-sugar
djangorestframework-stubs==1.4.0  # https://github.com/typeddjango/djangorestframework-stubs

# Documentation
# ------------------------------------------------------------------------------
sphinx==4.2.0  # https://github.com/sphinx-doc/sphinx
sphinx-autobuild==2021.3.14 # https://github.com/GaretJax/sphinx-autobuild

# Code quality
# ------------------------------------------------------------------------------
flake8==3.9.2  # https://github.com/PyCQA/flake8
flake8-isort==4.0.0  # https://github.com/gforcada/flake8-isort
coverage==6.0.2  # https://github.com/nedbat/coveragepy
black==21.9b0  # https://github.com/psf/black
pylint-django==2.4.4  # https://github.com/PyCQA/pylint-django
pylint-celery==0.3  # https://github.com/PyCQA/pylint-celery
pre-commit==2.15.0  # https://github.com/pre-commit/pre-commit

# Django
# ------------------------------------------------------------------------------
factory-boy==3.2.0  # https://github.com/FactoryBoy/factory_boy

django-debug-toolbar==3.2.2  # https://github.com/jazzband/django-debug-toolbar
django-extensions==3.1.3  # https://github.com/django-extensions/django-extensions
django-coverage-plugin==2.0.1  # https://github.com/nedbat/django_coverage_plugin
pytest-django==4.4.0  # https://github.com/pytest-dev/pytest-django

I am trying to extract the part before the # on every line that begins with pytest, using this command:

sed -nE "s/(^pytest.+)#/\1/p" ./requirements/local.txt

Expected output

pytest==6.2.5  
pytest-sugar==0.9.4  
pytest-django==4.4.0  

Actual output

pytest==6.2.5   https://github.com/pytest-dev/pytest
pytest-sugar==0.9.4   https://github.com/Frozenball/pytest-sugar
pytest-django==4.4.0   https://github.com/pytest-dev/pytest-django

How can I get the expected output?

These refs have not helped solve this particular problem

Kwesi Smart
  • You're only matching up to the #. Everything after it is not part of the matched text, so it is left unchanged and printed as-is. The easy fix is to include everything after the # in your RE too. – Shawn Jan 14 '22 at 10:17
  • Right! Changing to `sed -nE "s/(^pytest.+)#.*/\1/p" ./requirements/local.txt` solved the problem. Thanks – Kwesi Smart Jan 14 '22 at 10:22
  • [How to extract text from a string using sed?](https://stackoverflow.com/questions/11568859/) and [How to use sed to extract substring](https://stackoverflow.com/questions/16675179/) and probably more are quite helpful. – Wiktor Stribiżew Jan 14 '22 at 10:30
  • Changing to `sed -nE "s/(^pytest.+)#.*/\1/p"` may have solved your problem for this particular input file, but that `sed` command will still have issues when the line contains 1) no `#` character, or 2) more than one `#` character. – M. Nejat Aydin Jan 14 '22 at 16:12
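A quick way to check the points raised in these comments is to run both the original command and the `.*` fix on a small sample. This is a sketch: the sample lines are hypothetical, two are taken from the question, and pytest-nocomment and pytest-twohash are made-up names for the no-# and two-# cases.

```shell
#!/bin/sh
# Hypothetical sample: two lines from the question plus two made-up edge cases
sample=$(mktemp)
cat > "$sample" <<'EOF'
pytest==6.2.5  # https://github.com/pytest-dev/pytest
black==21.9b0  # https://github.com/psf/black
pytest-nocomment==1.0
pytest-twohash==1.0  # see issue #42
EOF

# Question's command: .+ is greedy, so the match ends at the last '#'
# and everything after that '#' survives the substitution
original=$(sed -nE "s/(^pytest.+)#/\1/p" "$sample")

# Fix from the comments: '.*' also consumes everything after the '#'
fixed=$(sed -nE "s/(^pytest.+)#.*/\1/p" "$sample")

printf '%s\n' "original:" "$original" "fixed:" "$fixed"
rm -f "$sample"
```

With the fix, the no-comment line is still silently dropped, the two-# line keeps part of its comment, and the remaining lines still carry the blanks that preceded the #.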

6 Answers


Using sed:

sed -nE 's/^(pytest[^=]*=[^[:blank:]]*).*/\1/p' file

pytest==6.2.5
pytest-sugar==0.9.4
pytest-django==4.4.0

However, a grep -o solution would be even simpler:

grep -o '^pytest[^=]*=[^[:blank:]]*' file

pytest==6.2.5
pytest-sugar==0.9.4
pytest-django==4.4.0

Explanation:

  • ^pytest: Match pytest at the start
  • [^=]*: Match 0 or more of any character except =
  • =: Match a =
  • [^[:blank:]]*: Match 0 or more characters that are not blanks (spaces or tabs)
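To see how this pattern behaves on lines the question's sample doesn't cover, the sketch below runs both commands on a temporary file. The sample lines are hypothetical; pytest-nocomment and pytest-twohash are made-up names for a line without a comment and a line with two # characters.

```shell
#!/bin/sh
# Hypothetical sample lines (the last two are made up to cover edge cases)
sample=$(mktemp)
cat > "$sample" <<'EOF'
pytest==6.2.5  # https://github.com/pytest-dev/pytest
black==21.9b0  # https://github.com/psf/black
pytest-nocomment==1.0
pytest-twohash==1.0  # see issue #42
EOF

# Keep "pytest", anything up to the first '=', then the following run of non-blanks
sed_out=$(sed -nE 's/^(pytest[^=]*=[^[:blank:]]*).*/\1/p' "$sample")
grep_out=$(grep -o '^pytest[^=]*=[^[:blank:]]*' "$sample")

printf '%s\n' "$sed_out"
[ "$sed_out" = "$grep_out" ] && echo "sed and grep agree"
rm -f "$sample"
```

Both commands keep the no-comment line and stop at the first blank, so a second # in the comment does not matter. They do rely on a blank separating the version from the #: a line such as pytest==6.2.5#comment would keep #comment.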
anubhava

You are missing a pattern for the text after the #. This should solve it:

$ sed -nE "s/(^pytest.+)#.*/\1/p" ./requirements/local.txt
Mihai
  • Even though the OP's example doesn't show it, a requirements file may or may not have a comment starting with `#` on each line. This command assumes `#` will always be there, so a line containing just `pytest==124` won't be matched. – anubhava Jan 14 '22 at 10:42
  • N.B. This will also capture the whitespace before the `#`; perhaps `sed -nE 's/(^pytest\S+)\s*#.*/\1/p' file`? – potong Jan 14 '22 at 13:50
  • In addition to @anubhava's comment, your regex will capture part of the comment if there is more than one `#` character in the line. – M. Nejat Aydin Jan 14 '22 at 15:44
  • For a more generic approach, please check the other answers. Mine simply pointed out what he missed in his very particular case. – Mihai Jan 16 '22 at 21:47
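The comments above raise three issues with (^pytest.+)#.*: lines without a # are skipped, a second # leaks part of the comment into the result, and the blanks before the # are kept. One way to handle all three in a single substitution is sketched below. It is an illustration, not part of this answer: it uses POSIX bracket classes instead of \S and \s (which are GNU extensions), and the sample lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample lines; pytest-nocomment and pytest-twohash are made up
sample=$(mktemp)
cat > "$sample" <<'EOF'
pytest==6.2.5  # https://github.com/pytest-dev/pytest
black==21.9b0  # https://github.com/psf/black
pytest-nocomment==1.0
pytest-twohash==1.0  # see issue #42
EOF

# \1 = "pytest" plus the run of characters that are neither blank nor '#';
# then optional blanks, an optional comment, and the end of the line
out=$(sed -nE 's/^(pytest[^[:blank:]#]*)[[:blank:]]*(#.*)?$/\1/p' "$sample")
printf '%s\n' "$out"
rm -f "$sample"
```

Because the captured run also excludes #, this variant additionally handles a comment with no blank before it, such as pytest==6.2.5#comment.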

A sed one-liner would be:

sed -e '/^pytest/!d' -e 's/[[:blank:]]*#.*//' file

The first expression deletes lines which don't begin with pytest. The second one deletes the comment portion (including blanks before the #), if any.
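The same two steps can also be written in the -n/p style used in the question, by grouping the substitution and the print under the /^pytest/ address. The sketch below (sample lines hypothetical, pytest-nocomment is a made-up name) checks that both forms give identical output, including for a line without a comment.

```shell
#!/bin/sh
# Hypothetical sample lines
sample=$(mktemp)
cat > "$sample" <<'EOF'
pytest==6.2.5  # https://github.com/pytest-dev/pytest
black==21.9b0  # https://github.com/psf/black
pytest-nocomment==1.0
EOF

# Answer's form: delete non-pytest lines, then strip the comment if there is one
delete_form=$(sed -e '/^pytest/!d' -e 's/[[:blank:]]*#.*//' "$sample")

# Equivalent -n form: on pytest lines only, strip the comment, then print
print_form=$(sed -n '/^pytest/{
s/[[:blank:]]*#.*//
p
}' "$sample")

printf '%s\n' "$print_form"
rm -f "$sample"
```

Unlike a substitution with the p flag, the explicit p command prints the line whether or not the substitution succeeded, which is why the no-comment line survives.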

M. Nejat Aydin

1st solution: With awk, you could try the following. It uses awk's match function; it was written and tested in GNU awk but should work in any awk. In short, match() looks for the regex ^pytest[^ ]*, i.e. pytest at the start of the line up to the first space, and substr() then prints the matched part.

awk 'match($0,/^pytest[^ ]*/){print substr($0,RSTART,RLENGTH)}' Input_file

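2nd solution: Using GNU awk, try the following, which makes use of its RS variable: RS is set to a regex, and the text each record separator matched (available in RT) is printed after stripping leading newlines.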

awk -v RS='(^|\n)pytest[^ ]*' 'RT{sub(/^\n*/,"",RT);print RT}' Input_file
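match() returns the position of the match (0 if there is none) and sets RSTART and RLENGTH, which is what the substr() call in the 1st solution relies on. The sketch below prints those values for one line from the question, and shows one caveat of [^ ]*: it stops only at a space, so a tab before the # is kept. The tab-separated line is hypothetical, and [^ \t]* is a suggested alternative, not part of the answer.

```shell
#!/bin/sh
line='pytest-sugar==0.9.4  # https://github.com/Frozenball/pytest-sugar'

# RSTART is the 1-based start of the match, RLENGTH its length
info=$(printf '%s\n' "$line" |
  awk 'match($0,/^pytest[^ ]*/){print RSTART, RLENGTH, substr($0,RSTART,RLENGTH)}')
echo "$info"

# Hypothetical tab-separated line: [^ ]* runs past the tab up to the next space
tabbed=$(printf 'pytest==6.2.5\t# comment\n' |
  awk 'match($0,/^pytest[^ ]*/){print substr($0,RSTART,RLENGTH)}')

# Excluding tabs as well stops at the first blank of either kind
fixed=$(printf 'pytest==6.2.5\t# comment\n' |
  awk 'match($0,/^pytest[^ \t]*/){print substr($0,RSTART,RLENGTH)}')
printf '[%s] [%s]\n' "$tabbed" "$fixed"
```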
RavinderSingh13

As an alternative using awk, you could set the field separator to # preceded by optional blanks, and print the first field of each line that starts with pytest:

awk -F"[[:blank:]]*#" '/^pytest/ {print $1}' ./requirements/local.txt

Output

pytest==6.2.5
pytest-sugar==0.9.4
pytest-django==4.4.0

If the # is not always present, you could also make the pattern more specific by matching the version number, and then print the first field:

awk '/^pytest[^[:blank:]]*==[0-9]+(\.[0-9]+)*/ {print $1}' file
The fourth bird

Using sed:

$ sed -n '/^pytest/s/#.*//p' input_file
pytest==6.2.5
pytest-sugar==0.9.4
pytest-django==4.4.0
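Because this substitution removes only the # and what follows, the blanks that preceded the # stay on each line; they are invisible in the output above. One way to make them visible is to mark the end of each line, as in this sketch (the sample line is taken from the question). Note also that with -n and the p flag, a pytest line that has no # is not printed at all.

```shell
#!/bin/sh
sample=$(mktemp)
printf '%s\n' 'pytest==6.2.5  # https://github.com/pytest-dev/pytest' > "$sample"

# Mark each line end with '|' so trailing blanks become visible
marked=$(sed -n '/^pytest/s/#.*//p' "$sample" | sed 's/$/|/')
echo "$marked"

# Also removing the blanks before the '#' leaves no trailing blanks
trimmed=$(sed -n '/^pytest/s/[[:blank:]]*#.*//p' "$sample" | sed 's/$/|/')
echo "$trimmed"
rm -f "$sample"
```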
HatLess