16

I need to pick some numbers out of some text files. I can pick out the lines I need with grep, but I didn't know how to extract the numbers from those lines. A colleague showed me how to do this from bash with perl:

cat results.txt | perl -pe 's/.+(\d\.\d+)\.\n/\1 /'

However, I usually code in Python, not Perl. So my question is, could I have used Python in the same way? I.e., could I have piped something from bash to Python and then gotten the result straight to stdout? ... if that makes sense. Or is Perl just more convenient in this case?

Nagel
  • 1
    You might reconsider and just do all of the parsing in python. It would be insanely easy to do the greping from python. If you have trouble, just post another question saying "how do I parse out these lines in python", and 5 minutes later you'll have the code – TJD Oct 20 '11 at 22:22
  • @TJD: True. I'll consider that. – Nagel Oct 20 '11 at 22:34
  • After considering some answers and methods presented, I ask: Would it work if you just used `grep -o`, which prints only the matched part of the line? – heltonbiker Oct 20 '11 at 23:03
  • @heltonbiker: I'm not sure that would work in my case, for various reasons, but it's certainly worth considering for another time. Thanks :) – Nagel Oct 22 '11 at 20:11
  • 2
    As always, that's a Useless Use of Cat. See http://partmaps.org/era/unix/award.html and/or just rewrite it as `perl -pe 's/.+(\d\.\d+)\.\n/\1 /' results.txt` – tripleee Sep 04 '12 at 08:44

7 Answers

12

Yes, you can use Python from the command line. python -c <stuff> will run <stuff> as Python code. Example:

python -c "import sys; print sys.path"

There isn't a direct equivalent to the -p option for Perl (the automatic input/output line-by-line processing), but that's mostly because Python doesn't use the same concept of $_ and whatnot that Perl does - in Python, all input and output is done manually (via raw_input()/input(), and print/print()).
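
As a rough illustration of the loop that perl -p supplies implicitly, here is a minimal sketch. It assumes Python 3 (the examples in this answer use Python 2 syntax), and the helper name process_lines and the sample input are made up for the demonstration. In a real script you would pass sys.stdin and sys.stdout instead of the in-memory streams.

```python
import io
import re

# Same pattern as the Perl one-liner: capture a "d.ddd" number that is
# followed by a full stop and the newline, and replace the whole line with it.
PATTERN = re.compile(r'.+(\d\.\d+)\.\n')

def process_lines(instream, outstream):
    """Read every line, apply the substitution, and write the result.

    This is the read/modify/print loop that perl -p wraps around the
    expression for you.
    """
    for line in instream:
        outstream.write(PATTERN.sub(r'\1 ', line))

# Hypothetical sample input standing in for the piped text.
sample = io.StringIO("Energy is 1.234.\nEnergy is 5.678.\n")
result = io.StringIO()
process_lines(sample, result)
print(result.getvalue())
```

Using write() rather than print() keeps the output on one line, which matches what the Perl substitution produces when it swallows the newline.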


For your particular example:

cat results.txt | python -c "import re, sys; print ''.join(re.sub(r'.+(\d\.\d+)\.\n', r'\1 ', line) for line in sys.stdin)"

(Obviously somewhat more unwieldy. It's probably better to just write the script to do it in actual Python.)

Amber
  • Oddly, the version of Python I'm using (2.7.1) doesn't seem to like inline `for` loops after semicolons -- simple commands work, but more complex structures throw a `SyntaxError`. –  Oct 20 '11 at 22:27
  • 1
    @duskwuff - that's expected. Semicolons don't have any way to specify blocks. You can use a comprehension/generator expression instead. – Amber Oct 20 '11 at 22:28
2

You can use:

$ python -c '<your code here>'
0xd
  • Thanks for the swift reply (to both you and @Amber)! That's almost what I was looking for, but not quite. That is analogous to perl -e, but it doesn't print the output to stdout. So `python -c 2+2` gives nothing out. (You can use `python -c 'a=2+2; print a'` of course, but you get my point?) – Nagel Oct 20 '11 at 22:23
  • @Nagel : all the answers are about some command line method, but are you using a script? (for reading a bunch of files, I would sure use a script) – heltonbiker Oct 20 '11 at 23:00
2

You can in theory, but Python doesn't have anywhere near as much regex magic as Perl does, so the resulting command will be much more unwieldy, especially as you can't use regular expressions without importing re (and you'll probably need sys for sys.stdin too).

The Python equivalent of your colleague's Perl one-liner is approximately:

import sys, re
for line in sys.stdin:
    # write() rather than print, so no extra newline follows the trailing space
    sys.stdout.write(re.sub(r'.+(\d\.\d+)\.\n', r'\1 ', line))
  • Importing standard modules should not be considered "impure" or otherwise less robust code, at least not in Python. – heltonbiker Oct 20 '11 at 22:28
  • Thanks! I suspected as much. Guess I'll learn some basic perl for this kind of task then :) – Nagel Oct 20 '11 at 22:29
  • @heltonbiker has a point of course, but the resulting python code is longer, and seems a bit unwieldy to use as a command line tool. – Nagel Oct 20 '11 at 22:31
1

Quoting from https://stackoverflow.com/a/12259852/411282:

for ln in __import__("fileinput").input(): print ln.rstrip()

See the explanation linked above, but this does much more of what perl -p does, including support for multiple file names and stdin when no filename is given.

https://docs.python.org/3/library/fileinput.html#fileinput.input
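
A minimal Python 3 sketch of the same idea, applied to the substitution from the question. The function name extract and the temporary demo files are hypothetical; only fileinput.input() and the re calls come from the standard library.

```python
import fileinput
import os
import re
import tempfile

PATTERN = re.compile(r'.+(\d\.\d+)\.\n')

def extract(files=None):
    """Apply the substitution to every line of the given files.

    With files=None, fileinput falls back to the names in sys.argv[1:],
    or to stdin when no names were given, much like perl -p.
    """
    with fileinput.input(files=files) as lines:
        return ''.join(PATTERN.sub(r'\1 ', line) for line in lines)

# Hypothetical demo: two temporary files stand in for real result files.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in [("a.txt", "x = 1.5.\n"), ("b.txt", "y = 2.25.\n")]:
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write(text)
        paths.append(path)
    combined = extract(paths)

print(combined)
```

In a command-line script you would call extract() with no argument, so that file names given on the command line, or piped input, are picked up automatically.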

Joshua Goldberg
1

You have a problem which can be solved several ways.

I think you should consider using regular expressions (which is what Perl is doing in your example) directly from Python. Regular expressions are in the re module. An example would be:

import re
filecontent = open('somefile.txt').read()
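# re.MULTILINE makes $ match at the end of every line, not only at the end of the file
print re.findall(r'.+(\d\.\d+)\.$', filecontent, re.MULTILINE)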

(I prefer $ to '\n' for line endings, because line endings differ between operating systems and file encodings.)
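
One detail worth checking when using $ on a whole file: by default it only matches at the very end of the string, so the re.MULTILINE flag is needed to match at every line ending. A small sketch in Python 3, with made-up sample text:

```python
import re

# Hypothetical sample standing in for the file contents.
text = "run A gave 0.123.\nrun B gave 4.567.\n"

# Without re.MULTILINE, $ only matches at the very end of the string
# (or just before a final newline), so only the last line can match.
last_only = re.findall(r'.+(\d\.\d+)\.$', text)

# With re.MULTILINE, $ also matches just before every newline.
every_line = re.findall(r'.+(\d\.\d+)\.$', text, re.MULTILINE)

print(last_only)   # ['4.567']
print(every_line)  # ['0.123', '4.567']
```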

If you want to call bash commands from inside Python, you could use:

import os
os.system(mycommand)

Here mycommand is a string holding the bash command. I use it all the time, because some operations are better performed in bash than in Python.
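
Note that os.system only returns the exit status; the command's output goes straight to the terminal. If you want the output back in Python, the subprocess module is one alternative (a different technique from the os.system call above). A minimal sketch, assuming Python 3.7 or later; the command shown is a stand-in that simply runs a child Python process:

```python
import subprocess
import sys

# Stand-in command (hypothetical): run a child Python that prints one line.
# Replace the list with your own command, e.g. ["grep", "pattern", "file"].
command = [sys.executable, "-c", "print('value 3.14.')"]

# capture_output=True collects stdout/stderr, text=True decodes them to str,
# and check=True raises CalledProcessError on a non-zero exit status.
completed = subprocess.run(command, capture_output=True, text=True, check=True)
output = completed.stdout
print(output)
```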

Finally, if you want to extract the numbers with grep, use the -o option, which prints only the matched part.
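
For comparison, a rough Python 3 counterpart of what grep -o does, printing only the matched parts. The pattern, the function name matched_parts and the sample lines are assumptions made for the illustration:

```python
import re

# Assumed pattern: one digit, a dot, then one or more digits.
NUMBER = re.compile(r'\d\.\d+')

def matched_parts(lines):
    """Yield every match on every line, one item per match."""
    for line in lines:
        for match in NUMBER.finditer(line):
            yield match.group(0)

# Hypothetical sample lines; any iterable of strings (such as sys.stdin) works.
sample = ["a 1.5 and 2.25.\n", "no numbers here\n", "b 3.125.\n"]
parts = list(matched_parts(sample))
print("\n".join(parts))
```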

heltonbiker
1

Perl (or sed) is more convenient. However, it is possible, if ugly:

python -c 'import sys, re; print "\n".join(re.sub(r".+(\d\.\d+)\.\n", r"\1 ", l) for l in sys.stdin)'
0

You can execute Python code directly from your bash command line with python -c, or you can process input piped to stdin by reading from sys.stdin.

Community
Camilo Díaz Repka