0

I searched for other answers before asking this. I am running a Windows 11 machine. The CSV file I received has a stray " in some lines, which causes an error when importing into MongoDB, so I want to remove it. I found that the sed command does this very quickly. Most of you may recommend Python's replace function, but that is not feasible here because the file is 5 GB in size, and when I tested both methods I found sed to be much faster.

On my system I have to run bash in the command terminal to enter bash mode, and then run the sed command there.

How should I call subprocess.run() to achieve this? Below is my code:

import subprocess

p = subprocess.run('bash' | r"sed -i 's/\"/-/g' D:\Backupfiles\MAY2021\Names.csv", shell=True, capture_output=True, check=True)
print(p.returncode)

Below is the error I get when running the above code:

"C:\Users\AEC Office Kollam\anaconda3\envs\SDR Project\python.exe" "C:/Users/AEC Office Kollam/Documents/Atom/Python/MongoDB/SDR Project/subprocesstutorial.py"
Traceback (most recent call last):
  File "C:\Users\AEC Office Kollam\Documents\Atom\Python\MongoDB\SDR Project\subprocesstutorial.py", line 3, in <module>
    p = subprocess.run('bash' r"sed -i 's/\"/-/g' D:\Backupfiles\MAY2021\SDR1\BSNL\BSNL-DEC2020-EKYCC.csv", shell=True, capture_output=True, check=True)
  File "C:\Users\AEC Office Kollam\anaconda3\envs\SDR Project\lib\subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'bashsed -i 's/\"/-/g' D:\Backupfiles\MAY2021\SDR1\BSNL\BSNL-DEC2020-EKYCC.csv' returned non-zero exit status 1.
CyberNoob
  • `CalledProcessError: Command 'bashsed -i '` looks like you are missing a comma between `bash` and `sed` – Rolf of Saxony Jan 20 '22 at 08:12
  • `I have to run bash in command terminal and enter bash mode and then run the sed` But why do you have to "run bash" and "enter bash"? Just run `sed`. – KamilCuk Jan 20 '22 at 08:17
  • @KamilCuk I am running on a Windows machine. When I run the sed command in the terminal it gives me ``'sed' is not recognized as an internal or external command, operable program or batch file.`` So I have to run the bash command first, then run the sed command. – CyberNoob Jan 20 '22 at 08:47
  • Install [Cygwin](https://www.cygwin.com/) and use sed there; Or a native win32 utility would be [UnxUtils](http://unxutils.sourceforge.net/). – eDonkey Jan 20 '22 at 08:55
  • @eDonkey Then how should I call subprocess.run(), given that it runs in the default shell? On Windows that is the command terminal. I already have bash available through Git Bash. All I need to do is run the bash command and then the sed command there. – CyberNoob Jan 20 '22 at 09:07
  • With *UnxUtils* you can run `sed` in the command line. Otherwise, if you're looking for a Python implementation, have a look at [PythonSed](https://github.com/GillesArcas/PythonSed). This can be used as a command line utility. – eDonkey Jan 20 '22 at 09:55
  • The size of the file should hardly matter; Python (just like `sed`) can be set up to read a line at a time, or even a byte at a time. – tripleee Jul 29 '22 at 12:17

3 Answers

1

If you are running Windows, either:

  1. Use Cygwin - https://www.geeksforgeeks.org/how-to-use-linux-commands-in-windows-with-cygwin/

  2. Use PowerShell commands that work like sed (the first replaces text; the other two only select matching lines):

get-content somefile.txt | %{$_ -replace "expression","replace"}

or

get-content somefile.txt | where { $_ -match "expression"}
select-string somefile.txt -pattern "expression"

If you are running Linux, this will work for you:

import subprocess
# inp and outp are the paths of the input and output files
with open(outp, "w") as out_file:
    sub = subprocess.call(['sed', 's/\"//g', inp], stdout=out_file)
Tal Folkman
  • I am running on a Windows machine. When I run the sed command in the terminal it gives me 'sed' is not recognized as an internal or external command, operable program or batch file. So I have to run the bash command first, then run the sed command. – CyberNoob Jan 20 '22 at 08:49
1

sed is just a program that bash runs from somewhere, so you should be able to run it directly with subprocess.run (or subprocess.call if you’re using an old version of Python).

In bash, use type -p sed to find out where the sed program is.
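A rough Python-side version of that lookup, as a sketch: shutil.which searches the PATH of the Python process itself, which on Windows may not include the directory where Git Bash keeps sed. That would be consistent with the "not recognized" error from the command terminal. The helper name find_program is hypothetical.

```python
import shutil

def find_program(name):
    """Return the full path of `name` if it is on this process's PATH, else None."""
    return shutil.which(name)

if __name__ == "__main__":
    # Show what the Python process can and cannot see
    for prog in ("sed", "bash"):
        print(prog, "->", find_program(prog))
```

If this prints None for sed but a path for bash, calling sed directly from subprocess will fail unless you pass the full path reported by type -p sed.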

I recommend thinking about whether you really need shell=True. My guess is that you don’t. Something like the Linux code in Tal Folkman’s answer should do it. Most of the time, using the shell here just adds quoting headaches.

If you really want to go through bash, you’ll have to use the -c flag to bash. Something like

subprocess.run([r'C:\whatever\bash.exe', '-c', 'sed -i -e "s/foo/bar/" input.dat'])
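For a file like the one in the question, the call could be put together as in the sketch below. It assumes bash is found on PATH (otherwise pass its full path, as above) and that the bash in use accepts Windows paths written with forward slashes, which Git Bash generally does. build_sed_command and run_sed are hypothetical helpers. Passing a list without shell=True means only bash parses the command string, and shlex.quote protects a path that contains spaces.

```python
import shlex
import subprocess
from pathlib import PureWindowsPath

def build_sed_command(windows_path, bash="bash"):
    """Build an argv list that asks bash to delete every double quote in a file."""
    # D:\dir\file.csv -> D:/dir/file.csv, so bash does not treat "\" as an escape
    posix_style = PureWindowsPath(windows_path).as_posix()
    # The sed script is single-quoted for bash; the path is quoted separately
    script = "sed -i -e 's/\"//g' " + shlex.quote(posix_style)
    return [bash, "-c", script]

def run_sed(windows_path, bash="bash"):
    """Run the command; raises CalledProcessError if bash or sed fails."""
    return subprocess.run(build_sed_command(windows_path, bash),
                          check=True, capture_output=True)

if __name__ == "__main__":
    # Only prints the command; call run_sed(...) to edit the file in place
    print(build_sed_command(r"D:\Backupfiles\MAY2021\Names.csv"))
```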
Ture Pålsson
  • This is my new code ``import subprocess p = subprocess.run(['bash', '-c', 'sed -i 's/"/-/g' D:\Backupfiles\MAY2021\SDR1\BSNL\BSNL-DEC2020-EKYCC.txt'], shell=True, capture_output=True, check=True) print(p.stdout.decode())`` This code gives me the error as below. **p = subprocess.run(['bash', '-c', 'sed -i ``'s/"//g'`` D:\Backupfiles\MAY2021\SDR1\BSNL\BSNL-DEC2020-EKYCC.txt'], shell=True, capture_output=True, check=True) SyntaxError: invalid syntax** – CyberNoob Jan 20 '22 at 11:17
  • On Windows, if shell is not True, the command is not executed. – CyberNoob Jan 20 '22 at 11:59
  • The proper solution without `shell=True` is simply `subprocess.run(["sed", "-i", "-e", "s/foo/bar/", "input.dat"], check=True)`; but there is really no reason to use a subprocess for this. The Python `fileinput` library lets you easily perform the same action without a subprocess. – tripleee Jul 29 '22 at 12:10
-1

Thanks to the above two answers I was able to figure out the answer for my problem.

My CSV file contained a " without a closing ". The file was too large to be handled by Python's replace. This is my final code:

import subprocess
subprocess.run(["bash", "-c", "sed -i 's/\"//g' Name.csv"], capture_output=True, shell=True, check=True)

In sed -i 's/\"//g' Name.csv, the \ before the " is needed because the command sits inside a double-quoted Python string. Other than that, the same command can be used to replace anything in any kind of file.

Thank you for your insights, everyone.
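For comparison, the same clean-up can be sketched in pure Python without loading the whole file into memory, as one of the comments suggests. This is not the approach used above, only an illustration: strip_quotes, its parameters, and the 1 MiB chunk size are assumptions, the file is assumed to use an ASCII-compatible encoding such as UTF-8, and its speed relative to sed has not been measured here.

```python
import os
import tempfile

def strip_quotes(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst in fixed-size chunks, dropping every double-quote byte."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            fout.write(chunk.replace(b'"', b""))

if __name__ == "__main__":
    # Small self-check on a throwaway file
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.csv")
        dst = os.path.join(tmp, "out.csv")
        with open(src, "wb") as f:
            f.write(b'1,"Alice\n2,Bob"\n')
        strip_quotes(src, dst)
        with open(dst, "rb") as f:
            print(f.read())
```

Because only a single byte is removed, a chunk boundary can never split the pattern, so memory use stays at roughly one chunk regardless of file size.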

CyberNoob