I have a text file with data line items that looks like:

1~123~JJJ
2~223~AAA
3~444~LLL
4~567~PPP
5~785~QQQ

I'd like to delete the lines that contain any of the following values (they are listed in a separate text file): PPP QQQ

To end up with:

1~123~JJJ
2~223~AAA
3~444~LLL

I have never used R and would like to know whether this can be done with it. If it can be done faster in Python, please let me know. I am open to options.

Ggplot
  • While it could be done in R, e.g. `readLines()` the data in, identify and remove the rows, then `writeLines()` it out again, I'd think that old Unix text tools like sed/grep would be more appropriate. See https://stackoverflow.com/questions/5410757/how-to-delete-from-a-text-file-all-lines-that-contain-a-specific-string – thelatemail Jul 08 '20 at 02:50
  • Per @thelatemail's suggestion, `sed -E -ibak -e "/(PPP|QQQ)/d" myfile.txt` will delete any line that contains either of those two strings (`-E` enables extended regular expressions, which the `|` alternation needs). And with larger files, it will likely be faster than R and Python ... but I applaud you trying to figure out how to do it in other languages. – r2evans Jul 08 '20 at 04:18

3 Answers


You could use a combination of readLines and grepl, followed by writeLines:

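# Read every line of the input file
conn <- file("path/to/input.txt")
lines <- readLines(conn)
close(conn)
# Keep only the lines that do not contain PPP or QQQ as a whole word
lines <- lines[grepl("^(?!.*\\b(?:PPP|QQQ)\\b).*$", lines, perl=TRUE)]

# Overwrite the input file with the filtered lines
conn <- file("path/to/input.txt", "w")  # assuming you want to write to the same file
writeLines(lines, conn)
close(conn)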
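
For comparison, the same whole-word filter can be sketched in Python. This is only an illustration and not part of the R answer above: the function name `drop_lines_with_words` and the in-memory sample data are assumptions made for the example. As in the R pattern, `\b` treats `~` as a word boundary.

```python
import re


def drop_lines_with_words(lines, words):
    """Return the lines that contain none of `words` as a whole word."""
    if not words:
        # Nothing to exclude, so keep every line.
        return list(lines)
    # Escape each word so it is matched literally, then join as alternatives.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")
    return [line for line in lines if not pattern.search(line)]


# Hypothetical sample data mirroring the question.
sample = ["1~123~JJJ", "2~223~AAA", "3~444~LLL", "4~567~PPP", "5~785~QQQ"]
kept = drop_lines_with_words(sample, ["PPP", "QQQ"])
print(kept)
```

Because of the word boundaries, a line such as `xPPPx` would be kept; drop the `\b` anchors if plain substring matching is what you want.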
Tim Biegeleisen

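You can use grep to get the integer indices of the matching rows and drop them with negative indexing. Note that if nothing matches, `-grep(...)` selects zero rows; `!grepl(...)` avoids that edge case: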

> df[-grep("PPP|QQQ", df$V1), , drop=FALSE]
         V1
1 1~123~JJJ
2 2~223~AAA
3 3~444~LLL

Where df is a data.frame:

df <- read.table(text="1~123~JJJ
2~223~AAA
3~444~LLL
4~567~PPP
5~785~QQQ", header=FALSE, stringsAsFactors=FALSE)
Jilber Urbina

I am not familiar with R, but here's how I'd do it in Python:

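# Read all lines first, because opening the file with "w" truncates it
with open("yourfile.txt", "r") as f:
    lines = f.readlines()
with open("yourfile.txt", "w") as f:
    for line in lines:
        # Write back only the lines that do not contain the string
        if "string to delete" not in line:
            f.write(line)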

EDIT: to read the strings to exclude from another file, you'd do the following:

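with open("to be deleted.txt", "r") as f:
    # split() strips newlines; assumes the values are separated by whitespace
    parts = f.read().split()
with open("yourfile.txt", "r") as f:
    lines = f.readlines()
with open("yourfile.txt", "w") as f:
    for line in lines:
        # Write each line at most once, and only if it contains none of the parts
        if not any(part in line for part in parts):
            f.write(line)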
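
If the data file is large, or you would rather not overwrite it in place, the same idea can be sketched as a function that streams the input and writes to a separate output file. This is a minimal sketch under stated assumptions: the name `filter_file`, the file names in the demo, and the whitespace-separated format of the values file are all hypothetical and not taken from the question.

```python
import tempfile
from pathlib import Path


def filter_file(data_path, values_path, out_path):
    """Copy data_path to out_path, skipping lines that contain any value.

    Returns the number of lines that were skipped.
    """
    # Assumption: values are separated by whitespace (spaces or newlines).
    values = Path(values_path).read_text().split()
    removed = 0
    with open(data_path, "r") as src, open(out_path, "w") as dst:
        for line in src:  # stream line by line instead of loading everything
            if any(value in line for value in values):
                removed += 1
            else:
                dst.write(line)
    return removed


# Demo on throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "data.txt").write_text(
        "1~123~JJJ\n2~223~AAA\n3~444~LLL\n4~567~PPP\n5~785~QQQ\n"
    )
    (tmp / "values.txt").write_text("PPP QQQ\n")
    removed = filter_file(tmp / "data.txt", tmp / "values.txt", tmp / "filtered.txt")
    result = (tmp / "filtered.txt").read_text()

print(removed)
print(result, end="")
```

The output path must differ from the input path here, since opening a file with `"w"` truncates it before it can be read.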
GLaw1300
  • Is this the full script required? I am not so great at using Python. I am guessing that after I run it, it prompts me to enter each item that needs to be removed? – Ggplot Jul 08 '20 at 03:14
  • should use `in`, not `__contains__` – Derek Eden Jul 08 '20 at 03:20
  • @Ggplot see updated answer. This will read all the parts from another file and check if the line contains any of them. – GLaw1300 Jul 08 '20 at 05:00