I would like to search the contents of a .txt file for a specific line of text and delete only that line from the .txt file.

I want to specify the line of text to find as a variable. For example:

set lineOfTextToDelete to "The quick brown fox jumps over the lazy dog."

Contents before:

Let's say the contents of my TestDelta.txt file are:

This is a a paragraph of text.
This is another line of text.
The quick brown fox jumps over the lazy dog.
Here is another line

Contents after:

The following shows the contents of TestDelta.txt that I want after running the script. As you can see, the string assigned to the lineOfTextToDelete variable, i.e. "The quick brown fox jumps over the lazy dog.", has been deleted from the contents of the file.

This is a a paragraph of text.
This is another line of text.
Here is another line

What I've tried so far:

Below is what I've tried; however, I'm unsure what I should do next.

set txtfile to "Macintosh HD - Data:Users:crelle:Desktop:TestDelta.txt" as alias
set thisone to read txtfile
set theTextList to paragraphs of thisone

Can anyone help show me what to do?

RobC
Crelle
  • 1) Split the text into paragraphs. 2) Search for and delete the paragraph, 3) Rejoin the text – vadian Feb 25 '20 at 08:50
  • Thanks! how would you search and delete in the paragraph? – Crelle Feb 25 '20 at 09:49
  • Simple way: A `repeat` loop and `contains`. To delete create a new list and leave out the affected paragraph. – vadian Feb 25 '20 at 09:50
  • Nice, thanks. Do you guys know how to check if a txt doc is totally empty of text? I could find some commands that check lines and characters, but when I have a lot of empty lines they get counted too. How can I be sure that the txt file is empty and not just a lot of empty lines? :) – Crelle Feb 26 '20 at 14:55
  • @Crelle - That's a new question, so click the [Ask Question](https://stackoverflow.com/questions/ask) button. However, before asking your new question I suggest you: **1)** [Take a Tour](https://stackoverflow.com/tour) to get a better understanding of how Stack Overflow works. **2)** Read [How do I ask a good question](https://stackoverflow.com/help/how-to-ask) to increase your chances of someone providing a suitable solution/answer, instead of receiving downvotes like you did with this question. – RobC Feb 26 '20 at 16:45

1 Answer

Here are, in no particular order, a couple of solutions to consider.

Before usage I recommend creating a backup copy of any .txt file that you're going to try them with. These scripts can potentially cause loss of valuable data if not used carefully.

If you have any concerns about assigning the correct file path to either:

  • The txtFilePath variable in Solution A
  • The txtFilePath property in Solution B

then replace either of those lines with the following. This will enable you to choose the file instead.

set txtFilePath to (choose file)

Solution A: Shell out from AppleScript and utilize SED (Stream EDitor)

on removeMatchingLinesFromFile(findStr, filePath)
  set findStr to do shell script "sed 's/[^^]/[&]/g; s/\\^/\\\\^/g' <<<" & quoted form of findStr
  do shell script "sed -i '' '/^" & findStr & "$/d' " & quoted form of (POSIX path of filePath)
end removeMatchingLinesFromFile

set txtFilePath to "Macintosh HD - Data:Users:crelle:Desktop:TestDelta.txt"
set lineOfTextToDelete to "The quick brown fox jumps over the lazy dog."

removeMatchingLinesFromFile(lineOfTextToDelete, txtFilePath)

Explanation:

  1. The arbitrarily named removeMatchingLinesFromFile subroutine/function contains the tasks necessary to meet your requirement. It takes two parameters: findStr and filePath. In its body we "shell out" twice to sh by utilizing AppleScript's do shell script command.

    Let's understand what's happening here in more detail:

    • The first line, which reads:

      set findStr to do shell script "sed 's/[^^]/[&]/g; s/\\^/\\\\^/g' <<<" & quoted form of findStr
      

      executes a sed command. Its purpose is to escape any Basic Regular Expression (BRE) metacharacters that may exist in the line of text that we want to delete. Ultimately it ensures each character in the given string is treated as a literal when used in the subsequent sed command, thus negating any "special meaning" a metacharacter has.

      Refer to this answer for further explanation. Essentially it does the following:

      • Every character except ^ is placed in its own character set [...] expression to treat it as a literal.
        • Note that ^ is the one char. you cannot represent as [^], because it has special meaning in that location (negation).
      • Then, ^ chars. are escaped as \^.
        • Note that you cannot just escape every char by putting a \ in front of it because that can turn a literal char into a metachar, e.g. \< and \b are word boundaries in some tools, \n is a newline, \{ is the start of a RE interval like \{1,3\}, etc.

      Credit for this SED pattern goes to Ed Morton and mklement0.

      So, given that the string assigned to the variable named lineOfTextToDelete is:

      The quick brown fox jumps over the lazy dog.
      

      we actually end up assigning the following string to the findStr variable after it has been parsed via the sed command:

      [T][h][e][ ][q][u][i][c][k][ ][b][r][o][w][n][ ][f][o][x][ ][j][u][m][p][s][ ][o][v][e][r][ ][t][h][e][ ][l][a][z][y][ ][d][o][g][.]
      

      As you can see each character is wrapped in opening and closing square brackets, i.e. [], to form a series of bracket expressions.

      To further demonstrate what's happening, launch your Terminal application and run the following compound command:

      sed 's/[^^]/[&]/g; s/\^/\\^/g' <<<"The quick brown fox jumps over the lazy dog."
      

      Note: When running this compound command directly in the Terminal, the sed pattern contains fewer backslashes (\) than the pattern specified in the AppleScript. This is because AppleScript strings require each backslash to be escaped with an additional backslash.

    • The second line, which reads:

      do shell script "sed -i '' '/^" & findStr & "$/d' " & quoted form of (POSIX path of filePath)
      

      executes another sed command via the shell. This finds every line in the file that exactly matches the given line of text and deletes it.

      • The -i option specifies that the file is to be edited in-place, and requires a following empty string argument ('') when using the BSD version of sed that ships with macOS.

      • The '/^" & findStr & "$/d' part is the pattern that we provide to sed.

        • The ^ metacharacter matches the null string at beginning of the pattern space - it essentially means start matching the subsequent regexp pattern only if it exists at the beginning of the line.

        • The AppleScript findStr variable holds the result we obtained via the previous sed command. It is concatenated with the preceding pattern part using the & operator.

        • The $ metacharacter refers to the end of pattern space, i.e. the end of the line.

        • The d is the delete command.

        • The & quoted form of (POSIX path of filePath) part utilizes AppleScript's POSIX path property to transform your specified HFS path, i.e.

          Macintosh HD - Data:Users:crelle:Desktop:TestDelta.txt
          

          to the following POSIX-style path:

          /Macintosh HD - Data/Users/crelle/Desktop/TestDelta.txt
          

          The quoted form property ensures correct quoting of the POSIX-style path. For example, it ensures any space character(s) in the given pathname are interpreted correctly by the shell.

      Again, to further demonstrate what's happening, launch your Terminal application and run the following compound command:

      sed -i '' '/^[T][h][e][ ][q][u][i][c][k][ ][b][r][o][w][n][ ][f][o][x][ ][j][u][m][p][s][ ][o][v][e][r][ ][t][h][e][ ][l][a][z][y][ ][d][o][g][.]$/d' ~/Desktop/TestDelta.txt
      
  2. Let's understand how to use the aforementioned removeMatchingLinesFromFile function:

    • Firstly we assign the same HFS path that you specified in your question to the arbitrarily named txtFilePath variable:

      set txtFilePath to "Macintosh HD - Data:Users:crelle:Desktop:TestDelta.txt"
      
    • Next we assign the line of text that we want to find and delete to the arbitrarily named lineOfTextToDelete variable:

      set lineOfTextToDelete to "The quick brown fox jumps over the lazy dog."
      
    • Finally we invoke the custom removeMatchingLinesFromFile function, passing in the two required arguments, namely lineOfTextToDelete and txtFilePath:

      removeMatchingLinesFromFile(lineOfTextToDelete, txtFilePath)
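
As a rough way to try both sed steps from Solution A without touching a real file, the following shell sketch can be run in the Terminal. The function names escape_bre and remove_matching_lines and the sample lines are assumptions made for illustration; they are not part of the answer's AppleScript. Unlike the answer's command it omits -i and prints the filtered result, and it pipes the string in with printf rather than using <<<.

```shell
#!/bin/sh
# Hypothetical helper: wrap every character except ^ in a bracket expression,
# then escape ^ itself, so the text is matched literally in a BRE.
escape_bre() {
  printf '%s\n' "$1" | sed 's/[^^]/[&]/g; s/\^/\\^/g'
}

# Hypothetical helper: print the lines of file $2, omitting lines equal to $1.
# Writes to stdout instead of editing in place, so the file is left untouched.
remove_matching_lines() {
  pattern=$(escape_bre "$1")
  sed "/^${pattern}\$/d" "$2"
}

# Throwaway sample file; the first two lines differ only where "." could act
# as a wildcard if it were not escaped.
tmpfile=$(mktemp)
printf '%s\n' 'Price: $5.00 (approx.)' 'Price: $5x00 (approx.)' 'Last line' > "$tmpfile"

escape_bre 'a.b^c'
remove_matching_lines 'Price: $5.00 (approx.)' "$tmpfile"
rm -f "$tmpfile"
```

Only the first sample line is dropped: because the period is wrapped as [.], it no longer matches the x in the second line.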
      

Solution B: Using vanilla AppleScript without SED:

The solution below uses neither the shell nor SED, and produces the same result as Solution A.

property lineOfTextToDelete : "The quick brown fox jumps over the lazy dog."
property txtFilePath : alias "Macintosh HD - Data:Users:crelle:Desktop:TestDelta.txt"

removeMatchingLinesFromFile(lineOfTextToDelete, txtFilePath)


on removeMatchingLinesFromFile(findStr, filePath)
  set paraList to {}
  repeat with aLine in getLinesFromFile(filePath)
    if contents of aLine is not findStr then set end of paraList to contents of aLine
  end repeat
  set newContent to transformListToText(paraList, "\n")
  replaceFileContents(newContent, filePath)
end removeMatchingLinesFromFile


on getLinesFromFile(filePath)
  if (get eof of filePath) is 0 then return {}
  try
    set paraList to paragraphs of (read filePath)
  on error errorMssg number errorNumber
    error errorMssg & errorNumber & ": " & POSIX path of filePath
  end try
  return paraList
end getLinesFromFile


on transformListToText(ListOfStrings, delimiter)
  set {tids, text item delimiters} to {text item delimiters, delimiter}
  set content to ListOfStrings as string
  set text item delimiters to tids
  return content
end transformListToText


on replaceFileContents(content, filePath)
  try
    set readableFile to open for access filePath with write permission
    set eof of readableFile to 0
    write content to readableFile starting at eof
    close access readableFile
    return true
  on error errorMssg number errorNumber
    try
      close access filePath
    end try
    error errorMssg & errorNumber & ": " & POSIX path of filePath
  end try
end replaceFileContents

Explanation:

I'll keep this explanation brief as the code itself is probably easier to comprehend than Solution A.

The removeMatchingLinesFromFile subroutine essentially performs the following with the aid of additional helper functions:

  1. Reads the contents of the given .txt file via the getLinesFromFile function, which returns a list. Each item in the returned list holds one line/paragraph of text from the .txt file.

  2. Loops through each item (i.e. each line of text) via a repeat statement. Any item whose contents do not equal the given line of text to find is stored in another list, i.e. the list assigned to the paraList variable.

  3. Next, the list assigned to the paraList variable is passed to the transformListToText function along with a newline (\n) delimiter. The transformListToText function returns a new string.

  4. Finally, via the replaceFileContents function, we open for access the original .txt file and overwrite its contents with the newly constructed content.
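
To cross-check what Solution B should produce, the same four steps can be mirrored in a short shell sketch. It is not the answer's method; the name filter_lines and the temporary file are assumptions for illustration, and the sketch always ends the last line with a newline.

```shell
#!/bin/sh
# Hypothetical helper: print each line of file $2 unless it equals $1 exactly.
filter_lines() {
  # Steps 1 and 2: read line by line and keep the non-matching lines.
  # The "|| [ -n ... ]" part also handles a final line with no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "$line" != "$1" ]; then
      printf '%s\n' "$line"   # Step 3: kept lines are rejoined with newlines.
    fi
  done < "$2"
}

# Demo on a throwaway file holding part of the question's sample text.
sample=$(mktemp)
printf '%s\n' 'This is another line of text.' \
  'The quick brown fox jumps over the lazy dog.' \
  'Here is another line' > "$sample"

# Step 4: write the filtered text to a second file, then replace the original.
filter_lines 'The quick brown fox jumps over the lazy dog.' "$sample" > "$sample.new"
mv "$sample.new" "$sample"
cat "$sample"
rm -f "$sample"
```

Like the AppleScript comparison, this is an exact whole-line match, so a line that merely contains the search text is kept.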


Important note applicable to either solution: When specifying the line of text that you want to delete (i.e. the string assigned to the lineOfTextToDelete variable), escape every backslash \ in it with another backslash, because the backslash is the escape character in AppleScript string literals. For example, if the line that you want to search for contains a single backslash \, write it as two \\. Similarly, if it contains two consecutive backslashes \\, write them as four \\\\, and so on.


RobC