
I've read a little about sed and awk, and understand that both are text manipulators.

I plan to use one of these to make similar changes across large sets of source files (JavaScript, Python, etc.). For now that mainly means editing function definitions (the parameters passed) and variable names, but the more I can do the better.

Has anyone attempted something similar, and if so, are there any obvious pitfalls to look out for? Which of sed and awk would be more suitable for this? (Or maybe something else entirely?)

Input

function(paramOne){
//Some code here
var variableOne = new ObjectType;
array[1] = "Some String";
instanceObj = new Something.something;
}

Output

function(ParameterOne){
//Some code here
var PartOfSomething.variableOne = new ObjectType;
sArray[1] = "Some String";
var instanceObj = new Something.something;
}
ffledgling
  • Which is more appropriate greatly depends on the text manipulation you are doing. `awk` will be better for some tasks, `sed` for others. You will probably end up using both. – William Pursell Feb 23 '13 at 23:45
  • Why don't you just show us what you're trying to do? Include some input and expected output. Your question greatly depends on what you're actually trying to do. – Steve Feb 24 '13 at 09:31
  • Yes, definitely possible, but not a trivial task, and corner cases can make it extra-frustrating. Time spent reading and working through http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Dstripbooks&field-keywords=orielly+sed+and+awk will be well rewarded. Good luck. – shellter Feb 24 '13 at 23:32
  • awk can do that more concisely than any other scripting language BUT writing a parser for the language your files are written in is going to be extremely difficult no matter what scripting language you choose for the transformation. You'd probably be faster and less error prone changing the files by hand than trying to write a tool to do it. – Ed Morton Feb 25 '13 at 13:06
  • I see some confusion in various comments about applicability of tools. sed is an excellent tool for simple changes to a single line of text. It has many other language constructs that should never be used. awk is a concise, full featured tool/scripting language for all other manipulations of text. bash is an environment/scripting language to call tools from and manipulate files and processes. So if you need to non-trivially manipulate text in multiple files, write the text processing concisely in awk and the file finding/updating/invoking awk concisely in bash. – Ed Morton Feb 25 '13 at 13:42

2 Answers


Here's a GNU awk (for "gensub()" function) script that will transform your sample input file into your desired output file:

$ cat tst.awk
BEGIN{ sym = "[[:alnum:]_]+" }
{
   $0 = gensub("^(" sym ")[(](" sym ")[)](.*)","\\1(ParameterOne)\\3","")
   $0 = gensub("^(var )(" sym ")(.*)","\\1PartOfSomething.\\2\\3","")
   $0 = gensub("^a(rray.*)","sA\\1","")
   $0 = gensub("^(" sym " =.*)","var \\1","")

   print
}

$ cat file
function(paramOne){
//Some code here
var variableOne = new ObjectType;
array[1] = "Some String";
instanceObj = new Something.something;
}

$ gawk -f tst.awk file
function(ParameterOne){
//Some code here
var PartOfSomething.variableOne = new ObjectType;
sArray[1] = "Some String";
var instanceObj = new Something.something;
}

BUT think about how your real input could vary from that - you could have more/less/different spacing between symbols. You could have assignments starting on one line and finishing on the next. You could have comments that contain similar-looking lines to the code that you don't want changed. You could have multiple statements on one line. etc., etc.

You can address each of these issues one at a time, but it could take you a lot longer than just updating your files by hand, and chances are you still will not get it completely right.
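
To make those variations concrete, here is a rough Python sketch (not part of the awk script above) that ports only the last rule, the one that prefixes a bare assignment with var, and runs it over a few variations of the sample line. The names SYM, ADD_VAR and add_var are made up for illustration, and the character class only approximates [[:alnum:]_].

```python
import re

# Hypothetical port of the last awk rule: put "var " in front of a bare assignment.
SYM = r"[A-Za-z0-9_]+"
ADD_VAR = re.compile(r"^(" + SYM + r" =.*)")


def add_var(line):
    """Apply the rule to a single line, the way the awk script does."""
    return ADD_VAR.sub(r"var \1", line, count=1)


samples = [
    "instanceObj = new Something.something;",      # the sample input: handled
    "    instanceObj = new Something.something;",  # indented: missed
    "instanceObj=new Something.something;",        # no space before "=": missed
    "a = 1; b = 2;",                               # two statements: only the first changes
    "total = price * qty",                         # text inside a /* ... */ comment: changed anyway
]

for line in samples:
    print(repr(line), "->", repr(add_var(line)))
```

Each miss can be patched with another rule, but every extra rule brings its own corner cases, which is the point of the paragraph above.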

If your code is EXCEEDINGLY well structured and RIGOROUSLY follows a specific, highly restrictive coding format then you might be able to do what you want with a scripting language but your best bets are either:

  1. change the files by hand if there are fewer than, say, 10,000 of them or
  2. get a hold of a parser (e.g. the compiler) for the language your files are written in and modify that to spit out your updated code.
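
As a small illustration of option 2, and assuming for the moment that the files are Python source, the standard library's tokenize module already separates identifiers from comments and strings, so a rename can skip both. This is only a sketch: a tokenizer is not a full parser, rename_identifier is a hypothetical helper, and it knows nothing about scope, so it renames every identifier token with that spelling.

```python
import io
import tokenize


def rename_identifier(source, old, new):
    """Rename every NAME token spelled `old`; comments and strings are other token types."""
    # Split on "\n" only, so rows line up with the tokenizer's row numbers.
    lines = source.split("\n")
    hits = [tok for tok in tokenize.generate_tokens(io.StringIO(source).readline)
            if tok.type == tokenize.NAME and tok.string == old]
    # Patch from the last match backwards so earlier column offsets stay valid.
    for tok in reversed(hits):
        row, start_col = tok.start
        end_col = tok.end[1]
        line = lines[row - 1]
        lines[row - 1] = line[:start_col] + new + line[end_col:]
    return "\n".join(lines)


# Made-up sample input for illustration.
sample = (
    "def area(paramOne):\n"
    "    # paramOne is the radius\n"
    "    label = 'paramOne'\n"
    "    return 3.14159 * paramOne ** 2\n"
)

print(rename_identifier(sample, "paramOne", "parameterOne"))
```

The definition and the return statement are renamed while the comment and the string literal are left alone, which a line-based regex cannot promise. For JavaScript you would need an equivalent tokenizer or parser for that language.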
Ed Morton

As soon as it gets slightly more complicated you will switch to a scripting language anyway. So why not start with Python in the first place?

Walking directories: walking along and processing files in directory in python

Replacing text in a file: replacing text in a file with Python

Python regex howto: http://docs.python.org/dev/howto/regex.html

I also recommend installing Eclipse + PyDev, as this makes debugging a lot easier.

Here is an example of a simple automatic replacer:

import itertools
import os
import re

# A raw string cannot end in a backslash, so the trailing separator is omitted.
folder = r"C:\Workspaces\Test"
skip_extensions = ['.gif', '.png', '.jpg', '.mp4', '']
substitutions = [("Test.Alpha.", "test.alpha."),
                 ("Test.Beta.", "test.beta."),
                 ("Test.Gamma.", "test.gamma.")]


def context(s, text):
    # Every literal occurrence of text, with up to 5 characters on either side.
    return [s[max(found.start() - 5, 0):found.end() + 5]
            for found in re.finditer(re.escape(text), s)]


for root, dirs, files in os.walk(folder):
    for name in files:
        base, ext = os.path.splitext(name)
        file_path = os.path.join(root, name)
        if ext in skip_extensions:
            print("skipping", file_path)
            continue
        print("processing", file_path)

        with open(file_path) as f:
            s = f.read()

        # Record the matches before and after replacing, to show what changed.
        before = [context(s, old) for old, new in substitutions]
        for old, new in substitutions:
            s = s.replace(old, new)
        after = [context(s, new) for old, new in substitutions]

        for b, a in zip(itertools.chain(*before), itertools.chain(*after)):
            print(b, "-->", a)

        with open(file_path, "w") as f:
            f.write(s)
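
Since the script above overwrites files in place, it can help to look at the proposed changes first. Here is a minimal sketch using the standard difflib module; preview is a hypothetical helper, and the file name and contents are made up:

```python
import difflib


def preview(path, old_text, new_text):
    """Return a unified diff of the proposed change as one string ('' if nothing changes)."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=path,
        tofile=path + " (proposed)",
    )
    return "".join(diff)


# Made-up file contents for illustration.
old = "a = Test.Alpha.value\nb = 2\n"
new = old.replace("Test.Alpha.", "test.alpha.")
print(preview("example.js", old, new))
```

In the replacer you could print preview(file_path, original, s) and only write the file once the diff looks right.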
Udo Klein
  • using a full fledged programming/scripting language like perl or python shouldn't be necessary for simple text processing. Won't a simple mix of sed, awk and bash suffice? – ffledgling Feb 23 '13 at 22:32
  • A "simple mix of sed, awk and bash" is quite often more complicated and harder to maintain than a scripted solution. The point is that there are debuggers for script languages while shell scripts are notoriously harder to debug. – Udo Klein Feb 24 '13 at 09:51
  • If you've attempted something similar using python or some other scripting language, I request you to please add that to your answer. – ffledgling Feb 24 '13 at 10:36
  • Udo, I'm already familiar with regex, directory traversal and substitution in python, what I was hoping for, was to learn from someone who'd actually attempted editing files of codes in batches like this. – ffledgling Feb 24 '13 at 19:29
  • I do this quite often. What I do not get: if you are familiar with these techniques, why would you need a code example? – Udo Klein Feb 24 '13 at 20:32
  • Udo, I've attempted parsing and working with HTML/XML pages using python and its regex, but such code is highly structured (well formed). Normal code is not as rigorously structured. I was looking forward to learning from experienced people about common pitfalls/errors one might notice only after the mistake/blunder has been committed, i.e. some obvious gotchas a newbie would miss. – ffledgling Feb 25 '13 at 13:41
  • This was not your original question. With regard to HTML/XML: don't use regular expressions. My preferred approach for XML/HTML is lxml: http://lxml.de/tutorial.html It definitely beats the standard libraries. – Udo Klein Feb 25 '13 at 13:43
  • Udo, Maybe you're right, my question isn't as clear as it should be. What I meant was I have worked with python and regex for HTML/XML, and after a lot of trouble, I shifted to beautiful soup, because I didn't know about the problems regex would cause with HTML/XML beforehand. Thus this question, to avoid something similar. **Question Updated accordingly.** – ffledgling Feb 25 '13 at 13:51