-1

This is my command, which works in a Unix environment:

"cat document.xml | grep \'<w:t\' | sed \'s/<[^<]*>//g\' | grep -v \'^[[:space:]]*$\'"

But I want to execute that command in the Windows command prompt. How do I do that, and which Windows commands are similar to cat, grep, and sed?

Please tell me the exact Windows equivalent of the command above.

surya

3 Answers

2

The double quotes around the pipeline in your question are a syntax error, and the single quotes should not be backslash-escaped, but I assume both are just artefacts of a slightly imprecise presentation.

Here's what the code does.

cat document.xml |

This is a useless use of cat (grep can read the file directly if you pass it as an argument), but its purpose is to feed the contents of this file into the pipeline.

grep '<w:t' |

This looks for lines containing the literal string <w:t (probably the start of a tag in the XML format in the file). The single quotes quote the string so that it is not interpreted by the shell (otherwise the < would be interpreted as a redirection operator); they are consumed by the shell, and not passed through to grep.

sed 's/<[^<]*>//g' |

This replaces every stretch of text from an opening angle bracket to the next closing angle bracket with an empty string. The regular expression [^<]* matches zero or more occurrences of a character which can be anything except <. If the XML is well-formed, angle brackets should always occur in pairs, and so we effectively remove all XML tags.
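To see the substitution in isolation, here is a minimal sketch run on a single made-up line (the sample markup is an assumption, not taken from your file). It assumes a POSIX shell with sed available.

```shell
# Hypothetical one-line sample; real document.xml content will differ.
line='<w:r><w:t>Hello</w:t></w:r>'

# Delete everything from each "<" up to the next ">", i.e. every tag.
stripped=$(printf '%s\n' "$line" | sed 's/<[^<]*>//g')

printf '%s\n' "$stripped"
```

Only the text between the tags should survive.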

grep -v '^[[:space:]]*$'

This removes any line which is empty or consists entirely of whitespace.
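Putting the three steps together without the cat, a rough sketch on a throwaway sample file looks like this. The file contents and the temporary directory are assumptions made up for illustration; it needs a POSIX shell with grep and sed.

```shell
# Build a small stand-in for document.xml in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/document.xml" <<'EOF'
<w:body>
<w:p><w:r><w:t>First paragraph</w:t></w:r></w:p>
<w:p><w:r><w:t>   </w:t></w:r></w:p>
<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>
</w:body>
EOF

# 1. keep lines containing <w:t   2. strip all tags   3. drop blank lines
result=$(grep '<w:t' "$tmpdir/document.xml" \
  | sed 's/<[^<]*>//g' \
  | grep -v '^[[:space:]]*$')

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The lines without a <w:t tag and the whitespace-only run are filtered out, leaving just the two text lines.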

Because sed is a superset of grep, the program could easily be rephrased as a single sed script. Perhaps the easiest solution for your immediate problem would be to obtain a copy of sed for your platform.

sed -e '/<w:t/!d' -e 's/<[^<]*>//g' -e '/[^[:space:]]/!d' document.xml

I understand quoting rules on Windows may be different; try with double quotes instead of single, or put the script in a file and use sed -f file document.xml where file contains the script itself, like this:

/<w:t/!d
s/<[^<]*>//g
/[^[:space:]]/!d
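As a sketch of the script-file variant, assuming a POSIX shell and a sed that understands POSIX character classes: the file names extract.sed and sample.xml and the sample lines are hypothetical. Note that the character class is spelled [[:space:]], with a colon on both sides.

```shell
tmpdir=$(mktemp -d)

# The sed script, one command per line; no shell quoting needed inside the file.
cat > "$tmpdir/extract.sed" <<'EOF'
/<w:t/!d
s/<[^<]*>//g
/[^[:space:]]/!d
EOF

# Made-up input: a line without <w:t, a line with text, a whitespace-only run.
printf '%s\n' '<w:p>' '<w:t>kept</w:t>' '<w:t> </w:t>' > "$tmpdir/sample.xml"

sed_result=$(sed -f "$tmpdir/extract.sed" "$tmpdir/sample.xml")
printf '%s\n' "$sed_result"
rm -rf "$tmpdir"
```

Keeping the script in a file sidesteps the quoting question entirely, which is why it may be the easier route if your platform's shell treats quotes differently.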

This is a rather crude way to extract the text content from an XML document, anyway; perhaps some XML processor would be the proper way forward. E.g. xmlstarlet appears to be available for Windows. An XML-aware tool works even if the XML input doesn't have the opening and closing <w:t> tags on the same line, with nothing else on it. (In fact, parsing XML with line-oriented tools is a massive antipattern.)

tripleee
  • Thank you very much for your explanation but sed is not a windows command. So how can I do that in windows? – surya Feb 18 '16 at 12:37
  • I dumped Windows in 1998, partly because of reasons like this, so if you don't want to obtain one of the cross-platform tools I suggest, I don't think I can help you, other than recommending you consider taking the same step to improve your life and sanity. – tripleee Feb 18 '16 at 12:40
  • yes, but it shows an error like sed is not a recognised command – surya Feb 18 '16 at 12:41
0

Maybe try PowerShell?

I think it has been included with Windows since Windows 8, and it is definitely included in Windows 10. I've just tested a "cat" command there and it works.

"grep" doesn't, but it can probably be adapted as described in "PowerShell equivalent to grep -f" and https://communary.wordpress.com/2014/11/10/grep-the-powershell-way/

Community
DFaze
  • Thank you but I am using that code in a program so I cannot execute independently in a shell... – surya Feb 18 '16 at 11:52
0

The equivalent of grep on Windows would be findstr, and the equivalent of cat would be type.

tripleee
Dr.Cru