
Below is my code:

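using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

string ckeywords = File.ReadAllText("E:\\ckeywords.csv");
string[] clines = File.ReadAllLines("E:\\cprogram\\cpro\\bubblesort.c");
var letters = new StringBuilder();

foreach (string line in clines)
{
    foreach (char i in line)
    {
        // Stop at the first '/' or '"': the rest of the line is skipped, which drops
        // // comments and string contents (but also a division '/' and any code after it).
        if (i == '/' || i == '"')
        {
            break;
        }
        letters.Append(i);
    }
    // Separate lines so the last word of one line doesn't merge with the first word of the next.
    letters.Append(' ');
}

// Turn every run of non-letters into a space, leaving only words.
string words = Regex.Replace(letters.ToString(), @"[^a-zA-Z ]+", " ");

List<string> listofc = words.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();
List<string> listofcsv = ckeywords.Split(new char[] { ',', '\t', '\n', ' ' }, StringSplitOptions.RemoveEmptyEntries).Select(p => p.Trim()).ToList();
// Keywords from the CSV that also occur in the C file.
List<string> Commonlist = listofcsv.Intersect(listofc).ToList();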

With this if condition I can skip the contents of single-line comments (//) and of string literals ("").

I also need to skip the contents of multi-line comments. Which condition should I use? Suppose my .c file contains the comment below; with the code above I don't know how to skip from /* to */ and ignore everything in between.

/*printf("Sorted list in ascending order:\n");

for ( c = 0 ; c < n ; c++ ) printf("%d\n", array[c]);*/

Surabhi Pandey
  • http://stackoverflow.com/questions/3524317/regex-to-strip-line-comments-from-c-sharp/3524689#3524689 – Amit Kumar Ghosh Dec 29 '16 at 10:10
  • As an alternative: if you read line by line, you could look for the opening /* and delete everything after it on that line. Then set a flag and delete every line until you read the closing */, taking care not to delete anything after it. – Totumus Maximus Dec 29 '16 at 10:13
  • I know the logic; I need C# code that does this in a simpler way. – Surabhi Pandey Dec 29 '16 at 10:30
  • What you ask for is complicated. Not too complicated, but not simple either, and you need to be *very* precise in what you do and don't want. For instance, you put code to handle `"` characters in your question. Does that mean that it's important to you *not* to strip out `" /* in a string */ "`? How about `'"' /* really a comment */ - '"'`? How about `"\"" /* also really a comment */ "\""`, or `"\" /* in a string again */ \""`? These are decisions we can't make for you. That's your responsibility, and it's also your responsibility to accurately state the required behaviour in your question. –  Dec 29 '16 at 10:54
  • @AmitKumarGhosh Even with your suggested link I am not getting the correct output. – Surabhi Pandey Dec 29 '16 at 11:02
  • @hvd I only want to pick out the C keywords from a .c file. The code above gives me a result, but now my problem is that if printf or any other C keyword is written inside //, /**/ or (""), I want to ignore that keyword. Also if my for loop is like this for(int i=0;i – Surabhi Pandey Dec 29 '16 at 11:14
  • @SurabhiPandey If the complete C syntax needs to be supported, and anything that happens to contain `int` but not as a keyword needs to be excluded, then that's even worse. `typedef int hello;`, `#define int hello`, `int main() { }` is a valid C program, and the `int` on lines two and three is *not* a keyword. If you have to handle this, you need to do a *lot* more work yourself first. And if you don't have to worry about programs such as these, if you only need to support the subset of C that's used in your specific file, *edit your question to state what you need*. –  Dec 29 '16 at 11:19
  • @SurabhiPandey I'm not *making* this so complicated, what you're asking for *is* so complicated. What you're asking for is probably not what you need. This isn't about me. If you refuse to clarify your question to state what you *do* need, then I won't be the only one who won't be able to answer the question you asked, even though if you asked the question you wanted to ask, you'd have had a good answer by now already. –  Dec 29 '16 at 11:33
  • To be clear here. It is **not** difficult writing code to strip out `/*...*/` when reading the file. There are many here that can post an answer with such a piece of code, me included. The problem is that since you're not *just* saying I need to strip out this kind of text, but you're **also** saying "I need to read C code and get keywords" then the problem **is** more complicated. Any answer that strips out comments in a naive way will work up to a point and then fail horribly. Please clarify that you don't care about this more complicated problem and someone will post an answer. – Lasse V. Karlsen Dec 29 '16 at 11:36

2 Answers


I solved my problem: I can now skip the contents of /* */ comments in a simpler way, without using a regular expression. Here is my code:

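using System;
using System.Collections.Generic;
using System.IO;

string[] clines = File.ReadAllLines("E:\\cprogram\\cpro\\bubblesort.c");
List<string> list = new List<string>();

for (int i = 0; i < clines.Length; i++)
{
    int startIndexofcomm = clines[i].IndexOf("/*", StringComparison.Ordinal);
    if (startIndexofcomm < 0)
    {
        // No comment starts on this line: keep it unchanged.
        list.Add(clines[i]);
        continue;
    }

    // Keep the text before the opening /*.
    list.Add(clines[i].Substring(0, startIndexofcomm));

    // Look for the closing */ after the /* on the same line, then on the following lines.
    int endIndexofcomm = clines[i].IndexOf("*/", startIndexofcomm + 2, StringComparison.Ordinal);
    while (endIndexofcomm < 0 && i + 1 < clines.Length)
    {
        i++;
        endIndexofcomm = clines[i].IndexOf("*/", StringComparison.Ordinal);
    }

    // Keep the text after the closing */ (nothing is added if the comment is never closed).
    if (endIndexofcomm >= 0)
    {
        list.Add(clines[i].Substring(endIndexofcomm + 2));
    }
}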
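The loop above removes at most one /* ... */ per line: whatever follows a closing */ is kept as-is, so a second comment opened later on the same line survives. Below is a minimal sketch of the same flag idea Totumus Maximus describes in the comments, extended to any number of comments per line. It assumes C# 9 top-level statements; `StripBlockComments` is a hypothetical name, and like the loop above it does not look at // comments or string literals.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: removes every /* ... */ comment from the given lines,
// including several comments on one line and comments spanning several lines.
// Like the loop above, it ignores // comments and string literals.
static List<string> StripBlockComments(IEnumerable<string> lines)
{
    var result = new List<string>();
    bool inComment = false; // true while we are between /* and */

    foreach (string line in lines)
    {
        var kept = new StringBuilder();
        int pos = 0;
        while (pos < line.Length)
        {
            if (inComment)
            {
                int end = line.IndexOf("*/", pos, StringComparison.Ordinal);
                if (end < 0) break;          // comment continues on the next line
                inComment = false;
                pos = end + 2;               // resume right after */
            }
            else
            {
                int start = line.IndexOf("/*", pos, StringComparison.Ordinal);
                if (start < 0)
                {
                    kept.Append(line, pos, line.Length - pos); // no more comments on this line
                    break;
                }
                kept.Append(line, pos, start - pos);           // text before /*
                inComment = true;
                pos = start + 2;             // resume right after /*
            }
        }
        result.Add(kept.ToString());
    }
    return result;
}

var sample = new[]
{
    "int a; /* one */ int b; /* two",
    "still two */ int c;",
};
foreach (string line in StripBlockComments(sample))
    Console.WriteLine(line);
```

Each line is scanned repeatedly, alternating between searching for /* and */, and `inComment` carries over to the next line, so a comment that opens mid-line and closes several lines later is handled the same way as one that fits on a single line.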
Surabhi Pandey

Here is code that naively does the following:

  1. It strips out any multi-line comments starting with /* and ending with */, even if there are newlines between the two.
  2. It strips out any single-line comments starting with // and ending at the end of the line.
  3. It does not strip out any comments like the above if they're within a string that starts with " and ends with a ".

LINQPad code:

void Main()
{
    var code = File.ReadAllText(@"d:\temp\test.c");
    code.Dump("input");

    bool inString = false;
    bool inSingleLineComment = false;
    bool inMultiLineComment = false;

    var output = new StringBuilder();
    int index = 0;

    while (index < code.Length)
    {
        // First deal with single line comments: // xyz
        if (inSingleLineComment)
        {
            if (code[index] == '\n' || code[index] == '\r')
            {
                inSingleLineComment = false;
                output.Append(code[index]);
                index++;
            }
            else
                index++;

            continue;
        }

        // Then multi-line comments: /* ... */
        if (inMultiLineComment)
        {
            if (code[index] == '*' && index + 1 < code.Length && code[index + 1] == '/')
            {
                inMultiLineComment = false;
                index += 2;
            }
            else
                index++;
            continue;
        }

        // Then deal with strings
        if (inString)
        {
            output.Append(code[index]);
            if (code[index] == '"')
                inString = false;
            index++;
            continue;
        }

        // If we get here we're not in a string or in a comment
        if (code[index] == '"')
        {
            // We found the start of a string
            output.Append(code[index]);
            inString = true;
            index++;
        }
        else if (code[index] == '/' && index + 1 < code.Length && code[index + 1] == '/')
        {
            // We found the start of a single line comment
            inSingleLineComment = true;
            index++;
        }
        else if (code[index] == '/' && index + 1 < code.Length && code[index + 1] == '*')
        {
            // We found the start of a multi line comment
            inMultiLineComment = true;
            index += 2; // skip both '/' and '*' so that "/*/" is not treated as an already-closed comment
        }
        else
        {
            // Just another character
            output.Append(code[index]);
            index++;
        }
    }

    output.ToString().Dump("output");
}

Sample input:

This should be included // This should not
This should also be included /* while this
should not */ but this should again be included.

Any comments in " /* strings */ " should be included as well.
This goes for "// single line comments" as well.

Sample output (note that some of the lines below end with trailing spaces that aren't visible):

This should be included 
This should also be included  but this should again be included.

Any comments in " /* strings */ " should be included as well.
This goes for "// single line comments" as well.
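Since the question's end goal is to pick C keywords out of the file, the comment-free output can then go through a keyword step. A sketch, assuming C# 9 top-level statements and a hypothetical `FindKeywords` helper; the keyword array is a made-up stand-in for the question's ckeywords.csv.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the distinct keywords that occur as whole
// identifiers in code whose comments have already been stripped.
static List<string> FindKeywords(string strippedCode, IEnumerable<string> keywords)
{
    var keywordSet = new HashSet<string>(keywords, StringComparer.Ordinal); // C keywords are case-sensitive
    return Regex.Matches(strippedCode, @"[A-Za-z_][A-Za-z0-9_]*")         // C identifier tokens
                .Cast<Match>()
                .Select(m => m.Value)
                .Where(token => keywordSet.Contains(token))
                .Distinct()
                .ToList();
}

// Assumed keyword list; in the question it is read from E:\ckeywords.csv.
var cKeywords = new[] { "int", "for", "return", "while" };
string code = "int main() { int counter = 0; for (;;) return counter; }";
Console.WriteLine(string.Join(", ", FindKeywords(code, cKeywords))); // prints: int, for, return
```

Matching whole identifier tokens, instead of replacing non-letters with spaces as the question's `[^a-zA-Z ]+` regex does, avoids false hits: `my_int` stays one token rather than becoming `my` and `int`. As hvd points out in the comments, this still cannot tell that a later `int` is really a macro after `#define int hello`.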
Lasse V. Karlsen
  • For completeness, compared to how C actually works: this doesn't handle character constants (in `int main() { '"'; short s; }`, `short s;` is not part of a string), it doesn't handle backslashes in strings (in `int main() { "\""; short s; }`, `short s;` is again not part of a string) or as part of line splicing (in `/\` followed on the next line by `* int main() {} */`, the two lines form a comment), and when modified to handle backslashes and character constants, trigraphs could cause problems too (in `int main() { 0??'""[0]; short s; }`, `short s;` is not part of a character constant). This may be okay. –  Dec 29 '16 at 12:22
  • Yes, but as I tried to state in my comments to the question, if the OP ***explicitly!*** doesn't need/want such "complicated" things, a naive solution is the best that can be produced. I wouldn't even want to *try* writing a solution for this that can handle all compliant C syntax. – Lasse V. Karlsen Dec 29 '16 at 13:06
  • I wouldn't either; given that that's what the OP asked, I opted to just not answer at all. :) I just wanted to make it clear what would and wouldn't work so that the OP and others reading this answer can make an informed decision as to whether it's good enough for their needs. –  Dec 29 '16 at 13:13
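The first two gaps listed in the comment above (character constants and backslash escapes) can be closed without much extra code. A minimal sketch, assuming C# 9 top-level statements; `StripComments` is a hypothetical name, and it still does not handle line splicing or trigraphs. Unlike the LINQPad version, it replaces each comment with a single space, which is what a C compiler does, so `int/**/x` does not turn into `intx`.

```csharp
using System;
using System.Text;

// Hypothetical helper: removes // and /* */ comments, but copies string literals
// and character constants verbatim, honouring backslash escapes inside them.
// Not handled: backslash-newline line splicing and trigraphs.
static string StripComments(string code)
{
    var output = new StringBuilder();
    int i = 0;
    while (i < code.Length)
    {
        char c = code[i];
        char next = i + 1 < code.Length ? code[i + 1] : '\0';

        if (c == '"' || c == '\'')
        {
            // String literal or character constant: copy up to the matching quote.
            char quote = c;
            output.Append(c);
            i++;
            while (i < code.Length && code[i] != quote)
            {
                if (code[i] == '\\' && i + 1 < code.Length)
                {
                    output.Append(code[i]); // the backslash...
                    i++;
                }
                output.Append(code[i]);     // ...and the character it escapes (or a plain character)
                i++;
            }
            if (i < code.Length)
            {
                output.Append(code[i]);     // closing quote
                i++;
            }
        }
        else if (c == '/' && next == '/')
        {
            // Line comment: replace with one space, keep the line break that ends it.
            output.Append(' ');
            while (i < code.Length && code[i] != '\n' && code[i] != '\r')
                i++;
        }
        else if (c == '/' && next == '*')
        {
            // Block comment: replace with one space, resume after */ (or stop at end of input).
            output.Append(' ');
            int end = code.IndexOf("*/", i + 2, StringComparison.Ordinal);
            i = end < 0 ? code.Length : end + 2;
        }
        else
        {
            output.Append(c);
            i++;
        }
    }
    return output.ToString();
}

Console.WriteLine(StripComments("int main() { '\"'; short s; /* gone */ }"));
```

Treating `'` and `"` with the same copy-until-matching-quote loop is what makes `'"'` harmless: the double quote is consumed as part of the character constant instead of opening a string.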