0

I am beginning a personal project of converting an interpreter written in Python into C. It is purely for learning purposes.

The first thing I have come across is trying to convert the following:

if __name__ == "__main__":
    if not argv[-1].endswith('.py'):
        ...

So far I have written the following conversion of the endswith method:

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

bool endswith(char* str, char* substr)
{
    // case1: one of the strings is empty
    if (!str || !substr) return false;

    char* start_of_substring = strstr(str, substr);

    // case2: not in substring
    if (!start_of_substring) return false;

    size_t length_of_string    = strlen(str);
    size_t length_of_substring = strlen(substr);
    size_t index_of_match      = start_of_substring - str;

    // case2: check if at end
    return (length_of_string == length_of_substring + index_of_match);

}

int main(int argc, char* argv[])
{
    char *last_arg = argv[argc-1];
    if (endswith(last_arg, ".py")) {
        // ...
    } 

}

Does this look like it's covering all the cases in an endswith, or am I missing some edge cases? If so, how can this be improved and such? Finally, this isn't a criticism but more a genuine question in writing a C application: is it common that writing C will require 5-10x more code than doing the same thing in python (or is that more because I'm a beginner and don't know how to do things properly?)

And related: https://codereview.stackexchange.com/questions/54722/determine-if-one-string-occurs-at-the-end-of-another/54724

Vlad from Moscow
David542
  • 4
    I'm pretty sure you can find proper implementations of endswith on stackoverflow. Your question about "is it normal that it requires 5/10x more code than in python" depends on what you're doing. python has a big run-time library but for instance for an original computation algorithm you could have a 1:1 ratio between C and python – Jean-François Fabre Apr 08 '21 at 19:44
  • Just use one of the definitions in the Code Review question. – Barmar Apr 08 '21 at 19:47
  • 1
    Usually, C code is longer, yes. Partially because of the language itself, and partially because Python has a huge library for all sorts of stuff that you have to implement from scratch in C. You see the function `argv[-1].endswith('.py')`? Well, someone has written the code for that. You just don't see it. – klutt Apr 08 '21 at 19:48
  • @Barmar sure, but I'm more interested in learning it / getting feedback rather than just copying from that question. – David542 Apr 08 '21 at 19:49
  • Learning a language by trying to reimplement features from completely different languages is not really a good idea. It's often like trying to learn to use a saw as a hammer. C and Python are used for different things. – klutt Apr 08 '21 at 19:51
  • C is a relatively low-level language, with a very simple built-in library of functions. Modern languages like PHP, Python, and JavaScript have enormous libraries of convenience functions like this, as well as higher level data structures like lists, sets, dictionaries. – Barmar Apr 08 '21 at 19:52
  • Note that I'm not saying that you should not do it. It can sure be fun and indeed teach you a bit about the limitations of a language. But be aware that it's a thing you normally don't do. In real coding, you pick a language that suits your needs. – klutt Apr 08 '21 at 19:53
  • 2
    In "case 1" you are checking if one of the string pointers is 0, not if string length is 0 as the comment suggests. Btw., Python `endswith()` will return true if the `substr` is empty. – nielsen Apr 08 '21 at 19:54
  • @nielsen I see, thanks for pointing that out. – David542 Apr 08 '21 at 20:00
  • What @Barmar said. But please note that there are lots of 3rd party libraries you can find. They are just not a part of the standard library. – klutt Apr 08 '21 at 20:00

3 Answers

4

For starters the function should be declared like

bool endswith(const char* str, const char* substr);

because neither string passed to the function is being changed within the function.

Secondly this if statement

if (!str || !substr) return false;

where you are checking whether at least one pointer is a null pointer is redundant for string functions.

All standard string functions follow the common convention that passing a null pointer results in undefined behavior. That is, it is the responsibility of the caller to pass non-null pointers.

Thirdly if the call of strstr

char* start_of_substring = strstr(str, substr);

returns a non-null pointer, that alone does not tell you whether the first string ends with the second string. strstr returns the first occurrence only, so if the first string contains several copies of the second string, one of them at the end, your function will return false even though the correct answer is true.

The function can be written as shown in the demonstration program below.

In particular, it is assumed that any string ends with an empty string.

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

bool endswith( const char *s1, const char *s2 )
{
    size_t n1 = strlen( s1 );
    size_t n2 = strlen( s2 );
    
    return ( n2 == 0 ) || ( !( n1 < n2 ) && memcmp( s1 + n1 - n2, s2, n2 ) == 0 );
}

int main(void) 
{
    const char *s1 = "Hello World!";
    const char *s2 = "World!";
    
    printf( "\"%s\" ends with \"%s\" is %s.\n", 
            s1, s2, endswith( s1, s2 ) ? "true" : "false" );
            
    return 0;
}

The program output is

"Hello World!" ends with "World!" is true.
Vlad from Moscow
  • thanks for this and all the comments: Could you please explain how this part works though? `memcmp( s1 + n1 - n2, s2, n2 ) == 0` – David542 Apr 08 '21 at 20:14
  • @David542 Here we are comparing the last n2 characters (the length of the second string) of the first string with n2 characters of the second string. memcmp is more efficient than strcmp. – Vlad from Moscow Apr 08 '21 at 20:17
  • I see, does the net effect just make the comparison one character less, excluding the ``\0`` ? https://stackoverflow.com/a/13095574/651174 – David542 Apr 08 '21 at 20:24
  • @David542 There is no need to check whether a null character is encountered. – Vlad from Moscow Apr 08 '21 at 20:25
  • Not that it's really relevant with an input of ".py", but in general using `strlen` and `memcmp` is likely going to be faster than character-based comparisons. – Neil Apr 08 '21 at 22:03
3

Does this look like it's covering all the cases in an endswith, or am I missing some edge cases?

You are missing at least the case where the substring appears twice or more, one of the appearances at the end.

I wouldn't use strstr() for this. Instead, I would determine from the relative lengths of the two strings where in the main string to look, and then use strcmp(). Example:

bool endswith(char* str, char* substr) {
    if (!str || !substr) return false;

    size_t length_of_string    = strlen(str);
    size_t length_of_substring = strlen(substr);

    if (length_of_substring > length_of_string) return false;

    return (strcmp(str + length_of_string - length_of_substring, substr) == 0);
}

With regard to that return statement: str + length_of_string - length_of_substring is equivalent to &str[length_of_string - length_of_substring], that is, a pointer to the first character of the trailing substring that has the same length as substr. The strcmp function compares two C strings, returning an integer less than, equal to, or greater than zero depending on whether the first argument is lexicographically less than, equal to, or greater than the second. In particular, strcmp() returns 0 when its arguments are equal, and this function returns the result of exactly such a test.

is it common that writing C will require 5-10x more code than doing the same thing in python

Python is a higher-level language than C, so it is common for C code for a task to be lengthier than Python code for the same task. Also, the fact that C blocks are explicitly delimited makes C code a little longer than Python code. I'm not sure that 5-10x is a good estimate, though, and I think that in this case you're comparing apples to oranges. The code analogous to your Python code is simply

int main(int argc, char* argv[]) {
    if (endswith(argv[argc-1], ".py")) {
        // ...
    } 
}

That C has no built-in endswith() function is a separate matter.

John Bollinger
  • 2
    Your code is the second answer at the linked CodeReview question. – Barmar Apr 08 '21 at 19:50
  • @john -- thanks, could you just please explain the last line, `return (strcmp(str + length_of_string - length_of_substring, substr) == 0)` and how that works? – David542 Apr 08 '21 at 19:53
  • Is this a correct understanding? https://gyazo.com/199c52628d70a4062d65207b40555999 – David542 Apr 08 '21 at 20:01
  • 1
    Yes, @David542, that is a correct understanding. I have updated this answer to explain in a bit more detail. – John Bollinger Apr 08 '21 at 20:08
  • I guess great minds think alike, @Barmar. I wrote this answer without reference to the CodeReview Q&A, but it does seem like the obvious way to do the job. – John Bollinger Apr 08 '21 at 20:09
1

Finally, this isn't a criticism but more a genuine question in writing a C application: is it common that writing C will require 5-10x more code than doing the same thing in python

Sounds a bit much, but it depends on what you do. And yes, C code is usually longer, partly because of the language itself and partly because Python has a huge library for all sorts of stuff that you have to implement from scratch in C. You see the function argv[-1].endswith('.py')? Well, someone has written the code for that. You just don't see it.

But there are some features that sometimes can make code shorter in C. For instance, in Python assignments are statements. In C, they are expressions. This means that in C, you can do things like:

if(c = foo()) { // Assign the return value of foo to c
                // and then evaluate it as a Boolean

You could also use the comma operator, like this:

if((c = foo(), ++c) > 4) {

Usually, such constructs are a bad idea, especially if they are complex. But at least they are examples of how C code can sometimes be shorter.

klutt