-1

I'm trying to write a function to parse and extract the components of a URL. Moreover, I need the components (e.g. hostname) to have the type char * since I intend to pass them to C APIs.

My current approach is to save the components in the parse_url function to the heap by calling malloc. But for some reason, the following code is printing gibberish. I'm confused by this behavior because I thought memory allocated on the heap will persist even after the function allocating it returns.

I'm new to C/C++, so please let me know what I did wrong and how to achieve what I wanted. Thank you.

#include <iostream>
#include <string>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

using namespace std;

void cast_to_cstyle(string source, char *target)
{
    target = (char *)malloc(source.size() + 1);
    memcpy(target, source.c_str(), source.size() + 1);
}

void parse_url(string url, char *protocol_cstyle, char *hostname_cstyle, char *port_cstyle, char *path_cstyle)
{
    size_t found = url.find_first_of(":");
    string protocol = url.substr(0, found);

    string url_new = url.substr(found + 3); // `url_new` is the url excluding the "http//" part
    size_t found1 = url_new.find_first_of(":");
    string hostname = url_new.substr(0, found1);

    size_t found2 = url_new.find_first_of("/");
    string port = url_new.substr(found1 + 1, found2 - found1 - 1);
    string path = url_new.substr(found2);

    cast_to_cstyle(protocol, protocol_cstyle);
    cast_to_cstyle(hostname, hostname_cstyle);
    cast_to_cstyle(port, port_cstyle);
    cast_to_cstyle(path, path_cstyle);
}

int main() {
    char *protocol;
    char *hostname;
    char *port;
    char *path;
    
    parse_url("http://www.example.com:80/file.txt", protocol, hostname, port, path);

    printf("%s, %s, %s, %s\n", (protocol), (hostname), (port), (path));
    
    return 0;
}
nobody
  • 19,814
  • 17
  • 56
  • 77
Alvin
  • 59
  • 1
  • 6
  • 1
    Nitpick: as this is C++, you should be including `cstdio`, `cstring` and `cstdlib` instead of `stdio.h`, `string.h` and `stdlib.h`. – Chris Sep 19 '21 at 23:35
  • 5
    This `target = (char *)malloc(source.size() + 1);` overwrites the function argument. After the function `cast_to_cstyle()` returns, the pointer is forgotten and you have a memory leak. – Weather Vane Sep 19 '21 at 23:37
  • 1
    Either introduce an other level of indirection to the argument, or change the function to be `char *cast_to_cstyle()` and `return target`. – Weather Vane Sep 19 '21 at 23:38
  • 2
    When you call a function like `void cast_to_cstyle(string source, char *target)` the parameter `target` is a copy of the pointer that was passed to the function. If the function body changes where `target` points it is changing the copy that was made and is NOT changing the original pointer that was passed in. If you want to change the original pointer that was passed in you either need to pass a pointer to it `void cast_to_cstyle(string source, char **target)` or, like Weather Vane said, return the new pointer instead. – Jerry Jeremiah Sep 19 '21 at 23:46
  • @kaylum This is not C code. It is one file and it uses the C++ STL. It cannot be compiled by a C compiler. The C tag is inappropriate. Do not re-add it. – nobody Sep 19 '21 at 23:47
  • **None** of the pointers in your `main()` function are ever initialized to any value. You create them (uninitialized), and then pass **copies** of the uninitialized values to other functions. It's time to learn about [reference parameters](https://stackoverflow.com/questions/2564873/how-do-i-use-reference-parameters-in-c). – Drew Dormann Sep 20 '21 at 00:00
  • @nobody I initially took it out. But then saw the OP say "I intend to pass them to C APIs". So to give the benefit of the doubt I added it back in. – kaylum Sep 20 '21 at 00:05
  • Looks like you may have a fundamental misunderstanding about what it means to assign a new value to one of your arguments. In particular, this may be conflated with your understanding of how a "pointer to char" works as a string. You seem to have assumed that assigning a new value to `target` will somehow change the value of the variable that was passed into the `cast_to_cstyle` function. This just ain't so. – Wyck Sep 20 '21 at 00:19
  • 1
    You can remove the whole method and replace its calls with with `char *target = strdup(source.c_str());`. – user207421 Sep 20 '21 at 00:42

2 Answers2

1

The problem is that arguments are passed by value, so the newly created string never leaves the function (albeit exists until program termination as free is never called on it). You can pass by reference¹ like:

void cast_to_cstyle(string source, char *&target)

or better, pass the source string by (constant) reference too (string is expensive to copy):

void cast_to_cstyle(const string &source, char *&target)

(neither function body nor the call site need to be changed).

But you may not need even that.

  1. If the API doesn’t actually modify the string despite using non-const pointer (pretty common in C AFAIK), you can use const_cast, like const_cast<char *>(source.c_str()).

  2. Even if it may modify the string, &source[0] is suitable (at least since C++11). It may not seem right but it is:

a pointer to s[0] can be passed to functions that expect a pointer to the first element of a null-terminated (since C++11)CharT[] array. — https://en.cppreference.com/w/cpp/string/basic_string

(and since C++17 data() is the way to go).

However, unlike that obtained from malloc any such pointer becomes invalid when the string is resized or destroyed (be careful “the string” means “that particular copy of the string” if you have several).

¹ Strictly speaking, pass a reference; references aren’t restricted to function arguments in C++.

numzero
  • 2,009
  • 6
  • 6
0

The problem was as @WeatherVane and @JerryJeremiah mentioned. The pointer returned by malloc and assigned to target was in the local scope of cast_to_cstyle(), which got destroyed after the function returns. So the protocol, hostname, port, path variables declared in main were never assigned, hence it printed out gibberish. I've fixed this by making the cast_to_style() returns a char *.

char *cast_to_cstyle_str(string source)
{
    char *target = (char *)malloc(source.size() + 1);
    memcpy(target, source.c_str(), source.size() + 1);
    return target;
}

Note: I forgot to free up malloc in my question.

Alvin
  • 59
  • 1
  • 6
  • 2
    You can remove the whole method and replace its calls with with `char *target = strdup(source.c_str());`. – user207421 Sep 20 '21 at 00:42