1

I have a bunch of C files that try to read and write CSV and other random data to and from disk using stdio functions like fread(), fwrite(), fseek(). (If it matters, it's for a university assignment where we are to experiment with IO performance using different block sizes, and various structures to track data on disk files and so on.)

What I wanted to do was compile these source files (there are dozens of them) without the definitions for fopen(), fread(), fwrite() that come from <stdio.h>. I want to supply my own fopen(), fread(), fwrite() where I track some information, like which process tried to read which file, how many blocks/pages were read and things like that, and then call the normal stdio functions.

I don't want to have to go through every line of every file and change fopen() to my_fopen() ... is there a better way to do this at compile time?

I am half way working on a Python program that scans the source files and changes these calls with my functions but it's getting a bit messy and I am kind of lost. I thought maybe there is a better way to do this; if you could point me in the right direction, like what to search for that would be great.

Also, I don't want to use some Linux profiling stuff that reports which syscalls were made and whatnot; I just want to execute some code before calling these functions.

Jonathan Leffler

4 Answers

2

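As an alternative to the LD_PRELOAD trick (which requires you to write a separate library and relies on the dynamic loader, so it is tied to Linux and similar systems), you can use the --wrap option of the GNU linker. With --wrap=fopen, every undefined reference to fopen is resolved to __wrap_fopen, and references to __real_fopen are resolved to the original fopen. See here for an example of this technique.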

Main differences with LD_PRELOAD:

  • no external library needed - it's all in the executable;
  • no runtime options needed;
  • works on any platform as long as you are using the GNU toolchain;
  • works only for the calls that are resolved at link time - dynamic libraries will still use the original functions
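A minimal sketch of what a --wrap interceptor for fopen could look like. The `__wrap_`/`__real_` names are what the GNU linker expects; the `open_count` counter and the `USE_LD_WRAP` macro are hypothetical names introduced here for illustration. The fallback branch exists only so the file still builds when the linker flag is not used.

```c
#include <stdio.h>

#ifdef USE_LD_WRAP
/* With -Wl,--wrap=fopen the linker resolves this to the real fopen. */
FILE *__real_fopen(const char *path, const char *mode);
#else
/* Fallback (an assumption for this sketch): without the linker flag,
   call fopen directly so the file still links. */
#define __real_fopen fopen
#endif

/* Hypothetical bookkeeping: how many times fopen was requested. */
static unsigned long open_count;

/* With -Wl,--wrap=fopen, every fopen call in the linked objects lands here. */
FILE *__wrap_fopen(const char *path, const char *mode)
{
    FILE *fp = __real_fopen(path, mode);

    open_count++;
    fprintf(stderr, "fopen(\"%s\", \"%s\") -> %s\n",
            path, mode, fp ? "ok" : "failed");
    return fp;
}
```

Assuming the file is called wrap.c, the program would be built with something like `gcc -DUSE_LD_WRAP main.c wrap.c -Wl,--wrap=fopen`; the sources that call fopen need no changes. One --wrap option is needed per intercepted function.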
Matteo Italia
1

No but yes but no. The best way I know of is to LD_PRELOAD a library that provides your own versions of those functions. You can get at the originals by dlopening libc.so (the dlopen NULL trick to get at libc functions isn't applicable here because your library will have already been loaded).

hobbs
1

One way of doing it is by redefining all the stdio functions you need. fopen becomes my_fopen, fread becomes my_fread, then have your my_fopen call fopen. This can be done in a header file that you include in the files where you want to replace the calls to fopen. See example below.

main.c:

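#include <stdio.h>
#include "my_stdio.h"   /* after <stdio.h>: redirects fopen/fread/... to my_* */

int main(void)
{
  FILE *f;
  char buf[256];
  f = fopen("test.csv", "r");   /* expands to my_fopen(...) */
  if(f == NULL)
  {
      printf("Couldn't open file\n");
      return 1;
  }
  fread(buf, sizeof(char), sizeof(buf), f);   /* expands to my_fread(...) */
  fclose(f);                                  /* expands to my_fclose(...) */
  return 0;
}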

my_stdio.c:

/* Do NOT include my_stdio.h here: the wrappers must call the real functions. */
#include <stdio.h>

/* __func__ is the standard C99 spelling of the GCC-specific __FUNCTION__. */
FILE *my_fopen(const char *path, const char *mode)
{
  FILE *fp;
  printf("%s before fopen\n", __func__);
  fp = fopen(path,mode);
  printf("%s after fopen\n", __func__);
  return fp;
}

int my_fclose(FILE *fp)
{
  int rv;

  printf("%s before fclose\n", __func__);
  rv = fclose(fp);
  printf("%s after fclose\n", __func__);
  return rv;
}

size_t my_fread(void *ptr, size_t size, size_t nmemb, FILE *stream)
{
  size_t s;

  printf("%s before fread\n", __func__);
  s = fread(ptr,size,nmemb,stream);
  printf("%s after fread\n", __func__);
  return s;
}

size_t my_fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream)
{
  size_t s;

  printf("%s before fwrite\n", __func__);
  s = fwrite(ptr,size,nmemb,stream);
  printf("%s after fwrite\n", __func__);
  return s;
}

my_stdio.h:

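#ifndef MY_STDIO_H
#define MY_STDIO_H

#include <stdio.h>  /* pull in the real declarations before the macros exist */

FILE *my_fopen(const char *path, const char *mode);
int my_fclose(FILE *fp);
size_t my_fread(void *ptr, size_t size, size_t nmemb, FILE *stream);
size_t my_fwrite(const void *ptr, size_t size, size_t nmemb, FILE *stream);

/* From here on, calls in the including file go to the wrappers. */
#define fopen my_fopen
#define fclose my_fclose
#define fread my_fread
#define fwrite my_fwrite

#endif /* MY_STDIO_H */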

Makefile:

main: main.o my_stdio.o
    $(CC) -g -o $@ main.o my_stdio.o

main.o: main.c my_stdio.h
    $(CC) -g -c -o $@ $<

my_stdio.o: my_stdio.c my_stdio.h
    $(CC) -g -c -o $@ $<
Puppe
  • 1
    Bad idea to `gcc -Dfopen=my_fopen`: the `fopen` token inside standard headers such as `<stdio.h>` will be replaced by `my_fopen` too. A better idea is to `#define fopen my_fopen` inside `my_stdio.h`, which should always be included after `<stdio.h>` – Basile Starynkevitch Nov 18 '15 at 06:42
  • But that would require changes to all files that use stdio.h. Why would replacing fopen with my_fopen in stdio.h be a problem? The code will call my_fopen anyway? – Puppe Nov 18 '15 at 06:54
  • 1
    @Puppe: having seen the macro machinery at work in many standard headers, I wouldn't be comfortable replacing anything in them, and surely not names of standard functions. – Matteo Italia Nov 18 '15 at 06:58
  • I see your points, have updated my example to do the redefinition in my_stdio.h instead. – Puppe Nov 18 '15 at 07:35
0

Another way: Add -Dfread=my_fread to the Makefile CFLAGS for any .o files you wish to "spy" on. Link in a my_fread.o that defines my_fread and is itself compiled without the -D trick, so that it can call the real fread.

Repeat the above for any functions you wish to intercept. About the same as LD_PRELOAD [in terms of effectiveness, and probably easier to implement]. I've done both.

Or create a my_func.h that does the defines and add #include "my_func.h" to each file. Dealer's choice.
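One thing the header variant can do that a plain -D flag cannot is record the call site. The following is only a sketch under assumptions: `my_fread_at`, `traced_calls` and `traced_items` are hypothetical names, and everything is shown in one file for brevity, whereas in practice the function would live in its own .c file and the macro in my_func.h.

```c
#include <stdio.h>

/* Hypothetical counters for this sketch. */
static unsigned long traced_calls;   /* number of intercepted fread calls */
static unsigned long traced_items;   /* total items actually read */

size_t my_fread_at(void *ptr, size_t size, size_t nmemb, FILE *stream,
                   const char *file, int line)
{
    /* Writing (fread) in parentheses keeps a function-like macro named
       fread from expanding, so this always reaches the library function. */
    size_t got = (fread)(ptr, size, nmemb, stream);

    traced_calls++;
    traced_items += got;
    fprintf(stderr, "%s:%d: fread of %zu x %zu bytes returned %zu\n",
            file, line, nmemb, size, got);
    return got;
}

/* The standard allows fread to be a macro already, so undefine it first. */
#undef fread
#define fread(ptr, size, nmemb, stream) \
    my_fread_at((ptr), (size), (nmemb), (stream), __FILE__, __LINE__)
```

Because the macro is function-like, it only rewrites actual calls; code that takes the address of fread is left alone and is not traced.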

UPDATE

Forgot about another way. Compile normally. Mangle the symbol names in the target .o's [symbol table] (via a custom program or ELF/hex editor): Change fread into something with the same length that doesn't conflict with anything [you can control this]. Target name: qread or frea_ or whatever.

Add your intercept .o's using the new names.

This might seem "dirty", but what we're doing here is a "dirty" job. This is an "old school" [sigh :-)] method that I've used for .o's for which I didn't have the source and before LD_PRELOAD existed.

Craig Estey
  • 1
    "About the same as the LD_PRELOAD" nope, it's a completely different technique (brutal text substitution vs changing the way the loader works) with different applicability and different drawbacks. Saying they are about the same is seriously misleading. – Matteo Italia Nov 18 '15 at 06:46
  • @MatteoItalia I don't think that's quite what he meant to imply. Anyway it's a valid approach, and simpler in some ways. Biggest caveat is that it won't intercept function calls made by libraries, unless they were also rebuilt. – hobbs Nov 18 '15 at 06:50
  • @MatteoItalia In terms of effectiveness they are about the same. The "brutal" text substitution is actually quite easy. If you do LD_PRELOAD, you have to do `dlopen` and `dlsym` [in your intercept function] which actually takes more work and you have to maintain state information. BTW, somebody else proposed the same solution and got the accepted answer. – Craig Estey Nov 18 '15 at 06:59