16

Let's say we create a reimplementation of C, with the only difference being that types are inferred. Storage classes and qualifiers would still need to be given (const, static, restrict, etc.), and let's restrict our attention to single-file C programs for the moment. Could it be done? What are the major impediments?

Some thoughts on what might cause problems with type inference:

  • structs with the same field name would need to be disambiguated manually (see the sketch after this list)
  • same for unions with the same field names
  • casts would probably need a "from" annotation, something like

    var i = (uint32_t -> uint64_t) *some_pointer;
    

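A minimal sketch of the first problem, written in ordinary C with the type spelled out; the struct names `point` and `rect` are made up for illustration, and the comment marks the line an inference engine could not resolve on its own:

    struct point { int x; int y; };
    struct rect  { float x, y, width, height; };

    int main(void) {
        /* Today the declared type picks the struct. With inference, a bare
           `var s = ...; s.x` would match either struct's `x` field, so the
           programmer would have to disambiguate manually. */
        struct point s = {1, 2};
        return s.x;
    }
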
These problems would require a bit of user annotation, but they shouldn't be too burdensome. Is there some killer issue that blows this idea out of the water?

Edit: To clarify, I'm not talking about adding generics or parametric polymorphism, just type inference for existing C types.

Edit 2014: Anyone interested in this concept may want to look into Rust

Ciro Santilli OurBigBook.com
deontologician
  • I'm probably being slow; can you give an example for your first bullet point? – Oliver Charlesworth Jun 28 '11 at 22:31
  • What you're describing sounds more like a dynamic type system. Type inference is a different idea all together. – Jeff Mercado Jun 28 '11 at 22:32
  • I don't feel I have the expertise to submit a full answer, but my intuition says this is perfectly doable, at least for a C-like language that covers maybe 90%-95% of C. We already have viable type inference for Python (see the ShedSkin project), so I would imagine similar techniques could be used for a type-declaration-free (or nearly so) version of C. – John Y Jun 28 '11 at 22:34
  • Here we go again (JavaScript, PHP, ...): `42 + "3"` is 45 or 423? – pmg Jun 28 '11 at 22:34
  • @Oli Charlesworth: Suppose I have two struct types, `struct Point { int x; int y; }` and `struct Rect { float width; float height; float x; float y; }`. I then have a variable `var some_struct`. When I write `some_struct.x`, what does the compiler do? – Chuck Jun 28 '11 at 22:37
  • @pmg: Type inference has nothing to do with bizarre automatic type conversions. In fact, it interacts rather badly with them. – Chuck Jun 28 '11 at 22:39
  • @pmg: In C as it is now, it's undefined behavior for your specific case and pointer addition in the general case. This would not change; the question is not about adding implicit casts, but simply inferring unspecified types. In fact, I'm not sure how anyone is reading duck typing out of this; that's rather more far-reaching than this proposal. – geekosaur Jun 28 '11 at 22:40
  • @Oli: `struct a { int f; }; struct b { char *f; };`. But to the OP, this is not in fact a problem, because you always have to associate the field label with something that has a `struct` type. (`f` in isolation is in a different namespace entirely.) – geekosaur Jun 28 '11 at 22:43
  • @geekosaur that's a good point, since they have a built-in namespace – deontologician Jun 28 '11 at 22:46
  • @pmg could you make a full answer fleshing out your concerns? – deontologician Jun 28 '11 at 22:52

6 Answers

13

GCC 5.1 supports `__auto_type`, a GNU extension analogous to C++11 `auto`: the variable's type is deduced from its initializer. The related GNU `typeof` operator lets you reuse an expression's type without spelling it out.
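
A minimal sketch, assuming GCC with GNU extensions enabled (for example, gcc -std=gnu11):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t x = 42;

    /* GNU extension: i takes the type of its initializer, here uint64_t. */
    __auto_type i = x;

    /* typeof (also a GNU extension) reuses an expression's type. */
    typeof(x) j = i + 1;

    printf("%llu %llu\n", (unsigned long long)i, (unsigned long long)j);
    return 0;
}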

Ciro Santilli OurBigBook.com
  • None of this involves type inference in the sense asked in the question (I believe). C++11 `auto` in particular is extremely trivial, and a far shot from type inference in the sense that people from any other community would usually use that term. – Andreas Rossberg Jul 30 '15 at 00:23
  • @AndreasRossberg Can you give a minimal code example of what such a feature would look like? https://en.wikipedia.org/wiki/C%2B%2B11#Type_inference says C++11 has type inference (I know, wikipedia ... =) ) I'll have a look into rust and Hindley–Milner. – Ciro Santilli OurBigBook.com Jul 30 '15 at 05:55
  • Type inference traditionally describes the situation where type information is not just collected bottom up, but derived from *uses*. Simple example: `succ(n) { return n+1 }`, if you can infer the type of `n` (and thus `succ`) from its usage in an addition. – Andreas Rossberg Jul 30 '15 at 06:34
6

C promotes some types automatically, which complicates things. But to be workable, you would also need a couple of additional departures from C as it exists. Right now the standard still supports, to some extent, legacy K&R C programs, and that requires unspecified types to be handled in particular ways: see, for example, the rules for function parameters in the absence of prototypes. It also used to be possible to declare functions and variables with no type at all, in which case the type defaulted to int (for a variable, a storage class alone was enough: static foo;). All of this legacy type handling would have to be removed before a new inferred-type mechanism could be added.
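
A short sketch of the legacy forms in question (the names counter and add_one are just for illustration); this is valid C89, compiled with something like gcc -std=c89, but the implicit-int rule it relies on was removed in C99:

/* Every omitted type below defaults to int under C89 ("implicit int"). */
static counter;            /* storage class only: an int named counter */

add_one(x)                 /* return type and parameter type both default to int */
{
    return x + 1;
}

int main(void)
{
    counter = add_one(41);
    return counter - 42;   /* exits with status 0 */
}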

geekosaur
4

To infer the types of polymorphic functions would require dramatic extensions to the C type system. For example:

length(p) {
  if (p == NULL) return 0;
  else return 1 + length(p->next);
}

This code has to work on any pointer to a struct (or a union) with a next field. You could check out row polymorphism, but at the very least, your new type system is going to have to be far more expressive than the C type system.
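
For contrast, a sketch of what current, monomorphic C requires: the struct (here a hypothetical node type) has to be pinned down before length can be compiled, and the function then works for that one type only:

#include <stddef.h>

struct node {
    struct node *next;
};

/* Works only for struct node; a list of any other struct with a `next`
   field needs its own copy of this function. */
int length(struct node *p) {
    if (p == NULL) return 0;
    else return 1 + length(p->next);
}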

Another pressing problem is the overloaded + operation. What type does it default to? Do you want it overloaded at any numeric type, like Haskell or C++? If so, more big extensions to the type system.
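
A small illustration of how many operations already hide behind + in C; today the operand types written by the programmer select among them, and an inference engine would have to pick defaults:

#include <stdio.h>

int main(void)
{
    int    a = 1,   b = 2;
    double x = 1.5, y = 2.5;
    const char *s = "hello";

    printf("%d\n", a + b);    /* integer addition */
    printf("%f\n", x + y);    /* floating-point addition */
    printf("%s\n", s + 1);    /* pointer arithmetic: prints "ello" */
    return 0;
}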

The larger lesson is don't do this. The merits of C (as a language, apart from the many fine APIs available in C) are

  • You have total control over the representation of your data.
  • Any person inspecting the source code can easily predict time and space costs.

This agenda isn't really compatible with polymorphism, and polymorphism is a primary benefit of type inference. If type inference is what you want, pick one of the many fine languages (F#, Haskell, ML) that support it natively.

Norman Ramsey
  • Well C doesn't allow generics, so in this case (theoretically) you could look at what struct types were in scope that had a field called next, and if there were two or more, require an annotation. – deontologician Jun 29 '11 at 01:11
  • IMHO, for *C itself* to remain a workable language, it needs to add new ways of declaring types that would allow a programmer to distinguish "number whose value is from 0 to 4294967295" from "member of the algebraic ring of values congruent mod 4294967296". Adding the former to any other kind of number should promote both to the most convenient size large enough to hold both; adding a number to the latter should yield a result of the same algebraic ring type. – supercat Oct 23 '14 at 16:01
  • There's no good reason why a processor with 64-bit registers shouldn't promote computations involving numbers to use the full 64 bits, but on the flip side, a lot of code will break if `uint32_t` suddenly ceases to behave as an abstract algebraic ring. Adding a means of declaring something to be of type (hypothetical syntax) `unsigned[int 32]` when the former is appropriate and `unsigned [restrict 32]` when the latter is needed could greatly improve code robustness and machine-independence while reducing the need for typecasts. – supercat Oct 23 '14 at 16:05
  • BTW, with regard to "complete control of data", that's horribly far from being the case, though some small additions to the language (like those above) would 99.9% fix the problem. A further enhancement I'd like to see would be to allow types to specify storage layout [e.g. have `unsigned [restrict uint8_t (0:8,8:8,16:8,24:8)]` indicate that the 32-bit value must be stored using four `uint8_t` values of 8 bits each, LSB first, *regardless of the machine's word size*; if that matches the machine's word order, great; if not, and if the type is used in a `union` or... – supercat Oct 23 '14 at 16:13
  • ...its address is taken, the compiler would be required to use whatever sequence of shifts and stores would be necessary to handle the specified format. Not a huge change to the language, but it would allow many programs to be written in such a way as to be more concise, robust, and performant (if a big-endian platform has a swap-bytes-in-words instruction, a compiler for that platform could use it when loading/storing the above integer type more easily than if the type was read with code like `value=(p[0] | (p[1]<<8) | (p[2]<<16) | (p[3]<<24))`) – supercat Oct 23 '14 at 16:21
1

It is possible to do some type inference in C. Take a look at this tool: http://cuda.dcc.ufmg.br/psyche-c. You can type part of a program there, and it will reconstruct the missing type declarations. For instance, if we feed it a variation of Norman's program:

int length(T p) {
  if (p == NULL) return 0;
  else return 1 + length(p->next);
}

Then psyche-c finds these declarations:

#include <stdint.h>
#define NULL ((void*)0)
typedef int bool;
bool false = 0;
bool true = 1;
typedef  struct T {struct T* next;}* T;

This kind of type reconstruction is useful for code completion, for instance.

1

C has a complicated set of rules regarding type promotion and conversion that people find confusing even when they can actually see the types they're dealing with. I suspect that even if C had type inference, it would still be necessary to declare types every third function to prevent ridiculous mistakes.
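
A hedged example of the kind of conversion surprise meant here; if the types of a and b were inferred from their initializers rather than spelled out, the signed/unsigned comparison would be even easier to overlook:

#include <stdio.h>

int main(void)
{
    int a = -1;
    unsigned int b = 1;

    /* The usual arithmetic conversions turn -1 into UINT_MAX here,
       so the comparison is false. */
    printf("%s\n", a < b ? "a < b" : "a >= b");   /* prints "a >= b" */
    return 0;
}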

Chuck
0

Some situations where, I think, inferred types could not be used:

  • pointers need a definite type
  • arrays need a definite type
  • function parameters need a type

I think it could work for a few cases, mostly simple local variables.

Even something as simple as calculating a checksum needs a definite type:

crc8 = 0x00; /* 8 bits; cf uint8_t crc8 = 0; */
crc32 = 0x00000000; /* 32 bits; cf uint32_t crc32 = 0; */
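
A sketch of why the width matters: the same additive checksum computed in an 8-bit accumulator and in a wider one (which is what a bare integer literal would plausibly infer to) gives different results once it wraps:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uint8_t data[] = {0xF0, 0xF0, 0xF0};

    uint8_t  sum8  = 0;   /* wraps modulo 256 */
    uint32_t sum32 = 0;   /* keeps the full value */

    for (size_t i = 0; i < sizeof data; i++) {
        sum8  += data[i];
        sum32 += data[i];
    }

    /* 0xF0 * 3 = 0x2d0: truncated to 0xd0 in 8 bits, 0x2d0 in 32 bits. */
    printf("%#x %#x\n", (unsigned)sum8, (unsigned)sum32);
    return 0;
}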

It could be done, for limited use.

pmg
  • If by "definite" you mean "not inferred", why do pointers need such a type? `some_function() { return "foo"; } ... p = some_function()`. Now `p` has inferred type `const char*`, or `char*` if we prefer a foolish similarity to C. What's the problem? Of course `malloc` doesn't give us anything to infer a type from, so presumably either some casts would remain, or else some differences from C would be needed. – Steve Jessop Jun 29 '11 at 00:08
  • That's what I mean; not inferred. And for function parameters I mean, for example: in `function foo(variable) { /*...*/ }` the type of the parameter cannot be inferred. – pmg Jun 29 '11 at 08:07
  • In most cases, the type of the parameter can be inferred from the way it's used. For example, in `foo(variable) { printf(variable, 123); }` we can see that variable must be a type which is valid as the first argument of `printf`. If we don't do anything that narrows down the required type of `variable` in `foo`, then the type inference engine can assume that it doesn't matter what the type is. Admittedly, there are places where the programmer will need to "help" the compiler, but it's certainly not every place. – RHSeeger Jun 29 '11 at 14:27