Refu Language Reference Manual¶
Built-in data types¶
The following data types are built-in. Some of them correspond to the data types defined by the C99 standard in <stdint.h>
but they follow the same naming scheme as in rust.
- Unsigned numbers
uint
: compiler will decide the size depending on contextu8
: 8 bit unsigned integer, corresponding touint8_t
u16
: 16 bit unsigned integer, corresponding touint16_t
u32
: 32 bit unsigned integer, corresponding touint32_t
u64
: 64 bit unsigned integer, corresponding touint64_t
- Signed numbers
int
: compiler will decide the size depending on contexti8
: 8 bit signed integer, corresponding toint8_t
i16
: 16 bit signed integer, corresponding toint16_t
i32
: 32 bit signed integer, corresponding toint32_t
i64
: 64 bit signed integer, corresponding toint64_t
- Real numbers
f32
: corresponds to binary32, single precision floating point, as defined by IEE 754-2008f64
: corresponds to binary64, double precision floating point, as defined by IEE 754-2008
- Strings
string
: UTF-8 encoded unicode string.string8
: Ascii encoded string
- Other
bool
: A boolean true or false valuenil
: the unit type, also known as NULL
Note
string8
type is currently not implemented.
Alebraic Data Types¶
More complex data types can be defined by the user as Algrebraic data types. This is achieved with the type
keyword.
type person {
name:string, age:int |
name:string, age:int, surname:string
}
type list {
nil | (load:int, tail:list)
}
type foo {
a:int,
b:(string|float)
}
type foo {
a:int,
b:(string | (i:int, f:f32))
}
Above we have the definition of a person and a list. A person has a name and an age and optionally a surname. And a list is either empty (denoted by the nil
keyword) or it has a load of an int
and a tail which is another list.
Recursive Data Types¶
TODO: Describe recursive ADTs
Anonymous Data Types¶
An algebraic data type can be considered as the equivalent of a tagged union type in C. Refu also supports anonymous ADTs. That means, you can encounter the ADT syntax without it having been defined. For example, a function’s argument can be an anonymous ADT.
fn print_me(a:(string | b:int, c:int))
{
//do some initialization stuff
...
//and now do the pattern matching
match a {
string => print(a)
int, int => print(a.b, a.c)
}
}
fn print_me(a:string | (b:int, c:int)) -> int
_ => {
print("%s", a)
print("one argument")
}
_, _ => {
print("%d %d", b, c)
print("two arguments")
}
Above we have one function with an anomymous ADT. If such a function exists then it must have a match expression somewhere inside its body in order to distinguish what kind of input it is having before this input is used. The most explicit way to achieve this is to write the match expression explicitly as in example 1. To do that we match the keyword fn
inside the function’s body against the various cases.
In another case if the function body consists only of different branches depending on the input we can omit the function’s body block completely and go with the way that example 2 is defined, which resembles a lot the way functions are defined in haskell. It is just syntactic sugar for achieving the same thing as in example 1.
Instantiating objects¶
In order to construct an instance of a data type you have to use one of its constructors. A constructor of an object is simply defined as any of its sum type operands.
a:person = person("steven", 23)
b:person = person("celina", 22, "wojtowicz")
For ease of use, arguments can also be given to a constructor as keyword arguments. If one keyword argument is passed to a constructor then all arguments should be keyword arguments. Finally when passing keyword arguments the order of the arguments does not matter as opposed to when calling a constructor normally.
a:person = person(name="steven", age=23)
b:person = person(name="celina", surname="wojtowicz", age=23)
Note
Keyword arguments are currently not implemented
Instantiating Recursive data types¶
Note
Recursive data types are currently not implemented
Data types can also be recursive. This is how we can define collections in Refu. But how do you construct a collection?
a:list = nil
b:list = list(1, 2, 3, 4, 5)
c:list = list(1, list( 2, list(3, list(4, list(5, nil)))))
In the above examples list b
and list c
are equal. The canonical way to define a list would be exactly like list c
is defined, having /1/ as its first element and using nil
after 5
to denote the list’s end.
As we can see above to construct a recursive data type we still use a constructor but we can take advantage of the fact that the type is recursive in order to construct it.
In the case of b
‘s construction Refu knows that a list’s constructor can only accept an int and a next list pointer. Using that knowledge it can expand the list(1, 2, 3, 4, 5)
to list(1, list(2, list(3, list(4, list(5, nil)))))
.
Same thing can work for more complex recursive data types such as a binary tree. Look below for an example.
type binary_tree {
nil | load:int, left:binary_tree, right:binary_tree
}
a:binary_tree = nil
b:binary_tree = binary_tree(8, (4, (1, 7)), (12, (10, 19)))
c:binary_tree = binary_tree(
8,
binary_tree(4,
binary_tree(1, nil, nil), binary_tree(7, nil, nil)),
binary_tree(12,
binary_tree(10, nil, nil), binary_tree(19, nil, nil)))
Array Types¶
Array types are like simple C arrays that are aware of their own size so as to make sure there is no out of bounds access. An array is simply a contiguous block of memory containing values of the same type.
array_of_ints:int[20]
array_of_strings:string[20]
a:int = array_of_ints[5]
b = array_of_ints[5] // type deduction
c:int = array_of_ints[22] //compile error
Dynamic size arrays can also be instantiated with the built-in make_arr(type, elements_number)
function. An array’s size in elements can be queried by array.size
.:
fn foo(b:u8[]) {
b[3] = 16;
}
buffer:u8[] = make_arr(u8, 10)
foo(buffer)
printf("%d", buffer.size); // should print 10
printf("%d", buffer[3]); // should print 16
Note
Dynamic arrays and their instantiation is currently not implemented
Types and Conversion¶
All elementary types can be converted from and to another type. Type conversion can either be explicit or implicit.
Implicit Conversion¶
Implicit conversion happens when you simply assign a value of one type to a value of another type for which conversion is legal. It can also happen during almost all other parts of the code like in a function call, in a constructor e.t.c.
The implicit conversion rules for elementary types can be seen in the following table where:
- OK -> conversion is allowed
- NO -> conversion is not allowed
- WC -> conversion produces a warning for a value of type and error for a constant of type since then we are sure that data is going to be lost. Only works for assignments now.
from/to | i8 | u8 | i16 | u16 | i32 | u32 | i64 | u64 | f32 | f64 | string | bool | nil |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
i8 | OK | WC | WC | OK | WC | OK | WC | OK | OK | OK | NO | OK | NO |
u8 | OK | OK | OK | OK | OK | OK | OK | OK | OK | OK | NO | OK | NO |
i16 | WC | WC | OK | WC | OK | WC | OK | WC | OK | OK | NO | OK | NO |
u16 | WC | WC | OK | OK | OK | OK | OK | OK | OK | OK | NO | OK | NO |
i32 | WC | WC | WC | WC | OK | WC | OK | WC | OK | OK | NO | OK | NO |
u32 | WC | WC | WC | WC | OK | OK | OK | OK | OK | OK | NO | OK | NO |
i64 | WC | WC | WC | WC | WC | WC | OK | WC | OK | OK | NO | OK | NO |
u64 | WC | WC | WC | WC | WC | WC | OK | OK | OK | OK | NO | OK | NO |
f32 | NO | NO | NO | NO | NO | NO | NO | NO | OK | OK | NO | NO | NO |
f64 | NO | NO | NO | NO | NO | NO | NO | NO | OK | OK | NO | NO | NO |
string | NO | NO | NO | NO | NO | NO | NO | NO | NO | NO | NO | NO | NO |
bool | OK | OK | OK | OK | OK | OK | OK | OK | NO | NO | NO | OK | NO |
nil | NO | NO | NO | NO | NO | NO | NO | NO | NO | NO | NO | NO | NO |
From the table we can understand that the general idea is that:
- All int types can be converted to each other except for:
- signed to unsigned, which produces a warning and fails for constants
- large to smaller, which produces a warning and fails for constants
- All int types can be converted to booleans
- All int types can be converted to floats
- Floats can be converted to each other
- Bool can be converted to integer types
There is one major exception to the above rules and that is pattern matching. In patern matching no implicit conversions are allowed, except for smaller sized integers of the same signednes towards bigger sized integers.
Below are some examples for assignments:
a:u8 = 128
b:u32 = a // implicit conversion allowed
c:u8 = b // implicit conversion will produce a warning. Larger to smaller.
d:u8 = 65535 // implicit conversion will fail. Constant is of larger type
f:i8 = 64
e:u8 = f // implicit conversion will produce a warning. Signed to unsigned.
g:u8 = -64 // implicit conversion will fail. Signed constant to unsigned variable.
h:i64 = true // implicit conversion allowed. h == 1
f:bool = 2245 // implicit conversion allowed. f == true
j:bool = 0 // implicit conversion allowed. j == false
As far as binary operators are concerned the result of an operation to elementary type is valid if one of the operands can be converted to each other. If that is the case the type of the operation is the type of the larger type.
a:u64 = 653432431
b:u16 = 2324
c:f32 = 3.14
d:string = "abc"
a + b // valid u16->u64, type will be u64
b + c // valid u16->f32, type will be f32
c + d // invalid
A type can be implicitly converted to a sum type by succesfull conversion to either of its sum operands. For example:
fn foo (a:u64 | b:string) { }
fn main() {
foo(45)
foo("eleos")
}
Only one implicit conversion is allowed per type comparison. Continuing from the above example we can’t have:
fn foo (a:u64 | b:string) { }
fn main() {
foo(true)
}
That’s because this would require two different implicit conversions.
Explicit Conversion¶
Explicit conversions allow for quite a bit more freedom for converting between types. An explicit conversion is achieved with a function call to the name of the type. Much like a constructor of a user defined type, which itself could be thought of as a sort of a type conversion.
All integer types can be converted to each other with explicit conversions, except for constants that would obviously cause loss of data.
a:u64 = 542312
b:i16 = i16(a) // valid explicit conversion, would give warning as implicit
c:u16 = u16(b) // valid explicit conversion, would give warning as implicit
d:u8 = u8(-13) // invalid explicit conversion, obvious loss of data
e:u8 = u8(65535) // invalid explicit conversion, obvious loss of data
Floats can be explicitly converted to ints.
a:f32 = 3.2313;
b:f64 = 123.231233;
c:u32 = u32(a); // valid explicit conversion, would give warning as implicit
d:u16 = u16(b); // valid explicit conversion, would give warning as implicit
An interesting case is explicit conversion to string. Explicit conversion to string is allowed but only for integer/floating constant and booleans.
a:u32 = 2321;
b:string = string(2313) // valid conversion
c:string = string(a) // invalid conversion, not constant
d:string = string(23.231) // valid conversion
e:bool = (3 == 4)
e:string = string(e) // valid conversion
If Expressions¶
In Refu an if
can act either as an expression or like a statement depending on the context. That means, that you can assign an if expression as values to variables. The general syntax is as follows:
if i > 10 {
increase_a_value()
compress_a_file()
} elif i < 0 {
do_something_else()
} else {
do_last_thing()
}
The above if
acts as a statement since it is not in the right side of any kind of assignment. But observe below another example usage where if
is used as an expression. Depending on the value of if
, we assign a specific value to a
.
a:int
a = if i > 10 {
20
} elif i < 0 {
40
} else {
100
}
Unlike some other languages the curly braces can’t be omitted in any branch of the if. If the condition of an if branch is complex enough then it should be enclosed in parentheses like below.:
if ((i > 10 && i <20) || (x > 30 && x < 40)) {
do_something()
}
For Expressions¶
The simplest way to iterate something in refu is by using a for
expression. The syntax is simple. For a simple iteration over a range of integers you can use the following.
for i in 0:10 {
//this will iterate 10 times, with i ranging from 0 to 9
do_something()
}
The simple iteration syntax is for
identifier in
iterator. Iterator can either be a range in the form of start:step:end
or a collection. For ranges the step is optional and is shown in the next example. By default step
is 1
. Step can also be negative.
for i in 0:2:10 {
//this will print 0, 2, 4, 6, 8
print(i)
}
for i in 10:-2:0 {
//this will print 10, 8, 6, 4, 2
print(i)
}
For expressions are also heavily customizeable on a per type basis. By deriving the standard library’s iterator typeclass you can define how the expression behave for a specific type. For example:
type list {
nil | payload:int, tail:list
}
instance std::iterator<list> {
fn(self:list)->list
{
match(self) {
(nil) => return nil
(val, tail) => return (val, tail)
}
}
}
my_list:list = (1, 2, 3, 4, 5)
for i in my_list {
//this should print all the values of the list.
print(i)
}
Note
The iterator typeclass and the surrounding behaviour of defining your own iterator behaviour is not yet implemented.
By defining the list_iter
instance of the iterator typeclass we just defined the way that lists can be iterated. Afterwards whenever a for
expression is used on a list, the defined implementation is used. The iterator typeclass looks like this:
class iterator <type T> {
fn(self:T) -> (nil | (Any, T))
}
So, all implementations need to do is define the value at each iteration, the next object of the iteration and the condition under which the iteration terminates. The function must return either nil
to denote the end of the iteration, or a value of type T
and the next object for iteration.
Finally, for
expressions can also be assigned. For example an array can be assigned like this:
arr:int[3] = [5, 6, 7]
another_arr:int[] = for i in arr { i + 3 }
another_arr
will contain [8, 9, 10]
. Ofcourse these expressions are checked at compile time for validity of type assignment. If the for block had something that is not an int, or if it had more statements then it would be a compile error. On the left hand of the assignment any identifier whose type would agree with (nil | int, T)
would be acceptable.
Pattern Matching¶
Algebraic data types go hand in hand with the ability to use pattern matching on those types in order to deconstruct them. This is offered by the match
expression keyword in refu.
Pattern matching is the elimination construct for algebraic data types. That means that a pattern matching expression, expresses how one should consume a partciular ADT. For example look below.
type list {
nil | load:int, tail:list
}
a:list
match a {
nil => print("empty list")
i, _ => print("Head of the list is %d", i)
}
Match expressions can also be recursive. A match()
inside a match expression renders the whole match recursive. For example look at the matching below which calculates the length of a list.
fn find_length(a:<list>) -> int
{
return match a {
nil => 0
_, tail => 1 + match(tail)
}
}
For completeness sake it should be noted that the above example can be written in a simpler way, having the function block omitted:
fn find_length(a:list) -> int
nil => 0
_, tail => 1 + find_length(tail)
In a match
, all possible value combinations must be exhausted. _
means any value, nil
means no value and anything else is interpreted as an identifier to recognize that particular positional argument. Another way to match something would be depending on the type. For example.
type list <T> {
nil | (load:T, tail:list)
}
a:list<int> = list<int>(1, 2, 3)
list_type:string = match a {
nil => "empty list"
int, _ => "list of ints"
_ => "other kind of list"
}
From the above, one can notice the following. A match expression is just that, an expression and can as easily be assigned to something. Finally it is a compile error to not exhaust all possible matches, so the _
at the end matches all other cases.
type foo {
a:i16 | b:u16 | c:string | d:bool
}
a:foo
match a {
string | bool => "not a number"
a:(i16 | u16) => 5 + a
}
Another way to define patterns is by using the type operators. As can be seen above one can combine possible type reductions using the same operators we use when a type is defined.
Memory Model¶
Note
As of the moment of writting the memory model is still unimplemented and there are ideas floating around which need to be solidified. Some of these ideas are presented here.
TODO
Functions¶
Functions are declared in Refu just like in the Rust language. The keyword fn
followed by the name of the function, the arguments and finally by an arrow pointing to the return value.
Return value¶
As mentioned, whatever follows ->
is the function’s return value. If there is no return value then the arrow is omitted. Some examples follow:
fn add_two_ints(a:int, b:int) -> int
{
return a + b
}
fn print_something()
{
print("something")
}
Inside the function’s body a return
statement denotes the expression that determines the return value. A function may return a value but still need no return statement if it’s compact enough and has all its functionality under a match
, if
or for
expression. For example:
fn int_inside_range(x:int, from:int, to:int) -> bool
{
if (x >= from && x <= to) { true } else { false}
}
In the absense of a return value the function’s last expression statement
value is interpreted as the return value. For example the following function’s
return value is determined by a + 1
.
fn do_something(a:int) -> int
{
a = a * 2
if (a > 10) {
a - 5
} else {
a - 1
}
a + 1
}
Moreover a function can also completely omit a body block if it has a match
expression on its arguments like below:
fn find_length(a:~list) -> int
(nil) => 0
(_, tail) => 1 + find_length(tail)
Argument Evaluation Strategy¶
The argument evaluation strategy is pass by value for all elementary types, except for string
. That means that they are copied inside the function. Objects of all user defined types are passed by reference, which means that simply a pointer to the object is passed along in the function.
Type Parameters¶
Refu supports type parameters, which syntactically look like generics of other programming languages. Their use will be seen heavily in the use of typeclasses below but first let’s take a look at the syntax.
type list <type T> {
nil | payload:T , tail:list
}
..
..
a:list<int> = (5, 6, 7, 8)
This would define a generic ADT list, and later the user declares a list of ints and populates it. Same thing can be done with an ADT binary tree.
type binary_tree <type T> {
nil | payload:T , left_branch:uptr<binary_tree>, right_branch:uptr<binary_tree>
}
...
...
/*
1.0
/ \
0.1 2.0
/ \ / \
0.01 0.2 1.5 3.3
*/
a:binary_tree<double> = ( 1.0, (0.1, (0.01), (0.2)), (2.0, (1.5, 3.3)))
a:binary_tree<double> = (1.0, cons(0.1, cons(0.01, Nil), cons(0.2, Nil) ), cons(2.0, cons(1.5, Nil), cons(3.3, Nil)))
Type parameters can be of either a concrete type as designated by type
or by a type of a type also known as a kind
. We will read more about kinds in the corresponding section.
Note
Type Parameters are only implemented during the parsing stage of the compiler and are not yet ready.
Kinds¶
Note
Kinds are not implemented whatsoever and their usage in the language is not yet well thought out.
Kinds are essentially the types of types. In Kind syntax a concrete type is represented as *
. The syntax of kinds is similar to that of types. A *
represents any concrete type.
To understand kinds picture them as the types of a type constructor (generic type). For example the Maybe
type.
type maybe<type T> {
T | nil
}
has a kind of *->*
which means that it takes one concrete type and derives another concrete type.
Typeclasses¶
Note
Typeclasses are only implemented in the parsing stage of the language at the moment. But they are high in the priority list so they should be ready soon.
Refu relies heavily on the use of typeclasses. They are an important way to guarantee behaviour about objects of a given type. There are quite a few builtin typeclasses in the standard library. The concept of a typeclass is similar to that of an interface in some other languages.
Simple Example¶
Here is one example which defines the operation of the adding operator. This allows an object to define how it shall be added. One can notice the keyword self
which defines the object the function will be called for and also the generic syntax of <type T>
since we can’t know the type of the object we are adding.
class addition <type T> {
fn add(self:T, other:T) -> T
}
type vector {
x:int, y:int, z:int
}
//A type would declare that it derives the typeclass
instance addition<vector> {
fn add(self:T, other:vector) -> vector
{
ret:vector
ret.x = self.x + other.x
ret.y = self.y + other.y
ret.z = self.z + other.z
return ret
}
}
So what the above code declares is that there is some type called vector
. That type is an instance of the addition typeclass with the given implementation. The addition typeclass like some other special typeclasses allow for special operations. In particular it allows for overloading operator +
. So adding two vectors would in essence call the instance of the typeclass.
Note that self
can be omitted from the arguments of a typeclass and its type instance if no special evaluation/ownership strategy needs to be used. It will always be implied.
Advanced example - Iteratable Collections¶
Another example of a typeclass would be a class of types that can be iterated. All collection types should be iterateable so the following typeclass definition makes sense.
class iteratable <collection c> {
fn iterate(self:collection, cb:function)
}
And below we can see a nice example for how an instance of this typeclass would be implemented.
type array <type T> {
T[]
}
instance iteratable <array, type T> {
fn iterate(self:T, cb:function)
{
for i in range(0:self.length) {
cb(self[i])
}
}
}
Typeclass Inheritance¶
The use of typeclasses is extended by the possibility of inheritance between typeclasses.
class equality <type T> {
fn equals(self:T, other:T) -> bool
fn nequals(self:T, other:T) -> bool
}
class comparison <type T> extends equality{
fn greater_than(self:T, other:T) -> bool
fn less_than(self:T, other:T) -> bool
}
class super_comparison <type T> extends comparison{
fn gteq(self:T, other:T) -> bool
fn lteq(self:T, other:T) -> bool
}
//multiple inheritance
class reader <type T> {
fn read(a:T)
}
class writer <type T> {
fn write(a:T)
}
class io <type T> extends (reader, writer) {
// can be empty or can have additional functions to implement
}
The typeclass equality above allows for types that instantiate it to use its 2 equality functions, while the comparison typeclass on the other hand allows for greater and less than comparison in addition to the equality functions. Additionally multiple levels of inheritance can be valid as we can see from the super_comparison
typeclass and also multiple inheritance as the io
typeclass shows.
Multiple Typeclass Instancess for a Specific Type¶
A typeclass can have different instantiations for a single type and they could be swapped even in runtime. As an example let us take a typeclass called ‘Ordering’ which denotes how the members of a type should be ordered. Then we have two instances of this typeclass, both implemented by a type, say a list. One implements an ascending order ordering and the other a descending order ordering. There should be a way to choose in runtime which of the two implementations the ordering would use.
So let’s look at the following example, which will not compile.
instance ordering ord_ascend<vector> {
fn(self)
{
...
}
}
instance ordering ord_descend<vector>{
fn(self)
{
...
}
}
Here we can see an additional feature. Instances of a typeclass must have an extra identifier if we implement more than one instance of a typeclass for a type. But why will this not compile? Well simply because 2 different instances are declared for a type without specifying one as the default implementation for all objects of type vector.
instance ordering ord_ascend<vector> default {
fn(self)
{
...
}
}
instance ordering ord_descend<vector> {
fn(self)
{
...
}
}
With the above code we can declare the ord_ascend
instance as default and as such all vector types unless otherwise specified will have this implementation for the ordering typeclass
And finally below we can see how to change the choice of typeclass instance in runtime.
a:list // ord_ascend
b:list<ordering: ord_ascend> // ord_ascend
c:list<ordering: ord_descend> //ord_descend