Expressions
Overview
This post covers expressions in C++ in detail. In C++, an expression is much more than an arithmetic formula: an assignment is an expression, a function call is an expression, an arithmetic evaluation is an expression; roughly, anything that computes a value is an expression, and most statements are simply expressions terminated by a semicolon. To give a feel for how expressions are used in C++, this chapter presents a simple desk calculator. Later we also list the complete set of operators and their meanings for built-in as well as user-defined types. The operators that require more elaborate description are discussed in other sections of this series.
- A Desk Calculator
- Parser
- Input
- Better Approach
- Command Line Arguments
- Operators
- Results
- Order of Evaluation
- Temporary Objects
- Constant Expression
- Symbolic constants
- Const in Constexpr
- Literal Types
- Reference Arguments
- Address of Constant Expression
- Implicit Type Conversion
- Pointer and Reference Conversion
- Boolean Conversion
- Usual Arithmetic Conversions
A Desk Calculator
At first glance, a desk calculator can be considered a tiny compiler that provides at least four operations (addition, subtraction, multiplication, and division) on floating-point operands, or on operands of any arithmetic type that can be implicitly converted to a floating-point or integral type. In our calculator you can also define a variable by assigning to an identifier, as follows:
r = 12
area = pi * r * r
For instance, the above example will produce the following output:
12
452.16
Here, 12 is the value of the expression r = 12, meaning the identifier r now holds 12, and 452.16 is the value of the expression area = pi * r * r, where pi is a predefined constant in our symbol table; that value is also what the identifier area now holds.
Our calculator can in fact be thought of as a miniature compiler consisting of the following parts:
- A parser: responsible for syntax analysis, ensuring that the sequence of tokens forms a valid expression.
- An input function: responsible for handling input and for lexical analysis, i.e. composing tokens from the character sequence.
- A symbol table: responsible for holding permanent data, such as the value of each defined identifier.
- A driver function: responsible for initialization, output, error handling, and so on.
With that said, let's start with the parser for the language specified by our calculator's grammar.
Parser
Here is the grammar supported by our calculator:
program:
end //end of the input
expr_list end //list of expressions
expr_list:
expression print //print is a newline or a semicolon; as the name suggests it prints the previously evaluated expression
expression print expr_list
expression:
expression + term
expression - term
term
term:
term / primary
term * primary
primary
primary:
number //number is a floating-point literal
name //name is an identifier
name = expression //a name can be assigned the value of an expression
-primary //a primary can be negated
(expression) //a parenthesized expression, e.g. 2 * (3 + 4)
In other words, a program is a sequence of expressions separated by semicolons or newlines; to put it bluntly, a sequence of statements. The basic units of an expression are numbers, names, and operators (both unary and binary) such as +, -, *, / and = (C++ itself provides many more, such as %, ~, &, ^ and |). Names must be given a value before they are read.
I use the style of programming called recursive descent, where functions are arranged in a hierarchy and a higher-level function calls lower-level functions to do most of its work. As you can see above, our parser functions are called expr, term, and prim, where:
- expr is responsible for evaluating addition and subtraction; it relies on term to evaluate multiplication and division.
- term in turn relies on prim to evaluate a primary, which may be a number, the value of a name, a negated primary, or a parenthesized expression.
- In a language like C++, where function calls are relatively cheap, this top-down style of programming is popular, so I use it here.
- For each production rule in the grammar above there is a function. Terminal symbols such as end, number, + and - are recognized by the lexical analyzer, while nonterminal symbols are recognized by the syntax-analysis functions expr(), term() and prim().
- As soon as both operands of a subexpression are known, the subexpression is evaluated; in a real compiler, code could be generated at this point.
- For input, the parser uses a Token_Stream. A token stream tokenizes the sequence of characters from the source into {kind, value} pairs, where kind indicates the type of the token (number, operator, end of statement, parenthesis, etc.) and value holds the token's numeric or string value where applicable (see the Kind enumeration defined below for the kinds of token supported by our example).
- Although we define the Token_Stream explicitly, the parser proper only needs to know the name of the stream and how to read tokens from it: get() reads the next token, and current() returns the most recently read one.
- In addition to tokenization, a token stream should also encapsulate its source of characters, so that input can come either from the user typing at an input stream or from the command line, behind the same interface, rather than bogging the user down in implementation details.
- The kinds of token look like this:
enum class Kind: char{
name, value, end,
plus = '+', minus = '-', mul = '*', div = '/',
assign = '=', print = ';', lp = '(', rp = ')'
};
struct Token{
Kind kind;
double numberval;
string stringval;
};
- As discussed above, storing both a numeric kind and the character representation of a token can be useful for debugging. However, this trick only works as long as no character used in the input has the same value as one of the plain enumerators (name, value and end get the values 0, 1 and 2), and no printing character has a single-digit value, which holds in ASCII-based character sets. To check, consider the following piece of code:
#include<iostream>
#include<limits>
using namespace std;
int main(){
    if(numeric_limits<char>::is_signed){
        for(int i = -128; i < 128; i++){cout << '{' << i << "," << (char)i << '}' << " ";}
        cout << endl;
    }else{
        for(int i = 0; i < 256; i++){cout << '{' << i << "," << (char)i << '}' << " ";}
        cout << endl;
    }
    return 0;
}
- On an implementation with 8-bit signed chars and an ASCII character set, this code generates output like the following:
{-128,�} {-127,�} {-126,�} {-125,�} {-124,�} {-123,�} {-122,�} {-121,�} {-120,�} {-119,�} {-118,�} {-117,�} {-116,�} {-115,�} {-114,�} {-113,�} {-112,�} {-111,�} {-110,�} {-109,�} {-108,�} {-107,�} {-106,�} {-105,�} {-104,�} {-103,�} {-102,�} {-101,�} {-100,�} {-99,�} {-98,�} {-97,�} {-96,�} {-95,�} {-94,�} {-93,�} {-92,�} {-91,�} {-90,�} {-89,�} {-88,�} {-87,�} {-86,�} {-85,�} {-84,�} {-83,�} {-82,�} {-81,�} {-80,�} {-79,�} {-78,�} {-77,�} {-76,�} {-75,�} {-74,�} {-73,�} {-72,�} {-71,�} {-70,�} {-69,�} {-68,�} {-67,�} {-66,�} {-65,�} {-64,�} {-63,�} {-62,�} {-61,�} {-60,�} {-59,�} {-58,�} {-57,�} {-56,�} {-55,�} {-54,�} {-53,�} {-52,�} {-51,�} {-50,�} {-49,�} {-48,�} {-47,�} {-46,�} {-45,�} {-44,�} {-43,�} {-42,�} {-41,�} {-40,�} {-39,�} {-38,�} {-37,�} {-36,�} {-35,�} {-34,�} {-33,�} {-32,�} {-31,�} {-30,�} {-29,�} {-28,�} {-27,�} {-26,�} {-25,�} {-24,�} {-23,�} {-22,�} {-21,�} {-20,�} {-19,�} {-18,�} {-17,�} {-16,�} {-15,�} {-14,�} {-13,�} {-12,�} {-11,�} {-10,�} {-9,�} {-8,�} {-7,�} {-6,�} {-5,�} {-4,�} {-3,�} {-2,�} {-1,�} {0,} {1,} {2,} {3,} {4,} {5,} {6,} {7,} {8} {9, } {10,
} {11,
} {12,
} {14,} {15,} {16,} {17,} {18,} {19,} {20,} {21,} {22,} {23,} {24,▒} {25,} {26,▒} {27, {28,} {29,} {30,} {31,} {32, } {33,!} {34,"} {35,#} {36,$} {37,%} {38,&} {39,'} {40,(} {41,)} {42,*} {43,+} {44,,} {45,-} {46,.} {47,/} {48,0} {49,1} {50,2} {51,3} {52,4} {53,5} {54,6} {55,7} {56,8} {57,9} {58,:} {59,;} {60,<} {61,=} {62,>} {63,?} {64,@} {65,A} {66,B} {67,C} {68,D} {69,E} {70,F} {71,G} {72,H} {73,I} {74,J} {75,K} {76,L} {77,M} {78,N} {79,O} {80,P} {81,Q} {82,R} {83,S} {84,T} {85,U} {86,V} {87,W} {88,X} {89,Y} {90,Z} {91,[} {92,\} {93,]} {94,^} {95,_} {96,`} {97,a} {98,b} {99,c} {100,d} {101,e} {102,f} {103,g} {104,h} {105,i} {106,j} {107,k} {108,l} {109,m} {110,n} {111,o} {112,p} {113,q} {114,r} {115,s} {116,t} {117,u} {118,v} {119,w} {120,x} {121,y} {122,z} {123,{} {124,|} {125,}} {126,~}
- This shows that on my implementation, as on many others, char is an 8-bit signed type with values in the range -128 to 127, and no printing character in the character set has a single-digit integral value. Consequently, we can safely use the scheme above.
- Let's get to writing the parser functions now. Each takes a bool argument saying whether it needs to read the next token first, evaluates its (sub)expression, and returns the resulting value.
//ts (the token stream), term() and Kind are defined elsewhere; expr is shown here in isolation
double expr(bool get){
    double left = term(get); //evaluate the left operand first
    for(;;){ //loop as long as the current token is + or -
        switch(ts.current().kind){
        case Kind::plus:
            left += term(true);
            break;
        case Kind::minus:
            left -= term(true);
            break;
        default: //no more + or -: return the value
            return left;
        }
    }
}
- As is evident from the implementation of the parser function expr, it checks the kind of the current token; as long as that token is + or -, it keeps folding terms into the result, and otherwise it returns the expression evaluated so far.
- The argument get says whether the function must read the next token before it starts, while ts.current() refers to the token already read.
- The switch statement executes repeatedly until a token other than + or - is encountered, at which point the default branch returns the value evaluated so far.
anymore. - The curious notion of
for(;;)
is used to specify the infinite loop as it have not loop intializer, counter or terminator expression in it. Equivalent a statement such aswhile(true)
can be used for the said notion. - The operator
+=, -=
, are used to handle addition and subtraction, and are what we called as the compound operators, and generally evaluates toa = a + b
, ora = a -b
, when applied on operanda and b
. - This notion are not only shorter but also specify the intent of reassigned with incremented value explicitly.
- However, each compound assignment operator is a single lexical token, so we cannot put a space inside one: a + = b is an error, because + and = separated by whitespace are two different tokens.
, when used with trailing space corresponds to different lexical tokens. - C++ provides the following compound operators for the binary operand a and b:
+=, -=, *=, >>=, <<=, %=, /=, *=, &=, |=, ^=, etc
- For any binary operator @ that has a compound form, a @= b means a = a @ b, except that a is evaluated only once.
Now we can define the next parser function in the hierarchy, term. It handles multiplication and division in the same way that expr handles addition and subtraction, except that since we are dividing floating-point numbers, we must guard against division by zero.
int error_count = 0;
int error(const string& message){
    error_count += 1;
    cerr << message << endl;
    return error_count;
}
double term(bool get){
double left = prim(get);
for(;;){
switch(ts.current().kind){
case Kind::mul: {
left *= prim(true);
break;
}
case Kind::div: {
if(auto d = prim(true)){
left /= d;
break;
}
return error("Zero Division Error");
}
default:
return left;
}
}
}
- As the code shows, we handle multiplication and division much as before, applying the corresponding operator to the running result as long as the current token is * or /, and returning the result when anything else is encountered. We follow the same recursive-descent strategy of delegating evaluation to another function, prim, except that, being lower in the hierarchy, prim has rather more work to do.
- Because we account for division by zero, we use an auxiliary function, error, which counts errors, prints the relevant message, and returns. Curious eyes may also notice that we declare an identifier inside the condition of the if statement itself. Such identifiers are introduced in the smallest possible scope, which is usually better anyway: it reduces the chance of side effects and prevents the value from being modified before its intended use. The scope of such an identifier is the statement itself, and its value is the value of the condition.
- Consequently, the division is performed only if d is non-zero, which guarantees that dividing the previous result by it is safe.
- Let's write the function for evaluating a primary expression now.
map<string, double> table;
double prim(bool get){
if(get){ts.get(); } //get the next token if available.
switch(ts.current().kind){
case Kind::value: {
double v = ts.current().numberval; //copy the value before get() overwrites the token
ts.get();
return v;
}
case Kind::name: {
double& v = table[ts.current().stringval];
if(ts.get().kind == Kind::assign){
v = expr(true);
}
return v;
}
case Kind::minus: {
return -prim(true);
}
case Kind::lp: {
auto e = expr(true);
if(ts.current().kind != Kind::rp){return error("expected a ')'");}
ts.get(); //get the next token
return e;
}
default:
return error("primary token Expected");
}
}
On closer inspection, the parser function for primary expressions is quite simple and can be described as follows:
- When the token is a number, its value is found in numberval; when the token is a name, the name is found in stringval. Note also that prim always reads one token more than it needs to analyze its primary expression. The reason is that it must do so in some cases (for instance, to see whether a name is followed by =), so for consistency it does so in all cases. When the parser simply wants to move on to the next token, it ignores the value returned by ts.get(); that is fine, because we can still obtain the current token from ts.current().
- Before evaluating a name, we must determine whether it is being assigned to or simply read. In either case we consult the symbol table, table, which is simply a map from the string representation of a name token to the value most recently evaluated for it.
- Notice also that prim uses expr, even though expr is defined further down, via term. Since each parser function must be declared before use, we break this circular dependency by declaring expr before the definition of prim.
Input
Getting input from the user in the desired format usually remains one of the hardest parts of any program, because a program must cope with the whims of its users and their seemingly random errors, while forcing the user to adapt to the machine is rightly considered obnoxious. The task of low-level input is to convert a sequence of characters into higher-level tokens, which then become the units of input for higher-level routines. In our case, low-level input is done by the token stream's ts.get(). Before putting the input handling into perspective, let's complete the Tokenizer class (our Token_Stream) that composes tokens out of the input character stream.
class Tokenizer{
private:
istream* is;
bool owns;
Token ct = {Kind::end};
void close(){if(owns){delete is;}}
public:
Tokenizer(istream* is1): is{is1}, owns{true}{};
Tokenizer(istream& is1): is{&is1}, owns{false}{};
void set_inputstream(istream* is1){
close();
is = is1;
owns = true;
}
void set_inputstream(istream& is1){
close();
is = &is1;
owns = false;
}
Token get();
Token& current(){return ct;}
~Tokenizer(){close();} //close() itself checks ownership
};
- In the declaration of Tokenizer above, we take a somewhat elaborate approach to releasing the resource behind the input stream: if the stream was allocated on the free store and handed to us as a pointer, we own it and must delete it when the destructor runs. By deleting only streams passed as pointers, we distinguish the case where the Tokenizer owns the stream from the case where it merely refers to one, while still applying the RAII principle: resources are acquired in the constructor and released in the destructor.
- The class therefore has three member variables: a pointer to the input stream, a bool recording whether the stream is owned (and so requires release), and the current token, ct.
- We give the current token a default value of Kind::end, so that a user who calls ts.current() before any call to get() still receives a well-defined value; since no useful work can be done with such a token, end serves as a safe placeholder indicating the end of input.
- The member current() returns the most recently composed token, while get() composes and returns the next token from the input stream.
With that said, we can finally look at the implementation of get(), which does the low-level input: composing higher-level tokens out of the sequence of characters.
Token Tokenizer::get(){
char ch = 0;
*is >> ch;
switch(ch){
case 0:
return ct = {Kind::end};
case '+': case '-': case '*': case '/': case ';': case '=': case '(': case ')':
return ct = {static_cast<Kind>(ch)};
case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': case '.': {
is->putback(ch);
*is >> ct.numberval;
ct.kind = Kind::value;
return ct;
}
default: {
if(isalpha(ch)){
is->putback(ch);
*is >> ct.stringval;
ct.kind = Kind::name;
return ct;
}
error("Bad Token");
return ct = {Kind::print};
}
}
}
Examining the code above: we read from the input stream into the character ch. The extraction operator >> works for any fundamental type (though not for built-in pointers) and, by default, skips whitespace before reading into its operand. If nothing can be read, ch keeps its initial value 0, which we take to indicate the end of the character stream and so return the end token. If the character is one of our operators, we return the corresponding token; because we chose the enumerator values of Kind to equal the characters themselves, a simple explicit conversion with static_cast produces the right token. Notice also the assignments combined with return, such as return ct = {Kind::end}: initializing the whole Token object at once is more explicit and easier to understand than setting its members one at a time. With that said, let's walk through the real crux of the function: composing tokens out of the low-level sequence of characters.
- We begin by reading a character from the input stream into ch. The operator >> skips leading whitespace and leaves its operand unchanged if nothing can be read or an error occurs.
- Consequently, ch == 0 after the read indicates the end of the input stream: since no character was extracted, ch retains its initial value of 0, and we can safely return the end token.
- Next, we check whether the character is one of the operator tokens permitted by our grammar, in which case we return the token obtained by explicitly casting the character to the Kind enumeration.
- If the character begins a number, we put it back into the input stream with putback() and then read the whole number into the numberval member of the token, setting the token's kind to value so the parser function prim() can use it to evaluate the expression.
- If the token is not an operator, a number, or the end of input, it ought to be a name. We check this conveniently with the isalpha() function instead of spelling out every letter as a case label.
- If it is indeed a letter, we do the same as for a number: put the character back, read the name from the stream into the stringval member, and return the token with its kind set to Kind::name.
- If none of the above applies, the user has supplied an unexpected token, such as a special character; in that case we simply report an error and continue with the next token.
To put everything into perspective, the code so far is listed below:
#include<iostream>
#include<map>
#include<cctype>
using namespace std;
int error_count{};
map<string, double> table;
int error(const string& message){
cerr << message << endl;
error_count += 1;
return error_count;
}
enum class Kind{
name, value, end,
plus = '+', minus ='-', mul = '*', div = '/',
assign = '=', print = ';', lp = '(', rp = ')'
};
struct Token{
Kind kind;
double numberval;
string stringval;
};
class Tokenizer{
private:
istream* is;
bool owns;
Token ct = {Kind::end};
void close(){if(owns){delete is;}}
public:
Tokenizer(istream* is1): is{is1}, owns{true}{};
Tokenizer(istream& is1): is{&is1}, owns{false}{};
void set_inputstream(istream* is1){
close();
is = is1;
owns = true;
}
void set_inputstream(istream& is1){
close();
is = &is1;
owns = false;
}
Token get();
Token& current(){return ct;}
~Tokenizer(){
if(owns){close();}
}
};
Token Tokenizer::get(){
char ch = 0;
*is >> ch;
switch(ch){
case 0:
return ct = {Kind::end};
case '+': case '-': case '*': case '/': case ';': case '=': case '(': case ')':
return ct = {static_cast<Kind>(ch)};
case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': case '.': {
is->putback(ch);
*is >> ct.numberval;
ct.kind = Kind::value;
return ct;
}
default: {
if(isalpha(ch)){
is->putback(ch);
*is >> ct.stringval;
ct.kind = Kind::name;
return ct;
}
error("Bad Token");
return ct = {Kind::print};
}
}
}
Tokenizer ts{cin};
double expr(bool get);
double prim(bool get){
if(get){
ts.get();
}
switch(ts.current().kind){
case Kind::value: {
double v = ts.current().numberval;
ts.get();
return v;
}
case Kind::name: {
double& v = table[ts.current().stringval];
if(ts.get().kind == Kind::assign){ //read one more token to check for '='
v = expr(true);
}
return v;
}
case Kind::minus: return -prim(true);
case Kind::lp: {
auto e = expr(true);
if(ts.current().kind != Kind::rp){
return error("Expected a ')'");
}
ts.get();
return e;
}
default:
return error("Expected a primary");
}
}
double term(bool get){
double left = prim(get);
for(;;){
switch(ts.current().kind){
case Kind::mul: {
left *= prim(true);
break;
}
case Kind::div: {
if(auto d = prim(true)){
left /= d;
break;
}
return error("Division by zero");
}
default:
return left;
}
}
}
double expr(bool get){
double left = term(get);
for(;;){
switch(ts.current().kind){
case Kind::plus: {
left += term(true);
break;
}
case Kind::minus: {
left -= term(true);
break;
}
default:
return left;
}
}
}
void calculate(){
for(;;){
ts.get();
if(ts.current().kind == Kind::end){break;}
if(ts.current().kind == Kind::print){continue;}
cout << expr(false) << '\n';
}
}
int main(){
calculate();
return 0;
}
- There is, however, one critical problem with this code. Because >> skips whitespace and then, when the target is a string, reads a maximal run of non-whitespace characters, an input such as a=12; is read as a single blob rather than as the four tokens a, =, 12 and ;. In effect we are forcing the user to separate every token with whitespace: a = 12 ; works, but a=12; does not. This is far from ideal, so we need to move away from this deceptively simple version, which relies on the user supplying a convenient format, to a more robust low-level input routine for composing tokens out of our character sequences.
Better Approach
To overcome this drawback, we replace the type-oriented default input operations in get() with code that reads individual characters. Let's start by modifying the section that checks for the end of the token stream or for whitespace. We can make it work at the level of individual characters as follows:
Token Token_Stream::get(){
//...
char ch;
do{ //skip whitespace except '\n'
if(!is->get(ch)) return ct = {Kind::end};
}while(ch != '\n' && isspace(ch));
switch(ch){
//a newline or a semicolon ends an expression
case ';': case '\n': return ct = {Kind::print};
//...
}
}
- Here we use a do-while statement, which is like a while statement except that the loop body is always executed at least once before the condition is tested.
- The call is->get(ch) reads a single character from the input stream into ch and returns true if a character could be read, false otherwise. Unlike >>, get() does not skip whitespace, which lets us both detect the end of the token stream and see every whitespace character.
- The standard-library function isspace(ch) provides the standard test for whitespace: it returns a non-zero value if ch is a whitespace character and 0 otherwise. Like the other character-classification functions in its family (isalpha(), isalnum(), isdigit(), etc.), it is implemented as a table lookup, so it is usually much faster than testing against individual characters. Once whitespace has been skipped in this manner, the next character determines what kind of lexical token is coming.
- The other problem caused by >>, greedy reading of names, is solved by reading one character at a time until a character that is neither a letter nor a digit is found:
Token Token_Stream::get(){
//...
default: { //a name starts with a letter
if(isalpha(ch)){
ct.stringval = ch;
while(is->get(ch) && isalnum(ch)){ //extend while alphanumeric
ct.stringval += ch;
}
is->putback(ch); //the non-alphanumeric character belongs to the next token
ct.kind = Kind::name; //keep stringval: don't reassign the whole token
return ct;
}
}
//...
}
- Fortunately, both improvements could be made by modifying a local section of the code. Constructing programs so that improvements can be applied through local modifications is an important design aim.
- You might worry that appending characters one by one to the end of a string is inefficient. It can be for very long strings, but C++ implementations typically use a small-string optimization that stores short strings inside the string object itself, so for strings the size of an identifier in our desk calculator this approach causes no measurable inefficiency or delay.
With all of this, the final version of our desk calculator program is listed below:
#include<iostream>
#include<map>
#include<cctype>
using namespace std;
int error_count{};
map<string, double> table;
int error(const char* message){
cerr << message << endl;
error_count += 1;
return error_count;
}
enum class Kind: char{
name, value, end,
plus = '+', minus ='-', mul = '*', div = '/',
lp = '(', rp = ')', assign = '=', print = ';'
};
struct Token{
Kind kind;
double numberval;
string stringval;
};
class Token_Stream{
private:
istream* is;
bool owns;
Token ct = {Kind::end};
void close(){
if(owns){delete is;}
}
public:
Token_Stream(istream* is1): is{is1}, owns{true}{};
Token_Stream(istream& is1): is{&is1}, owns{false}{};
void set_inputStream(istream* is1){close(); is = is1; owns = true;}
void set_inputStream(istream& is1){close(); is = &is1; owns = false;}
Token get(); //get next token
Token& current(){return ct;} //get current Token.
~Token_Stream(){close();} //release an owned stream
};
Token Token_Stream::get(){
char ch;
do{
if(!is->get(ch)){
return ct = {Kind::end};
}
}while(ch != '\n' && isspace(ch));
switch(ch){
case ';': case '\n': return ct = {Kind::print};
case '+': case '-': case '*': case '/': case '(': case ')': case '=': {
return ct = {static_cast<Kind>(ch)};
}
case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': case '.': {
is->putback(ch);
*is >> ct.numberval;
ct.kind = Kind::value;
return ct;
}
default: {
if(isalpha(ch)){
ct.stringval = ch;
while(is->get(ch) && isalnum(ch)){
ct.stringval += ch;
}
is->putback(ch);
ct.kind = Kind::name; //keep stringval: don't reassign the whole token
return ct;
}
error("Bad Token");
return ct = {Kind::print};
}
}
}
Token_Stream ts{cin};
double expr(bool get);
double prim(bool get){
if(get){ts.get();}
switch(ts.current().kind){
case Kind::value: {
double v = ts.current().numberval;
ts.get();
return v;
}
case Kind::name: {
double& v = table[ts.current().stringval];
if(ts.get().kind == Kind::assign){ //read the token after the name; '=' means assignment
v = expr(true);
}
return v;
}
case Kind::minus: return -prim(true);
case Kind::lp: {
double e = expr(true);
if(ts.current().kind != Kind::rp){
return error("Expected a ')'");
}
ts.get(); //eat ')'
return e;
}
default:
return error("Expected a primary token");
}
}
double term(bool get){
double left = prim(get);
for(;;){
switch(ts.current().kind){
case Kind::mul: {
left *= prim(true);
break;
}
case Kind::div: {
if(auto d = prim(true)){
left /= d;
break;
}
return error("Division by zero");
}
default:
return left;
}
}
}
double expr(bool get){
double left = term(get);
for(;;){
switch(ts.current().kind){
case Kind::minus: {
left -= term(true);
break;
}
case Kind::plus: {
left += term(true);
break;
}
default:
return left;
}
}
}
void calculate(){
for(;;){
ts.get();
if(ts.current().kind == Kind::end){break;}
if(ts.current().kind == Kind::print){continue;}
cout << expr(false) << '\n';
}
}
int main(){
table["pi"] = 3.14;
calculate();
return 0;
}
- As indicated in the example above, the calculate function is the driver that runs the loop forever, i.e. until it encounters the end of input. Its primary task is to print each evaluated expression to the output stream; the argument false passed to the expr() parser function tells it that it need not read the next token before evaluating the expression. Testing for Kind::end ensures that the loop exits properly on end of input or on an input error, while testing for Kind::print relieves expr() of handling empty expressions.
Command Line Arguments
- As it stands, it is quite painful to keep typing expressions into our current version of the desk calculator. In most cases, one wants to evaluate a single expression and be done with it. This can easily be done by passing the input to the program as command line arguments. As in C, the C++ convention for passing command line arguments involves two parameters to main: argc, conventionally the count of the arguments, and the argument vector, a C-style array of C-style strings
char* argv[]
- While working with command line arguments in C++, keep in mind that the first element of the argument vector refers to the name of the program, while the rest of the elements correspond to the arguments specified by the user.
- One can take the command line arguments one at a time, or quote several expressions into a single argument separated by
;
For instance, a program named abc
invoked as abc "area=12;area+12;area"
receives one user-specified argument containing three expressions. Note that on Unix-like systems ;
is also the shell's command separator, so such an argument must be quoted; conveniently, the calculator already treats ';' as its print token, so the quoted string can be handed to the parser unchanged.
- Thus equipped, we can use command line arguments to pass the expressions to be evaluated as strings instead of typing them in every time we run the program. Since we need to read a string as if it were an input stream, we need a stream type that reads from a string: C++ supports this with string streams, and the string stream used for input is called istringstream, provided by the <sstream> header. We can thus pass the expression to be evaluated as a command line argument as follows:
#include<sstream>
Token_Stream ts{cin};
int main(int argc, char* argv[]){
switch(argc){
case 1: break; //read from standard input
case 2: ts.set_inputStream(new istringstream{argv[1]}); break; //read from the argument string
default: return error("Too many arguments");
}
table["pi"] = 3.14;
calculate();
return error_count;
}
As simple as they are, the argc and argv conventions for passing command line arguments are still a noticeable source of minor yet annoying bugs. To avoid these, and especially to make it easier to pass program arguments around, I tend to use a simple function to create a vector<string>
as follows:
vector<string> create_vector_arguments(int argc, char* argv[]){
vector<string> res;
for(int i = 0; i < argc; i++){
res.push_back(argv[i]);
}
return res;
}
Operators
In C++, operators perform operations on objects of given types: arithmetic, subscripting into a sequence, dereferencing, bitwise manipulation, and so on. C++ offers a host of operators that can conveniently be used on most built-in types and can be defined for user-defined types. Each operator has a precedence, which determines how an expression groups its operands. Rather than listing the whole precedence table here and prompting you to memorize it, I think it is better to keep a table handy and consult it as needed; with sufficient practice, most common cases can be handled without it. For now, just remember the rough ordering from highest to lowest: scope resolution; postfix operators (member selection, pointer member selection, subscripting, function call, postfix increment and decrement); prefix and other unary operators (prefix increment and decrement, dereference, address-of, new and delete); multiplicative (*, /, %); additive (+, -); shifts; relational; equality; bitwise AND, XOR, and OR in that order; logical AND; logical OR; the conditional expression; assignment, compound assignment, and throw; and finally the comma (sequencing) operator.
For instance, the expression a + b * c essentially means a + (b * c) as opposed to (a + b) * c, because *
has higher precedence than '+'. However, instead of fiddling with such minor details while reading an expression, I often find it easier to make the intended grouping explicit with parentheses. That said, remembering a few rules, such as that assignment and the unary operators are the ones that associate right to left, helps weed out most of the corner cases concerning precedence. Consequently, an expression like a=b=c
is evaluated as a=(b=c)
as opposed to (a=b)=c
.
- Before any grammar rule is applied, lexical tokens are composed from characters in such a way that the longest plausible sequence of characters is chosen to make up a token. For instance,
&&
is a single token (logical AND, or an rvalue-reference declarator, depending on context) rather than two address-of operators. Similarly, a+++1
is read as a++ + 1
: taking the longest plausible character sequence first yields a post-increment operator followed by an addition operator. - This rule is often known as the Maximal Munch rule.
- In much the same spirit, whitespace can be used to separate tokens while they are composed from character sequences. For instance, in the declaration
int i = 2
, separate tokens are formed for the keyword int
and the identifier i
, instead of a single token for an identifier inti
. - Some characters, such as | and most of the logical and bitwise operators, are not convenient to type on every keyboard, so C++ provides an alternative keyword for each such operator (and, or, not, and_eq, and so on). For instance, let's consider the following example:
#include<iostream>
using namespace std;
void use_bitwise_operator(int a, int b){
cout << (a & b) << endl;
cout << (a | b) << endl;
cout << (a ^ b) << endl;
cout << (a &= b) << endl; //a becomes a & b
cout << (a |= b) << endl; //a becomes a | b
cout << (a ^= b) << endl; //a becomes a ^ b
cout << (a and_eq b) << endl; //alternative spelling of &=
cout << (a or_eq b) << endl; //alternative spelling of |=
cout << (a xor_eq b) << endl; //alternative spelling of ^=
}
int main(){
use_bitwise_operator(12, 10);
return 0;
}
Results
The result type of an arithmetic operation is determined by a set of rules known as the usual arithmetic conversions. The effect is that the type of the expression is determined by the "largest" operand type, i.e. a type capable of holding the result. For example, when a floating point operand appears in an expression that otherwise uses integral types, the integral operands are converted to floating point and the result is a floating point number. Similarly, if a long is used, the operation is carried out with long integer arithmetic. The relational operators <=, >=, ==, etc.,
produce Boolean results. The meaning and result type of a user-defined operator is determined by the definition of that operator for the given type. Wherever logically feasible, the result of an operation that takes an lvalue operand is itself an lvalue. This is particularly useful when writing code that must work uniformly on lvalues of both built-in and user-defined types. For example:
#include<iostream>
using namespace std;
class userDefinedTypes{
private:
int x, y;
public:
friend ostream& operator<<(ostream& os, const userDefinedTypes& u1){
os << u1.x << " " << u1.y;
return os;
}
bool operator==(const userDefinedTypes& d1)const{
return (d1.x == x && d1.y == y);
}
userDefinedTypes(int x1 = {}, int y1 = {}): x{x1}, y{y1}{};
};
void lvaloperand(int& x, int& y){
int j = x = y; //right associativity j = (x = y);
#ifdef Error
int& r = (x < y) ? y : 1; //error 1 is not an lval. Cant bind an lval to an rval.
#endif
#ifdef Errors
int* ptr = &((x < y) ? y : 1); //error 1 is not an lval and hence can't be bound to a reference through pointer.
#endif
int* ptr = &((x < y) ? y : x);
cout << *ptr << endl;
}
int main(){
userDefinedTypes u1{12, 13};
cout << u1 << endl;
userDefinedTypes u2{12, 13};
cout << (u1 == u2) << endl;
userDefinedTypes u3{13, 15};
cout << (u2 == u3) << endl;
int x = 12, y = 13;
lvaloperand(x, y);
return 0;
}
In a typical implementation we can check the size of an operand type with the sizeof operator, which yields a value of the unsigned integral type size_t, capable of holding the size of any object. Likewise, ptrdiff_t
, defined in the <cstddef> header, is a signed integer type capable of holding the result of subtracting two pointers into the same sequence; thus, for a well-defined sequence within the specified bounds, a ptrdiff_t
can hold the number of elements in the sequence.
#include<iostream>
#include<ctime> //clock, CLOCKS_PER_SEC
using namespace std;
void f(){
clock_t startTime = clock();
int i = 0;
while(i >= 0){i += 1;} //undefined behavior: signed integer overflow
cout << "Integer overflowed at i = " << i << endl;
clock_t endTime = clock();
cout << (endTime - startTime) / CLOCKS_PER_SEC << endl; //elapsed seconds, roughly
}
int main(){
f();
}
The code above eventually causes signed integer overflow, which is undefined behavior; on many implementations the loop happens to terminate with i wrapped around, so the value printed just before the wrap corresponds to the maximum value supported by int in the given implementation, but the compiler is not required to check for, or be consistent about, such undefined behavior. Similarly, the effect of dividing by zero with the built-in infix operators, and of overflow or underflow among integral types, is undefined. In particular, such operations are not guaranteed to throw a std::exception.
Order of Evaluation
The order of evaluation of subexpressions within an expression is unspecified in C++. In particular, we cannot assume that an expression is evaluated from left to right. For instance, let's consider the following example:
#include<iostream>
#include<vector>
using namespace std;
int f(float r){return r;}
float g(int r){return (float)r;}
int main(){
int k = f(12) + g(12.3f); //unspecified whether the call to f or the call to g is evaluated first
cout << k << endl;
int i = 1;
vector<int>v1(12, 0);
v1[i++] = i;
//before C++17 this was undefined: depending on whether the right-hand i is read before or after the increment, the effect could be v1[1] = 1 or v1[1] = 2. (C++17 sequences the right operand of = first.)
for(size_t i = 0; i < v1.size(); i++){cout << v1[i] << " "; }
cout << endl;
return 0;
}
Leaving the order of evaluation unspecified allows the compiler to generate better code. However, the absence of restrictions on evaluation order can also lead to undefined results. For instance, in the assignment v[i++] = i
, it was (before C++17) undefined whether the old or the new value of i was stored. Thus an expression that reads and writes the same object more than once should be avoided, unless it does so through a single operator that makes the order well-defined, such as ++, +=, the sequencing (comma) operator, or the logical operators && and ||
.
An expression whose result depends on the order of evaluation of its subexpressions is in bad taste, mostly because the behavior of such an expression is unspecified or undefined and hence an avoidable source of errors. One should not, however, confuse such order-dependent subexpressions with the sequencing (comma) operator, a perfectly viable C++ operator that evaluates its operands in a specified order, from left to right: the left-hand operand is evaluated first and its value discarded, and the result of the whole expression is the value of the right-hand operand. For instance, let's consider the following example:
#include<iostream>
using namespace std;
int main(){
int x;
int y = (x = 2, x + 1); //sequencing: first x = 2, then the result is x + 1
cout << x << " " << y << endl;
return 0;
}
The expression above first assigns 2 to x, then evaluates x + 1, so y becomes 3 while x remains 2. Still, such expressions can be confusing and are a noticeable source of errors in C or old-style C++ code.
Temporary Objects
While evaluating an expression, the compiler may need to store an intermediate result of a subexpression in a temporary object. Such a temporary is, by default, destroyed at the end of the full expression of which it is a part. Usually the creation and destruction of temporaries is invisible and harmless; the cases that need care are when a temporary is bound to a reference, in which case its lifetime is extended to that of the reference, and when a temporary owns a resource that must be released, in which case we must make sure the resource is not needed after the end of the full expression. Moreover, knowing whether something is a temporary enables significant optimization in how objects are moved or accessed.
Working with a temporary without knowing its intended lifetime can easily lead to undefined behavior. For instance, let's consider the following example:
#include<iostream>
#include<string>
#include<cstring> //strlen
using namespace std;
void printString(const string& s1, const string& s2){
const char* res = (s1 + s2).c_str(); //points into a temporary string
cout << res << endl; //undefined: the temporary is gone by the time res is used.
string s;
if(int res = strlen((s1 + s2).c_str()) > 0){ //note: > binds tighter than =, so res is 0 or 1
cout << res << endl;
}
}
int main(){
printString("random", "age");
return 0;
}
On first look you might argue "don't do that", and I completely agree. In hindsight, the first initialization stores a pointer into a temporary string whose lifetime ends with the full expression in which it was created, so res points to deallocated storage; any operation on such a pointer is undefined behavior and may read or write unrelated memory, possibly triggering a hardware error or exception. The second use of res is more sensible, since the temporary holding s1 + s2 persists until the end of the full expression in which strlen is called, and the length is copied out before the temporary disappears. Note, though, that because > binds tighter than =, res holds the result of the comparison (0 or 1) rather than the length, and that the declaration is in scope only within the conditional statement. Most problems with temporaries, as with objects in general, arise from using high-level data in a relatively low-level way. A clearer programming style yields more comprehensible code; for instance, the example above could equivalently have been written as follows:
#include<iostream>
#include<string>
using namespace std;
void useString(const string& r1, const string& r2){
const string& res = r1 + r2;
cout << res << endl;
if(res.size() > 0){
cout << res.size() << endl;
}
}
int main(){
useString("random", "age");
return 0;
}
Now, since the lifetime of a temporary bound to a reference is that of the reference, we need not worry about it being destroyed at the end of the expression in which it was created: it is destroyed only when the reference goes out of scope. Remember, though, that a temporary cannot be bound to a non-const lvalue reference. A temporary can also be created deliberately by explicitly invoking a constructor; such a temporary is an rvalue and can be moved from, for instance through a move constructor defined on the given type.
#include<iostream>
#include<utility> //std::move
using namespace std;
class Temporary_Object{
private:
int x, y;
public:
Temporary_Object(int x1 = {}, int y1 = {}): x{x1}, y{y1}{};
Temporary_Object(Temporary_Object&& obj1): x{obj1.x}, y{obj1.y}{};
Temporary_Object& operator=(Temporary_Object&& t1){
x = t1.x;
y = t1.y;
return *this;
}
void display_XY(){cout << x << " " << y << endl; }
};
int main(){
Temporary_Object obj1{12, 13};
Temporary_Object obj2{Temporary_Object{13, 13}}; //a temporary is already an rvalue; no std::move needed.
obj1 = std::move(obj2); //move-assign from a named object by casting it to an rvalue.
obj1.display_XY();
return 0;
}
Constant Expression
When dealing with constants in C++ we are generally concerned with two constructs: const, which specifies immutability in an interface,
and constexpr, which specifies that something is to be evaluated at compile time.
Consequently, a constexpr is used for compile-time evaluation of constants of given types and cannot be initialized with anything that would require runtime evaluation. Constant expressions are thus pivotal for language constructs that require a constant value, such as array bounds, case labels, and symbolic constants. For instance, let's consider the following example:
#include<iostream>
using namespace std;
void constexpr_functions(){
constexpr int i = 12; //OK integer literal can be evaluated at compile time.
int r = 13;
//constexpr int k = r; //error: cannot initialize a constexpr with a value that requires runtime evaluation.
//constexpr string ss = "Random"; //error: string has nontrivial user-defined semantics, so it cannot be evaluated at compile time.
constexpr int arr[i]{1, 2}; //OK doesnt require any runtime evaluation
cout << *arr << endl;
}
int main(){
constexpr_functions();
return 0;
}
In the example above, initializing a constexpr with the value of r would require r to be known at compile time, so it cannot be done; similarly, since a string has nontrivial user-defined semantics such as copy and move constructors, it cannot be used as a constexpr either, because it cannot be evaluated at compile time. The notion of constant values, and constexpr in particular, is favored over variables for the following reasons:
- Named constants in code are generally easier to maintain and understand.
- A variable might change throughout its lifetime, so more housekeeping is needed to reason about it than about a constant, which specifies immutability in the interface.
- Embedded systems programmers like to put data into read-only memory, because read-only memory is cheaper than dynamic memory in terms of cost and energy consumption.
- Using named constants instead of magic numbers makes a program more maintainable and easier to reason about.
- If initialization is done at compile time, there can be no data races on that object in a multi-threaded system.
- Sometimes evaluating something at compile time gives much better performance than doing so at run time.
Since constexpr allows compile-time evaluation, letting us squeeze out some runtime performance, the expressive power of constexpr is considerable and it should not be belittled as a mere replacement for macros. As usual, we can use integral, floating point, and enumeration values in constant expressions, and any operator that does not modify state, such as the bitwise, logical, relational, and conditional operators, but not compound assignment or operators such as increment and decrement that modify the state of an object. For example, here is a compile-time integer square root:
constexpr int isqrt_helper(int sq, int d, int a){
return (sq <= a) ? isqrt_helper(sq + d, d + 2, a) : d / 2 - 1;
}
constexpr int isqrt(int x){
return isqrt_helper(1, 3, x);
}
void find_values(){
cout << isqrt(9) << endl; //3
cout << isqrt(12) << endl; //3
cout << isqrt(16) << endl; //4
}
The conditional operator ?:
when used in a constexpr function allows for conditional evaluation: only the branch actually taken needs to be a constant expression, so a constexpr function can conditionally produce a constant. Similarly, an operand of a logical expression that is not evaluated need not be a constant expression.
#include<iostream>
using namespace std;
int global = 12;
constexpr int Error_code(int i){
return (i > 0) ? i : global;
}
#ifdef Errors
constexpr int randomFunction(int i){
constexpr int rp = 12;
return rp && global;
}
#endif
int main(){
constexpr int r = Error_code(12);
#ifdef Errors
constexpr int r1 = Error_code(-12); //error: this call evaluates global, which is not a constant expression.
#endif
cout << r << endl;
return 0;
}
Symbolic constants
One obvious reason for using constants in a program is to give symbolic names to values instead of scattering magic numbers through the code. As a matter of fact, programs littered with magic numbers (say, arbitrary integral values such as the size of an array) are usually harder to comprehend and a maintenance hazard for other programmers working on the same code base. A symbolic constant lets us use such a value without side effects while documenting what it represents: a constant such as 4 might represent a number of bits, whereas a constant such as 8 might be an exchange rate between two currencies, say Danish kroner to dollars. At first glance of the raw number, none of this is apparent, so naming the constant makes the resulting code far easier to reason about than relying on obscure assumptions. For example, let's consider the following code:
const int exchangeFactor = 60;
int change_currency_magic(int amount){
return amount * 60; //magic number: what does 60 mean?
}
int change_currency(int amount){
return amount * exchangeFactor; //intent is self-documenting
}
As evident from the code above, the version that uses the named constant is far more self-documenting, conveying the intent of converting with a given exchange rate, which the first definition of essentially the same function does not. It is also more resilient to change: since the named constant is defined in exactly one place in the translation unit, changing that single definition has the same effect as locally modifying every function that uses it.
Const in Constexpr
Both const and constexpr are closely related, in the sense that both are used to define constants, but they differ: const primarily specifies immutability in an interface, while constexpr specifies compile-time evaluation. A const may be initialized with a non-constant value of the appropriate type, since no harm can come from that; a constexpr cannot be initialized with anything that is not itself a constant expression, i.e. anything requiring runtime evaluation. For example, let's consider the following segment of code:
#include<iostream>
using namespace std;
int main(){
const int r = 2;
int k = 12;
const int rr = 12;
const int somevals = k; //OK: somevals cannot be modified, even though its value is only known at runtime.
constexpr int k1 = r; //OK: r can be evaluated at compile time.
#ifdef errors
constexpr int k3 = somevals; //error: somevals cannot be evaluated at compile time, since k cannot.
#endif
#ifdef Errors
constexpr int k2 = k; //error: cannot evaluate k at compile time.
#endif
return 0;
}
In the example above, since k cannot be evaluated at compile time, neither it nor any value that depends on it can be used as a constexpr. However, we can freely initialize a const with the value of k, since no harm can come from that.
Literal Types
A type in C++ is said to be a literal type if it has a constexpr constructor, i.e. a constructor that can be evaluated at compile time. For a constructor to qualify, it must be simple enough to carry no runtime semantics: it must have an empty body, and every member must be initialized by a potentially constant expression. Earlier C++ allowed only integral types in constant expressions. This was a source of discontent among programmers wanting to squeeze the most runtime performance out of their code, and is probably the reason that a fair chunk of early compile-time computation was accomplished by encoding information in integral values, which in turn led to an impoverished style of programming. For example, let's consider the following:
#include<iostream>
#include<cmath> //sqrt
using namespace std;
template<class T>
class Complex{
private:
T real;
T imag;
public:
constexpr Complex(T real1 = {}, T imag1 = {}): real{real1}, imag{imag1}{};
constexpr T getReal()const{return real;}
constexpr T getImag()const{return imag;}
double Magnitude()const{ //std::sqrt is not required to be constexpr
return sqrt(static_cast<double>(real * real + imag * imag));
}
Complex operator++(int){ //postfix increment: returns the old value
Complex old = *this;
real += 1;
imag += 1;
return old;
}
friend ostream& operator<<(ostream& os, const Complex& c1){
os << '{' << c1.real << ',' << c1.imag << '}';
return os;
}
};
int main(){
constexpr Complex<int>c1{12, 13};
constexpr int realval = c1.getReal();
constexpr int imagval = c1.getImag();
Complex<int>c2{12, 13};
cout << c2 << endl;
c2++;
cout << c2 << endl;
return 0;
}
Since a constexpr function must be evaluable at compile time, it cannot perform any operation that changes the state of the object it is called on, such as assignment or increment. A constexpr member function is thus essentially a pure function: it returns a value computable at compile time without modifying its object. For this reason, in C++11 a constexpr member function was implicitly const (C++14 relaxed this rule). Accordingly, in the example above the member functions that merely return a value are declared constexpr, while I have deliberately left the postfix increment operator non-constexpr, since it mutates the object and could not be used in a constant expression anyway.
Reference Arguments
At first thought you might assume that references and constexpr do not mix: a reference is inherently a runtime construct, providing linkage to the object it refers to, so it seemingly cannot appear in a constant expression. But constexpr is concerned with values, i.e. whether the specified value can be evaluated at compile time. Consequently, a const reference, which binds to a value without changing it, can participate in a constant expression, whereas binding a mutable lvalue or rvalue reference (a value we deem to change) as a constexpr is not feasible. For instance, let's consider the following example:
#include<iostream>
using namespace std;
template<class T>
class Complex{
private:
T real;
T imag;
public:
constexpr Complex(T real1 = {}, T imag1 = {}): real{real1}, imag{imag1}{};
constexpr Complex(const Complex& f1): real{f1.real}, imag{f1.imag}{};
constexpr T getReal()const{return real;}
constexpr T getImag()const{return imag;}
friend ostream& operator<<(ostream& os, const Complex& c1){
os << c1.real << " " << c1.imag;
return os;
}
};
int main(){
int somevals = 12;
#ifdef Errors
constexpr int& lvalref = somevals; //error: a constexpr reference must bind to an object with a compile-time-known address.
constexpr int&& rvalref = 12; //error: likewise, this temporary has no compile-time-known address.
#endif
constexpr Complex<int>cvals{12, 13};
constexpr Complex<int>c2{cvals};
cout << c2 << endl;
return 0;
}
The declaration above allows a constexpr copy constructor taking a reference to the given object because the parameter is a const lvalue reference: we are not modifying any object, merely initializing from it.
Address of Constant Expression
The address of a statically allocated object is a constant. However, that address is assigned by the linker rather than the compiler, which limits what address arithmetic can be done in a constant expression. For example:
#include<iostream>
using namespace std;
int main(){
constexpr const char* somevalues = "somevalues"; //OK can be evaluated at compile time.
int rvals = 12;
constexpr const char* somevalues1 = somevalues; //OK can be evaluated at compile time.
#ifdef Errors
constexpr const int* somevalues2 = &rvals; //error: rvals is a local, so its address isn't a compile-time constant.
#endif
return 0;
}
Implicit Type Conversion
Integral and floating-point types belong to the same family of types in C++, usually categorized as arithmetic types. Consequently, they can be freely assigned to each other. However, since an integral type represents only whole numbers, we can't assign a floating-point value to an integral variable without truncation and loss of precision. Such conversions between arithmetic types are therefore not always value preserving. A conversion from a type T to a type T1 is said to be value preserving if every value of T can be represented exactly in T1.

We should generally avoid conversions that lose information. A compiler might warn about such a conversion, but that is not guaranteed; we can enforce a runtime check ourselves if needed, or use list initialization, which rejects narrowing conversions outright. A conversion that cannot truncate or lose information is known as a promotion, and it happens from a smaller type to a larger related type, for instance from a character type to an integral type. Promotions are therefore much safer than the lossy conversions. For example, let's consider the following:
#include<iostream>
#include<limits>
using namespace std;
template<class Source, class Target>
Target narrow_cast_check(Source s){
Target t = s;
if(t != s){
throw std::runtime_error{"narrowing conversion"};
}
return t;
}
void narrowing_conversions(){
char ch = 12; //int to char: narrowing in general, though 12 happens to fit.
int r = 12.3f; //narrowing conversion from float (12.3) to int (12); loses information.
double d = std::numeric_limits<float>::max();
float f1 = d; //not narrowing, though it appears to be: d holds exactly float's maximum value.
cout << (f1 == d ? "True" : "False") << endl;
}
void use_narrowing_conversions(){
try{
//int somevalues = narrow_cast_check<float, int>(12.3f); //throws: 12.3 can't round-trip through int.
//char somevalues1 = narrow_cast_check<int, char>(12); //looks like a narrowing conversion, but 12 fits in a char.
float f1 = narrow_cast_check<double, float>(12.3);
}catch(std::runtime_error& r1){
cerr << r1.what() << endl;
}
}
int main(){
narrowing_conversions();
use_narrowing_conversions();
return 0;
}
In the example above, each of the shown conversions can truncate or lose information, so they are not value preserving; the int-to-char assignment merely happens to fit. Going the other way, from char to int, would qualify as a promotion rather than a conversion and is always value preserving.
Pointer and Reference Conversion
- Any pointer to an object, except a pointer to a function or a pointer to a member, can be implicitly converted to a void*, i.e. a pointer to an unknown type.
- Moreover, however dubious it may sound, for historical reasons C++ allows a zero integer literal to initialize a pointer; it corresponds to the null pointer, an address not used by any object of the given type. Consequently, a constant expression that evaluates to 0 could also be assigned to a pointer, though newer standards (C++14 onward) accept only the literal 0 or nullptr here:
int* ptr = 12 * 12 - 144; //a constant expression evaluating to 0; rejected by newer standards.
Boolean Conversion
In arithmetic conversions an integral value can be converted to a boolean, where a non-zero value converts to true and zero converts to false. Similarly, a boolean value can be converted to an integral value, where true converts to 1 and false converts to 0. And since arithmetic operations apply to integral values, any such operation on booleans first converts them to integers and then operates on the converted values. For instance, let's consider the following:
#include<iostream>
using namespace std;
void Boolean_Conversions(){
bool a{true};
bool b = false;
bool c{12!=0};
bool d = 12;
cout << (a + b) << endl; //bools convert to 1 and 0: prints 1.
cout << (a - b) << endl; //prints 1.
cout << (a |= d) << endl; //d is true (12 is non-zero), so a stays true: prints 1.
}
int main(){
Boolean_Conversions();
return 0;
}
Similarly, a pointer can be implicitly converted to a boolean value, where a non-null pointer converts to true and a null pointer converts to false. This can be exploited for short-circuit evaluation when a pointer is used as an operand of the logical operators. For instance, let's consider the following:
#include<iostream>
#include<cstring>
using namespace std;
void readString(const char* arr[], size_t N){
for(size_t i = 0; i < N; i++){
if(arr[i] && strlen(arr[i])){
cout << arr[i] << " ";
}
}
cout << endl;
}
int main(){
const char* somestring[]{"random", "age", "", nullptr, "random"};
readString(somestring, 5);
return 0;
}
In the condition of the if statement, the pointer value first converts implicitly to a boolean, so we only pass the pointer to strlen when it is non-null, i.e. when it actually points to an object in memory.
Finally, recall that a pointer can be initialized from the literal 0; it compiles, but nullptr states the intent far more clearly:
#include<iostream>
using namespace std;
int main(){
int* ptr = 0; //OK, but weird: prefer nullptr.
return 0;
}
Usual Arithmetic Conversions
When a subexpression combines operands of different but related types, such as an integral and a floating-point type, conversions are performed so that the expression is evaluated in the largest operand type, or in a type that can hold the result of the subsequent expressions. These are the usual arithmetic conversions. Although the precise rules have many cases, without bogging yourself down in the details you can get by in most cases treating the result type as the largest operand type. For instance, let's consider the following example:
#include<iostream>
#include<limits>
#include<typeinfo> //required for typeid.
using ldbl = long double;
using namespace std;
int main(){
ldbl values = 12;
unsigned int rr = std::numeric_limits<int>::max();
int v = 12;
auto res = 12.3f + 12 + values; //float + int + long double: evaluated as long double.
auto res1 = rr + v; //v converts to unsigned int, the "larger" of the two types.
cout << typeid(res).name() << endl;
cout << (sizeof(13.3f + 12 + values) == sizeof(long double) ? "True" : "False") << endl;
cout << typeid(res1).name() << endl;
cout << (sizeof(res1) == sizeof(unsigned int) ? "True" : "False") << endl;
return 0;
}
This Blog was created with the sole intent of teaching and learning programming; it is a solo effort, so please forgive any mistakes, senpais. For further suggestions, please contact via the Email in the footer! 😁