C programming language
The C programming language is a standardized imperative programming language developed in the early 1970s by Ken Thompson and Dennis Ritchie for use on the UNIX operating system. It has since spread to many other operating systems and is one of the most widely used programming languages. C is prized for its efficiency and is the most popular programming language for writing system software, though it is also used for writing application software. It is also commonly used in computer science education, despite not being designed for novices.
= Features =
== Overview ==
C is a relatively minimalist programming language.
C was created with one important goal in mind: to make it easier to write large programs with fewer errors in the procedural programming paradigm, but without burdening the writer of the C compiler with complex language features.
To this end, C has the following important features:
Some features that C lacks that are found in other languages include:
One consequence of C's wide acceptance and efficiency is that the compilers, libraries, and interpreters of other higher-level languages are often implemented in C.
== hello, world example ==
The following simple application appeared in the first edition of The C Programming Language, and has become a standard introductory program in most programming textbooks, regardless of language. The program prints "hello, world" to standard output, which is usually a terminal or screen display. However, it might be a file or some other hardware device, including the bit bucket, depending on how standard output is mapped at the time the program is executed.
main()
{
    printf("hello, world\n");
}
The above program will compile correctly on most modern compilers that are not in compliance mode. However, it produces several warning messages when compiled with a compiler that conforms to the ANSI C standard. Additionally, the code will not compile if the compiler strictly conforms to the C99 standard, as a return type of int will no longer be assumed if the source code has not specified otherwise. These messages can be eliminated with a few minor modifications to the original program:
#include <stdio.h>

int main(void)
{
    printf("hello, world\n");
    return 0;
}
What follows is a line-by-line analysis of the above program:
#include <stdio.h> This first line of the program is a preprocessing directive, #include. This causes the preprocessor (the first tool to examine source code when it is compiled) to substitute for that line the entire text of the file or other entity to which it refers. In this case, the header stdio.h, which contains the declarations of the standard input and output functions, will replace that line. The angle brackets surrounding stdio.h indicate that stdio.h can be found using an implementation-defined search strategy. Double quotes may also be used for headers, thus allowing the implementation to supply up to two strategies. Typically, angle brackets are used for headers supplied by the implementation, and double quotes for "in-house" headers.
int main(void) This next line indicates that a function named main is being defined. The main function serves a special purpose in C programs: when a program is executed, main() is the first function called. The portion of the code that reads int indicates that the return value (the value to which the main function will evaluate) is an integer. The portion that reads (void) indicates that the main function takes no arguments.
{ This opening curly brace indicates the beginning of the definition of the main function.
printf("hello, world\n"); This line calls (looks up and then executes the code for) a function named printf, which was declared in the included header stdio.h. In this call, the printf function is passed (provided with) a single argument, the address of the first character in the string literal "hello, world\n". The sequence \n is an escape sequence that is translated to the end-of-line (EOL) character, which is intended to move the output device's current position indicator to the beginning of the next line. The return value of the printf function is of type int, but since no use is made of it, it is quietly discarded.
return 0; This line terminates the execution of the main function and causes it to return the integral value 0.
} This closing curly brace indicates the end of the code for the main function.
If the above code were compiled and run, it would do the following:
*Print the string hello, world onto the standard output device (typically but by no means always a terminal).
*Move the current position indicator to the beginning of the next line.
*Return the integer zero to the application's executor.
==Types==
C has a type system similar to that of other ALGOL descendants such as Pascal, although different in a number of ways. There are types for integers of various sizes, both signed and unsigned, floating-point numbers, characters, enumerated types (enum), records (struct), and untagged unions (union).
C makes extensive use of pointers, a very simple type of reference that stores the address of a memory location. Pointers can be dereferenced to retrieve the data stored at that address. The address can be manipulated with regular assignment and pointer arithmetic. At run time, a pointer represents a memory address. At compile time, it is a derived type that carries both the address and the type of the data it points to. This allows expressions including pointers to be type-checked. Pointers are used for many different purposes in C. Text strings are commonly represented with a pointer to an array of characters. Dynamic memory allocation, which is described below, is performed using pointers.
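The operations described above (dereferencing, pointer arithmetic, and writing through a pointer) can be sketched as follows. This is a minimal illustration, not part of any standard API; the function names count_chars and set_to_zero are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the characters before the terminating '\0'
   by walking a pointer along the character array. */
size_t count_chars(const char *text)
{
    assert(text != NULL);            /* precondition: must point at a string */
    const char *cursor = text;       /* a pointer holding the current address */
    while (*cursor != '\0') {        /* dereference: read the pointed-to char */
        cursor++;                    /* pointer arithmetic: advance one element */
    }
    return (size_t)(cursor - text);  /* pointer difference, counted in elements */
}

/* Hypothetical helper: writes through a pointer to change the caller's variable. */
void set_to_zero(int *target)
{
    assert(target != NULL);
    *target = 0;
}
```

Because the pointer type records what it points to, the compiler can reject a call such as set_to_zero applied to the address of a double without a cast.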
A null pointer has a reserved value indicating that it points to no valid location. Null pointers are useful for indicating special cases such as the next pointer in the final node of a linked list. Dereferencing a null pointer causes undefined behavior. Pointers to type void also exist, and point to objects of unknown type. These are particularly useful for generic programming. Since the size and type of the objects they point to are not known, they cannot be dereferenced, but they can be converted to other types of pointers.
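A short sketch of both ideas follows: a null pointer marking the end of a linked list, and void pointers used for a generic routine. The names node, list_length, and swap_bytes are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* A singly linked list node; the last node's next pointer is NULL. */
struct node {
    int value;
    struct node *next;
};

/* Hypothetical helper: counts nodes by following next until the null pointer. */
size_t list_length(const struct node *head)
{
    size_t count = 0;
    for (; head != NULL; head = head->next) {
        count++;
    }
    return count;
}

/* Hypothetical generic helper: swaps two objects of `size` bytes each.
   A void pointer carries no type, so the caller must supply the size. */
void swap_bytes(void *a, void *b, size_t size)
{
    assert(a != NULL && b != NULL);
    unsigned char *pa = a;   /* convert to a byte pointer before dereferencing */
    unsigned char *pb = b;
    for (size_t i = 0; i < size; i++) {
        unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}
```

The same swap_bytes routine works for int, double, or struct objects, at the cost of the compiler no longer checking that both arguments have the same type.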
Array types in C are of a fixed, static size known at compile-time; this is not too much of a hindrance in practice, since one can allocate blocks of memory at runtime using the standard library and treat them like arrays. Unlike many other languages, C typically represents arrays just as it does pointers: as a memory address with associated data type. In this case, index values are translated into memory addresses by computing an offset from the base address of the array. The array index is not checked against the array bounds, which can result in illegal memory accesses. This may reveal confidential data, corrupt data, or cause run-time errors or exceptions, depending on the situation and the detailed run-time environment.
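The translation of an index into an offset from the base address can be made explicit. In the sketch below (the name sum_by_offset is hypothetical), element i is read as *(base + i), which is what base[i] means in C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums `count` ints starting at `base`.
   No bounds are checked; the caller must pass a correct count. */
int sum_by_offset(const int *base, size_t count)
{
    assert(base != NULL || count == 0);
    int total = 0;
    for (size_t i = 0; i < count; i++) {
        total += *(base + i);   /* equivalent to base[i] */
    }
    return total;
}
```

Passing an array name hands the function the address of its first element, so a sub-range can be summed by passing, for example, the array name plus 2 together with a smaller count.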
C also supplies multi-dimensional arrays. The index values of the arrays are assigned in row-major order. Semantically these arrays function like arrays of arrays, but physically they are stored as a single one-dimensional array with computed offsets.
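The row-major layout can be checked with a small sketch. The dimensions ROWS and COLS and the function names are assumptions made for this example. The flat read copies bytes with memcpy rather than indexing past the end of the first row, which keeps the example within behavior the standard defines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { ROWS = 3, COLS = 4 };   /* example dimensions, chosen arbitrarily */

/* Hypothetical helper: position of element [row][col] in the underlying
   one-dimensional storage of a row-major ROWS x COLS array. */
size_t row_major_offset(size_t row, size_t col)
{
    assert(row < ROWS && col < COLS);
    return row * COLS + col;
}

/* Hypothetical helper: reads grid[row][col] from the flat storage
   using the computed offset instead of two subscripts. */
int read_flat(int grid[ROWS][COLS], size_t row, size_t col)
{
    int value;
    const unsigned char *bytes = (const unsigned char *)grid;
    memcpy(&value, bytes + row_major_offset(row, col) * sizeof value, sizeof value);
    return value;
}
```

For any valid row and col, read_flat returns the same value as grid[row][col], which shows that each row is stored immediately after the previous one.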
C is often used in low-level systems programming, where it may be necessary to treat an integer as a memory address, a double-precision value as an integer, or one type of pointer as another. For such cases C provides casting, which forces the explicit conversion of a value from one type to another. The use of casts sacrifices some of the safety normally provided by the type system.
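The three kinds of conversion mentioned above might look like the following sketch. The function names are hypothetical, and the example assumes the implementation provides the optional type uintptr_t from stdint.h, as most do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A double converted to an int: the cast discards the fractional part. */
int truncate_to_int(double value)
{
    return (int)value;
}

/* A pointer converted to an integer and back again.
   Returns 1 if the round trip yields the original pointer. */
int survives_round_trip(int *pointer)
{
    uintptr_t raw = (uintptr_t)(void *)pointer;   /* address as an integer */
    int *back = (int *)(void *)raw;               /* integer as an address */
    return back == pointer;
}

/* One pointer type treated as another: the byte at the lowest address
   of a 32-bit word. Which byte that is depends on the platform's endianness. */
unsigned char lowest_address_byte(const uint32_t *word)
{
    assert(word != NULL);
    return *(const unsigned char *)word;
}
```

None of these conversions is checked; the compiler accepts the casts as written, which is the loss of safety the paragraph above describes.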
== Data storage ==
One of the most important functions of a programming language is to provide facilities for managing computer memory and the objects that are stored in memory. C provides three distinct ways to allocate memory for objects:
These three approaches are appropriate in different situations and have various tradeoffs. For example, static memory allocation has no allocation overhead, automatic allocation has a small amount of overhead during initialization, and dynamic memory allocation can potentially have a great deal of overhead for both allocation and deallocation. On the other hand, stack space is typically much more limited than either static memory or heap space, and only dynamic memory allocation allows allocation of objects whose size is only known at run-time. Most C programs make extensive use of all three.
Where possible, automatic or static allocation is usually preferred because the storage is managed by the compiler, freeing the programmer of the error-prone hassle of manually allocating and releasing storage. Unfortunately, many data structures can grow in size at runtime; since automatic and static allocations must have a fixed size at compile-time, there are many situations in which dynamic allocation must be used. Variable-sized arrays are a common example of this (see malloc for an example of dynamically allocated arrays).
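The three kinds of allocation can be contrasted in a few lines. This is only a sketch; next_id, square_of, and make_sequence are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Static allocation: `counter` exists for the whole run of the program
   and keeps its value between calls. */
int next_id(void)
{
    static int counter = 0;
    counter++;
    return counter;
}

/* Automatic allocation: `square` is created when the function is entered
   and disappears when it returns. */
int square_of(int n)
{
    int square = n * n;
    return square;
}

/* Dynamic allocation: the size is only known at run time.
   Returns NULL on failure; the caller must release the block with free(). */
int *make_sequence(size_t count)
{
    assert(count > 0);
    int *values = malloc(count * sizeof *values);
    if (values == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        values[i] = (int)i;
    }
    return values;
}
```

Only make_sequence places a duty on its caller: every successful call must eventually be paired with a call to free.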
== Syntax ==
Main article: C syntax
Unlike languages like Fortran 77, C is free-form, allowing programmers to use arbitrary whitespace (rather than rigid lines) in laying out their code. Comments can be included either between the delimiters /* and */, or (in C99) following // until the end of the line.
Each source file contains declarations and function definitions. Function definitions, in turn, contain declarations and statements. Declarations either define new types using keywords such as struct, union, and enum, or assign types to and reserve storage for new variables, usually by writing the type followed by the variable name. Keywords such as char and int, as well as the pointer-to symbol *, specify built-in types. Sections of code are enclosed in braces ({ and }) to indicate the extent to which declarations and control structures apply.
As an imperative language, C depends on statements to do most of the work. Most statements are expression statements which simply cause an expression to be evaluated and, in the process, cause variables to receive new values or values to be printed. Control-flow statements are also available for conditional or iterative execution, constructed with reserved keywords such as if, else, switch, do, while, and for. Arbitrary jumps are possible with goto. A variety of built-in operators perform primitive arithmetic, logical, comparative, bitwise, and array indexing operations and assignment. Expressions can also call functions, including a large number of standard library functions, for performing many common tasks.
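The elements described in this section (comments, declarations, braces, expression statements, and control-flow statements) appear together in the following small sketch. The function name count_vowels is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A block comment, available in every version of C. */
// A line comment, available from C99 onward.

/* Hypothetical helper: counts lowercase vowels in a string. */
int count_vowels(const char *text)
{
    assert(text != NULL);
    int vowels = 0;                               /* declaration with initializer */
    for (size_t i = 0; text[i] != '\0'; i++) {    /* iterative execution */
        switch (text[i]) {                        /* conditional execution */
        case 'a': case 'e': case 'i': case 'o': case 'u':
            vowels++;                             /* expression statement */
            break;
        default:
            break;
        }
    }
    return vowels;
}
```

Because C is free-form, the same function could be written on a single line; the layout above is a matter of convention, not syntax.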
= Criticism =
A popular saying, repeated by such notable language designers as addresses some of these problems.
Part of the reason for this is to avoid compile- and run-time checks that were too expensive when C was originally designed. Another reason is the desire to keep C as efficient and flexible as possible; the more powerful a language, the more difficult it is to prove things about programs written in it. Some checks were also relegated to external tools, such as those discussed in Compiler-external static-checking tools below.
== Memory allocation ==
One problem with C is that automatically and dynamically allocated objects are not initialized; they initially have whatever value is present in the memory space they are assigned. This value is highly unpredictable, and can vary between two machines, two program runs, or even two calls to the same function. If the program attempts to use such an uninitialized value, the results are usually unpredictable. Many modern compilers try to detect and warn about this problem, but both false positives and false negatives occur.
Another common problem is that heap memory cannot be reused until it is explicitly released by the programmer with free(). The result is that if the programmer accidentally forgets to free memory, but continues to allocate it, more and more memory will be consumed over time. This is called a memory leak. Conversely, it is possible to release memory too soon, and then continue to use it. Because the allocation system can reuse the memory at any time for unrelated reasons, this results in insidiously unpredictable behavior. These issues in particular are ameliorated in languages with garbage collection.
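Some common defensive habits reduce, but do not eliminate, these problems. The sketch below uses hypothetical helper names: make_zeroed obtains memory that starts out zeroed, duplicate_text documents that its caller owns the result, and release_text clears the pointer after freeing so that a later accidental use is easier to notice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* calloc, unlike malloc, returns memory with all bits set to zero,
   so the ints start at 0 instead of an unpredictable value. */
int *make_zeroed(size_t count)
{
    return calloc(count, sizeof(int));
}

/* Returns a heap copy of `text`, or NULL on failure.
   The caller owns the copy and must release it exactly once. */
char *duplicate_text(const char *text)
{
    assert(text != NULL);
    size_t size = strlen(text) + 1;   /* include the terminating '\0' */
    char *copy = malloc(size);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, text, size);
    return copy;
}

/* Frees *slot and then sets it to NULL. Calling this twice is harmless,
   because free(NULL) does nothing. */
void release_text(char **slot)
{
    assert(slot != NULL);
    free(*slot);
    *slot = NULL;
}
```

Other copies of the same pointer held elsewhere in a program are not cleared by release_text, so this convention narrows the problem rather than solving it.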
== Pointers ==
Pointers are one primary source of danger; because they are unchecked, a pointer can be made to point to any object of any type, including code, and then written to, causing unpredictable effects. Although most pointers point to safe places, they can be moved to unsafe places using pointer arithmetic, the memory they point to may be deallocated and reused (dangling pointers), they may be uninitialized (wild pointers), or they may be directly assigned any value using a cast or through another corrupt pointer. Another problem with pointers is that C freely allows conversion between any two pointer types. Other languages attempt to address these problems by using more restrictive reference types.
== Arrays ==
Although C has native support for static arrays, it does not verify that array indexes are valid (bounds checking). For example, one can write to the sixth element of an array with five elements, yielding generally undesirable results. This is called a buffer overflow, and it has been notorious as the source of a number of security problems in C-based programs. On the other hand, since bounds checking elimination technology was largely nonexistent when C was defined, bounds checking came with a severe performance penalty, particularly in numerical computation. It was also believed to be inconsistent with C's minimalist approach.
Multidimensional arrays are necessary in numerical algorithms (mainly from applied linear algebra) to store matrices. The structure of the C array is well suited to this particular task, provided one is prepared to count one's indices from 0 instead of 1. This issue is discussed in the book Numerical Recipes in C, Chap. 1.2, page 20 ff ([http://www.library.cornell.edu/nr/bookcpdf/c1-2.pdf read online]). In that book there is also a solution based on negative addressing which introduces other dangers.
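Since the language performs no bounds check, a program that wants one has to write it by hand and pass the array length alongside the array. A minimal sketch, with the hypothetical name checked_store:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores `value` at array[index] only if the index
   is within `length`. Returns false, leaving the array untouched, otherwise. */
bool checked_store(int *array, size_t length, size_t index, int value)
{
    assert(array != NULL);
    if (index >= length) {
        return false;        /* would overflow the buffer: refuse */
    }
    array[index] = value;
    return true;
}
```

With a five-element array, a store at index 4 succeeds while a store at index 5 (the "sixth element") is rejected. The extra comparison on every access is the performance cost referred to above.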
== Variadic functions ==
Another common problem is variadic functions, which take a variable number of arguments. Unlike other prototyped C functions, checking the arguments of variadic functions at compile-time is not mandated by the standard, and is impossible in general without additional information. If the wrong type of data is passed, the effect is unpredictable, and often fatal. Variadic functions also handle null pointer constants in an unexpected way. For example, the printf family of functions supplied by the standard library, used to generate formatted text output, is notorious for its error-prone variadic interface, which relies on a format string to specify the number and type of trailing arguments.
Type-checking of variadic functions from the standard library is a quality of implementation issue, however, and many modern compilers do in particular type-check printf calls, producing warnings if the argument list is inconsistent with the format string. However, not all printf calls can be checked statically, since the format string can be built at runtime, and other variadic functions typically remain unchecked.
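A small user-defined variadic function shows why such calls are hard to check. In this sketch (the name sum_ints is hypothetical), the first argument tells the function how many int arguments follow; nothing in the language verifies that the caller told the truth.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: adds up `count` trailing int arguments.
   Passing fewer arguments than `count`, or arguments of another type,
   is not diagnosed and leads to undefined behavior. */
int sum_ints(int count, ...)
{
    assert(count >= 0);
    va_list args;
    va_start(args, count);              /* begin after the last named parameter */
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);     /* the type is asserted, not checked */
    }
    va_end(args);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) returns 6. The count parameter plays the same role for sum_ints that the format string plays for printf.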
== Syntax ==
Although mimicked by many languages because of its widespread familiarity, C's syntax has often been targeted as one of its weakest points. For example, Kernighan and Ritchie say in the second edition of The C Programming Language, "C, like any other language, has its blemishes. Some of the operators have the wrong precedence; some parts of the syntax could be better." Bjarne Stroustrup has also derided C++'s syntax, which is very similar to that of C: "Within C++, there is a much smaller and cleaner language struggling to get out. [...] the C++ semantics is much cleaner than its syntax." [http://www.research.att.com/~bs/bs_faq.html] Some specific problems worth noting are:
== Maintenance problems ==
There are other problems in C that do not directly result in bugs or errors, but make it harder for inexperienced programmers to build a robust, maintainable, large-scale system. Examples of these include:
== Compiler-external static-checking tools ==
Tools have been created to help C programmers avoid these errors in many cases.
Automated source code checking and auditing is fruitful in any language, and for C many such tools exist, such as lint. A common practice is to use lint to detect questionable code when a program is first written. Once a program passes lint, it is then compiled using the C compiler.
There are also compilers, libraries and operating system level mechanisms for performing array bounds checking, buffer overflow detection and garbage collection, none of which are a standard part of C.
Cproto is a program that will read a C source file and output prototypes of all the functions within the source file. This program can be used in conjunction with the make command to create new files containing prototypes each time the source file has been changed. These prototype files can be included by the original source file (e.g., as "filename.p"), which reduces the problems of keeping function definitions and source files in agreement.
It should be recognized that these tools are not a panacea. Because of C's flexibility, some types of errors involving misuse of variadic functions, out-of-bound array indexing, and incorrect memory management cannot be detected on some architectures without incurring a significant performance penalty. However, some common cases can be recognized and accounted for.
= History =
== Early developments ==
The initial development of C occurred at AT&T Bell Labs between 1969 and 1973; according to Ritchie, the most creative period occurred in 1972. It was named "C" because many of its features were derived from an earlier language called "B". Accounts differ regarding the origins of the name "B": Ken Thompson credits the BCPL programming language, but he had also created a language called Bon in honor of his wife Bonnie.
There are many legends as to the origin of C and its related operating system, Unix, including:
By 1973, the C language had become powerful enough that most of the UNIX kernel, originally written in PDP-11/20 assembly language, was rewritten in C. This was one of the first operating system kernels implemented in a language other than assembly, earlier instances being the Multics system (written in PL/I), TRIPOS (written in BCPL), and MCP (Master Control Program) for the Burroughs B5000, written in ALGOL in 1961.
== K&R C ==
In 1978, Ritchie and Brian Kernighan published the first edition of The C Programming Language. This book, known to C programmers as "K&R", served for many years as an informal specification of the language. The version of C that it describes is commonly referred to as K&R C. (The second edition of the book covers the later ANSI C standard, described below.)
K&R introduced the following features to the language:
K&R C is often considered the most basic part of the language that is necessary for a C compiler to support. For many years, even after the introduction of ANSI C, it was considered the lowest common denominator that C programmers stuck to when maximum portability was desired, since not all compilers were updated to fully support ANSI C, and reasonably well-written K&R C code is also legal ANSI C.
In these early versions of C, only functions that returned a non-integer value needed to be declared before use. A function used without any previous declaration was assumed to return an integer. Example call requiring previous declaration:
long int SomeFunction();

int CallingFunction()
{
    long int ret;
    ret = SomeFunction();
}
Example call not requiring previous declaration:
int SomeOtherFunction()
{
    return 0;
}

int CallingFunction()
{
    int ret;
    ret = SomeOtherFunction();
}
Since K&R function declarations did not include any information about function arguments, function parameter type checking was not performed, although some compilers would issue a warning message if a function was called with the wrong number of arguments.
In the years following the publication of K&R C, several unofficial features were added to the language, supported by compilers from AT&T and some other vendors. These included:
== ANSI C and ISO C ==
During the late 1970s, C began to replace BASIC as the leading microcomputer programming language. During the 1980s, it was adopted for use with the IBM PC, and its popularity began to increase significantly. At the same time, Bjarne Stroustrup and others at Bell Labs began work on adding object-oriented programming constructs to C. The language they produced, called C++, is now the most common application programming language on the Microsoft Windows operating system; C remains more popular in the Unix world. Another language developed around that time is Objective-C, which also adds object-oriented programming to C. Although it is now less popular than C++, it is used to develop Mac OS X's Cocoa applications.
In 1983, the American National Standards Institute (ANSI) formed a committee, X3J11, to establish a standard specification of C. After a long and arduous process, the standard was completed in 1989 and ratified as ANSI X3.159-1989 "Programming Language C". This version of the language is often referred to as ANSI C, or sometimes C89 (to distinguish it from C99).
In 1990, the ANSI C standard (with a few minor modifications) was adopted by the International Organization for Standardization (ISO) as ISO/IEC 9899:1990. This version is sometimes called C90. Therefore, the terms C89 and C90 refer to essentially the same language.
One of the aims of the ANSI C standardization process was to produce a superset of K&R C, incorporating many of the unofficial features subsequently introduced. However, the standards committee also included several new features, such as function prototypes (borrowed from C++), and a more capable preprocessor.
ANSI C is now supported by almost all the widely used compilers. Most of the C code being written nowadays is based on ANSI C. Any program written only in standard C is guaranteed to perform correctly on any platform with a conforming C implementation. However, many programs have been written that will only compile on a certain platform, or with a certain compiler, due to (i) the use of non-standard libraries, such as those for graphical user interfaces, (ii) some compilers not adhering to the ANSI C standard, or its successor, in their default mode, or (iii) reliance on the exact size of certain datatypes as well as on the endianness of the platform.
The __STDC__ macro can be used to split code into ANSI and K&R sections.
#if __STDC__
extern int getopt(int, char * const *, const char *);
#else
extern int getopt();
#endif
Some suggest using #if __STDC__, as above, rather than #ifdef __STDC__, because some compilers set __STDC__ to zero to indicate non-ANSI compliance.
== C99 ==
After the ANSI standardization process, the C language specification remained relatively static for some time. The standard was later revised, leading to the publication of ISO/IEC 9899:1999 in 1999. This standard is commonly referred to as C99. It was adopted as an ANSI standard in March 2000.
The new features in C99 include:
Interest in supporting the new C99 features appears to be mixed. Whereas the GNU Compiler Collection and several other compilers now support most of the new features of C99, the compilers maintained by Microsoft and Borland do not, and these two companies do not seem to be interested in adding such support.
= Relation to C++ =
The C++ programming language was originally derived from C. However, not every C program is a valid C++ program. As C and C++ have evolved independently, there has been an increase in the number of incompatibilities between the two languages [http://david.tribble.com/text/cdiffs.htm]. The latest revision of C, C99, created a number of additional conflicting features. The differences make it hard to write programs and libraries that compile and function correctly as either C or C++ code, and confuse those who program in both languages. The disparity also makes it hard for either language to adopt features from the other one.
Bjarne Stroustrup, the creator of C++, has repeatedly suggested [http://www.research.att.com/~bs/sibling_rivalry.pdf] that the incompatibilities between C and C++ should be reduced as much as possible in order to maximize inter-operability between the two languages. Others have argued that since C and C++ are two different languages, compatibility between them is useful but not vital; according to this camp, efforts to reduce incompatibility should not hinder attempts to improve each language in isolation.
Today, the primary differences (as opposed to the additions of C++, such as classes, templates, namespaces, overloading) between the two languages are:
C99 adopted some features that first appeared in C++. Among them are:
= Intermediate language =
C is used as an intermediate language by some high-level languages (Eiffel, Sather, Esterel) whose compilers do not output object or machine code directly, but instead output C source code, which is then submitted to a C compiler to produce the finished object or machine code. This is done to gain portability and optimization. C compilers, often several, exist for most or all processors and operating systems, and most C compilers output well-optimized object or machine code. Thus, any language that outputs C source code immediately becomes very portable and able to yield optimized object or machine code. Unfortunately, C is designed as a programming language, not as a compiler target language, so it is not ideal for use as an intermediate language. This has led to the development of C-based intermediate languages, such as C--.
= See also =
*C preprocessor
*C standard library
*C library
*C string
*C syntax
*C variable types and declarations
*List of articles with C programs
*Objective-C
*C++
*Operators in C and C++
*Pascal and C
= References =