awk(9)
Command: awk - pattern matching language
Syntax: awk [-f] rules [file] ...
Flags: -f Take the rules from the file named rules (- for standard input)
Examples: awk rules input # Process input according to rules
awk rules - >out # Input from terminal, output to out
AWK is a programming language devised by Aho, Weinberger, and
Kernighan at Bell Labs (hence the name). Awk programs search files for
specific patterns and perform 'actions' for every occurrence of these
patterns. The patterns can be 'regular expressions' as used in the ed
editor. The actions are expressed using a subset of the C language.
The patterns and actions are usually placed in a 'rules' file whose
name must be the first argument in the command line, preceded by the
flag -f. Otherwise, the first argument on the command line is taken to
be a string containing the rules themselves. All other arguments are
taken to be the names of text files on which the rules are to be
applied, with - being the standard input. To take rules from the
standard input, use -f -.
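The argument convention described above can be sketched in C. This is a minimal illustration, not awk's actual source: the names parse_rules_arg and struct rules_arg are hypothetical. It only decides where the rules come from and where the input file names begin.

```c
#include <string.h>

/* Hypothetical: where the rules come from, per the convention above. */
struct rules_arg {
    int from_file;      /* 1 if -f was given, 0 if rules are inline text */
    const char *rules;  /* rules file name ("-" = standard input) or rules text */
    int first_input;    /* argv index of the first input file name */
};

/* Returns 0 on success, -1 if no rules were supplied. */
int parse_rules_arg(int argc, char *argv[], struct rules_arg *out)
{
    if (argc < 2)
        return -1;
    if (strcmp(argv[1], "-f") == 0) {
        if (argc < 3)
            return -1;              /* -f given without a file name */
        out->from_file = 1;
        out->rules = argv[2];
        out->first_input = 3;
    } else {
        out->from_file = 0;         /* first argument is the rules text */
        out->rules = argv[1];
        out->first_input = 2;
    }
    return 0;
}
```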
The command:
awk -f rules prog.d*u
would read the pattern and action rules from the file rules and apply
them to every file matching prog.d*u.
The general format of a rules file is:
<pattern> { <action> } <pattern> { <action> } ...
There may be any number of these <pattern> { <action> } sequences in the
rules file. Awk reads a line of input from the current input file and
applies every <pattern> { <action> } in sequence to the line.
If the <pattern> corresponding to any { <action> } is missing, the
action is applied to every line of input. If the { <action> } is
missing, the default action prints the matched input line.
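The per-line processing model above can be sketched in C with function pointers. The names struct rule, apply_rules and the sample pattern has_digit are hypothetical; a NULL pattern stands for a missing <pattern> and a NULL action for the default print action.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical representation of one <pattern> { <action> } pair. */
typedef int  (*pattern_fn)(const char *line);
typedef void (*action_fn)(const char *line);

struct rule {
    pattern_fn pattern;   /* NULL: matches every line */
    action_fn  action;    /* NULL: print the matched line */
};

/* Sample pattern: true if the line contains a digit. */
int has_digit(const char *line)
{
    for (; *line != '\0'; line++)
        if (isdigit((unsigned char)*line))
            return 1;
    return 0;
}

/* Apply every rule, in order, to one input line.
 * Returns how many rules matched. */
int apply_rules(const struct rule *rules, int nrules, const char *line)
{
    int matched = 0;
    for (int i = 0; i < nrules; i++) {
        if (rules[i].pattern != NULL && !rules[i].pattern(line))
            continue;                   /* pattern present but false */
        matched++;
        if (rules[i].action != NULL)
            rules[i].action(line);
        else
            printf("%s\n", line);       /* default action: print the line */
    }
    return matched;
}
```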
Patterns
The <pattern>s may consist of any valid C expression. If the
Actions
Actions are expressed as a subset of the C language. All variables
are global and default to int if not formally declared. Only char and
int, and pointers to and arrays of char and int, are allowed. Awk
allows only decimal integer constants: no hex (0xnn) or
octal (0nn). String and character constants may contain all of the
special C escapes (\n, \r, etc.).
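As an illustration of what escape handling in string constants involves, here is a minimal C sketch. The function name decode_escapes is hypothetical, and it handles only a common subset (\n \r \t \b \f \\ \' \"); other escaped characters are copied through unchanged.

```c
/* Hypothetical: decode common C escapes in a string constant's body.
 * 'out' must have room for at least strlen(in) + 1 bytes. */
void decode_escapes(const char *in, char *out)
{
    while (*in != '\0') {
        if (*in == '\\' && in[1] != '\0') {
            in++;                        /* look at the escaped character */
            switch (*in) {
            case 'n': *out++ = '\n'; break;
            case 'r': *out++ = '\r'; break;
            case 't': *out++ = '\t'; break;
            case 'b': *out++ = '\b'; break;
            case 'f': *out++ = '\f'; break;
            default:  *out++ = *in;  break;  /* \\ \' \" and others */
            }
            in++;
        } else {
            *out++ = *in++;              /* ordinary character */
        }
    }
    *out = '\0';
}
```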
Awk supports the 'if', 'else', 'while' and 'break' flow of control
constructs, which behave exactly as in C.
Also supported are the following unary and binary operators, listed
in order from highest to lowest precedence:
Operator             Type      Associativity
() []                unary     left to right
! ~ ++ -- - * &      unary     right to left
* / %                binary    left to right
+ -                  binary    left to right
<< >>                binary    left to right
< <= > >=            binary    left to right
== !=                binary    left to right
&                    binary    left to right
^                    binary    left to right
|                    binary    left to right
&&                   binary    left to right
||                   binary    left to right
=                    binary    right to left
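The rows above are in the same relative order as C's own precedence table, so C itself can show how unparenthesized expressions group. This sketch assumes awk's parser groups them the same way as C; the function names are hypothetical.

```c
/* Each function returns an expression written without parentheses;
 * the comment shows how C's precedence groups it. */
int and_vs_equal(void)  { return 6 & 3 == 3; }   /* 6 & (3 == 3) = 6 & 1 = 0 */
int shift_vs_plus(void) { return 1 << 2 + 1; }   /* 1 << (2 + 1) = 8 */
int sub_vs_mod(void)    { return 7 - 4 % 3; }    /* 7 - (4 % 3) = 6 */
int or_vs_and(void)     { return 1 || 0 && 0; }  /* 1 || (0 && 0) = 1 */

int assign_chain(void)
{
    int a, b;
    a = b = 5;          /* right to left: a = (b = 5) */
    return a + b;       /* 10 */
}
```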
Comments are introduced by a '#' symbol and are terminated by the first
newline character. The standard '/*' and '*/' comment delimiters are
not supported and will result in a syntax error.
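Stripping '#' comments can be sketched in C as follows. The text above does not say whether a '#' inside a string or character constant starts a comment; this sketch assumes it does not. The name strip_comment is hypothetical.

```c
/* Hypothetical: truncate 'line' at the '#' that starts a comment.
 * A '#' inside "..." or '...' is ignored, and backslash escapes inside
 * constants are skipped (an assumption, see above). */
void strip_comment(char *line)
{
    char quote = 0;                     /* active quote char, or 0 */
    for (char *p = line; *p != '\0'; p++) {
        if (quote) {
            if (*p == '\\' && p[1] != '\0')
                p++;                    /* skip the escaped character */
            else if (*p == quote)
                quote = 0;              /* constant ends */
        } else if (*p == '"' || *p == '\'') {
            quote = *p;                 /* constant begins */
        } else if (*p == '#') {
            *p = '\0';                  /* comment runs to end of line */
            return;
        }
    }
}
```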
Fields
When awk reads a line from the current input file, the record is
automatically separated into 'fields.' A field is simply a string of
consecutive characters delimited by either the beginning or end of line,
or a 'field separator' character. Initially, the field separators are
the space and tab characters. The special unary operator '$' is used to
reference one of the fields in the current input record (line). The
fields are numbered sequentially starting at 1. The expression '$0'
references the entire input line.
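Field splitting can be sketched in C without copying the line, so the line itself still serves as $0. The names split_fields and MAXFIELDS are hypothetical, and the sketch assumes that a run of consecutive separators counts as one and that leading separators are skipped. The text above does not specify this.

```c
#include <string.h>

#define MAXFIELDS 64   /* hypothetical limit for this sketch */

/* Split 'line' into fields separated by runs of characters in 'seps'
 * (e.g. " \t"). start[i]/len[i] describe field i+1, so $1 is start[0].
 * 'line' is not modified. Returns the number of fields (NF). */
int split_fields(const char *line, const char *seps,
                 const char *start[], size_t len[])
{
    int nf = 0;
    const char *p = line;
    while (*p != '\0' && nf < MAXFIELDS) {
        p += strspn(p, seps);           /* skip separators */
        if (*p == '\0')
            break;
        size_t n = strcspn(p, seps);    /* length of this field */
        start[nf] = p;
        len[nf] = n;
        nf++;
        p += n;
    }
    return nf;
}
```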
Similarly, the 'record separator' is used to determine the end of
an input 'line,' initially the newline character. The field and record
separators may be changed programmatically by one of the actions and will
remain in effect until changed again.
Multiple (up to 10) field separators are allowed at a time, but
only one record separator.
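Because there is only one record separator, a single character is enough to find record boundaries. The following C sketch uses the hypothetical name next_record and reads records from an in-memory buffer for simplicity. By contrast, a string of up to 10 characters can hold the set of field separators.

```c
#include <stddef.h>

/* Hypothetical: return the next record from *pos, where records end at
 * the single character 'rs' (initially '\n'). Stores the record's start
 * and length in *rec and *len and advances *pos past the separator.
 * Returns 0 when the input is exhausted. */
int next_record(const char **pos, char rs, const char **rec, size_t *len)
{
    const char *p = *pos;
    if (*p == '\0')
        return 0;
    const char *q = p;
    while (*q != '\0' && *q != rs)
        q++;                            /* scan to separator or end */
    *rec = p;
    *len = (size_t)(q - p);
    *pos = (*q == rs) ? q + 1 : q;      /* step over the separator */
    return 1;
}
```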
Fields behave exactly like strings and can be used in the same
contexts as a character array. These 'arrays' can be considered to have
been declared as:
char ($n)[ 128 ];
In other words, each field is 128 bytes long. Notice that the parentheses
are necessary because the [] operator binds more tightly than the unary
$ operator; without them, the declaration would parse as:
char $(n[ 128 ]);
which is obviously ridiculous.
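The same precedence rule holds in C, where postfix [] also binds more tightly than a prefix operator. The array rows and both function names in this short sketch are hypothetical.

```c
/* Two 4-byte rows; p below points to the first row. */
char rows[2][4] = { "abc", "xyz" };

char without_parens(void)
{
    char (*p)[4] = rows;
    return *p[1];       /* *(p[1]): first char of row 1, 'x' */
}

char with_parens(void)
{
    char (*p)[4] = rows;
    return (*p)[1];     /* second char of row 0, 'b' */
}
```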
If the contents of one of these field arrays is altered, the '$0'
field will reflect this change. For example, this expression:
*$4 = 'A';
will change the first character of the fourth field to an uppercase
letter 'A'. Then, when the following input line:
120 PRINT "Name address Zip"
is processed, it would be printed as:
120 PRINT "Name Address Zip"
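One way $0 can reflect a change to a field is for each field to point into the line buffer instead of holding a copy. The C sketch below shows this under that assumption. The helper field_ptr is hypothetical, and it treats runs of spaces and tabs as separators.

```c
#include <string.h>

/* Hypothetical: return a pointer to field n (1-based) inside 'line'
 * itself, or NULL if there are fewer than n fields. Writing through
 * the pointer therefore changes the whole line ($0). */
char *field_ptr(char *line, int n)
{
    char *p = line;
    for (int i = 1; ; i++) {
        p += strspn(p, " \t");          /* skip separators */
        if (*p == '\0')
            return NULL;
        if (i == n)
            return p;
        p += strcspn(p, " \t");         /* skip this field */
    }
}
```

With char line[] = "120 PRINT \"Name address Zip\"", the statement *field_ptr(line, 4) = 'A'; turns line into 120 PRINT "Name Address Zip".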
Fields may also be modified with the strcpy() function (see below). For
example, the expression:
strcpy( $4, "Addr." );
applied to the same line above would yield:
120 PRINT "Name Addr. Zip"
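The effect of strcpy() on a field can be sketched as rebuilding the line with one field replaced. The name replace_field is hypothetical, and the sketch assumes the original separators around the field are kept unchanged.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: copy 'line' to 'out' with field n (1-based) replaced by
 * 'text'. Fields are separated by runs of spaces and tabs. Returns 0 on
 * success, -1 if field n does not exist or 'out' is too small. */
int replace_field(const char *line, int n, const char *text,
                  char *out, size_t outsz)
{
    const char *p = line;
    for (int i = 1; ; i++) {
        p += strspn(p, " \t");              /* skip separators */
        if (*p == '\0')
            return -1;
        size_t flen = strcspn(p, " \t");    /* length of this field */
        if (i == n) {
            /* prefix before the field, new text, rest after the field */
            int w = snprintf(out, outsz, "%.*s%s%s",
                             (int)(p - line), line, text, p + flen);
            return (w < 0 || (size_t)w >= outsz) ? -1 : 0;
        }
        p += flen;
    }
}
```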
Predefined Variables
The following variables are pre-defined:
FS          Field separator (see Fields, above).
RS          Record separator (see Fields, above).
NF          Number of fields in current input record (line).
NR          Number of records processed thus far.
FILENAME    Name of current input file.
BEGIN       A special <pattern> that matches the beginning of
            input text.
END         A special <pattern> that matches the end of input
            text.
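The following C sketch shows how these variables could evolve as input is read. It assumes NR keeps counting across files ("thus far") rather than restarting for each file. The names struct input_file, struct awk_vars and run are hypothetical, and each input file is modeled as an in-memory array of lines.

```c
#include <stdio.h>
#include <string.h>

struct input_file {
    const char *name;
    const char **lines;
    int nlines;
};

struct awk_vars {
    long NR;                /* records read so far, across all files */
    int NF;                 /* fields in the current record */
    const char *FILENAME;   /* name of the current input file */
};

/* Count fields separated by runs of spaces and tabs. */
static int count_fields(const char *s)
{
    int nf = 0;
    while (*s != '\0') {
        s += strspn(s, " \t");
        if (*s == '\0')
            break;
        nf++;
        s += strcspn(s, " \t");
    }
    return nf;
}

void run(const struct input_file *files, int nfiles, struct awk_vars *v)
{
    v->NR = 0;
    v->NF = 0;
    v->FILENAME = "";
    printf("BEGIN\n");                      /* BEGIN: before any input */
    for (int f = 0; f < nfiles; f++) {
        v->FILENAME = files[f].name;
        for (int i = 0; i < files[f].nlines; i++) {
            v->NR++;
            v->NF = count_fields(files[f].lines[i]);
            printf("%s:%ld NF=%d\n", v->FILENAME, v->NR, v->NF);
        }
    }
    printf("END NR=%ld\n", v->NR);          /* END: after all input */
}
```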
Awk also provides some useful built-in functions for string manipulation
and printing:
print(arg)      Simple printing of strings only, terminated by '\n'.
printf(arg...)  Behaves exactly like the C printf() function.
getline()       Reads the next record and returns 0 on end of file.
nextfile()      Closes the current input file and begins processing
                the next file.
strlen(s)       Returns the length of its string argument.
strcpy(s,t)     Copies the string 't' to the string 's'.
strcmp(s,t)     Compares 's' to 't' and returns 0 if they match.
toupper(c)      Returns its character argument converted to
                uppercase.
tolower(c)      Returns its character argument converted to
                lowercase.
match(s,@re@)   Compares the string 's' to the regular expression 're'
                and returns the number of matches found (zero if
                none).
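The counting behavior of match() can be approximated in C with the POSIX regcomp()/regexec() functions. This is a sketch, not awk's implementation. It assumes that matches are counted without overlapping, and it uses POSIX basic regular expressions as a stand-in for ed's syntax. The name count_matches is hypothetical.

```c
#include <regex.h>

/* Hypothetical: count non-overlapping matches of 're' in 's'.
 * Returns -1 if 're' does not compile. */
int count_matches(const char *s, const char *re)
{
    regex_t rx;
    regmatch_t m;
    int count = 0;
    int eflags = 0;

    if (regcomp(&rx, re, 0) != 0)           /* 0 = basic regex syntax */
        return -1;
    while (*s != '\0' && regexec(&rx, s, 1, &m, eflags) == 0) {
        count++;
        if (m.rm_eo == m.rm_so && s[m.rm_so] == '\0')
            break;                          /* empty match at end of string */
        /* resume after the match; step one char past an empty match */
        s += (m.rm_eo > m.rm_so) ? m.rm_eo : m.rm_so + 1;
        eflags = REG_NOTBOL;                /* '^' only matches at the real start */
    }
    regfree(&rx);
    return count;
}
```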
Authors
Awk was written by Saeko Hirabayashi and Kouichi Hirabayashi.