Revision: $Revision: 1.12 $ ($Date: 2007-01-10 16:22:01 $)
Resources and further reading: PerlRef02; PerlRef03; PerlRef04; Till01; man pages for the various commands; perldoc documentation, e.g. perldoc perl, perldoc perlre.
Following common practice in Perl documentation, we will use Perl for the language and perl for the program.
Perl is an acronym that stands for Practical Extraction and Report Language. It is a language optimized for scanning text files, extracting information from those text files and printing reports based on that information. It can also be used for many system-management tasks. The language is intended to be practical (easy to use, efficient, complete), rather than beautiful (tiny, elegant, minimal). Perl combines features of C, sed, awk and sh. If you have a problem that you would ordinarily solve using sed, awk or sh, but it exceeds their capabilities or must run a little faster, and you can't or don't want to write in C, then Perl may be for you. There are also translators to turn your sed and awk scripts into Perl scripts. When you need powerful Regular Expression support, Perl will probably be the best option.
Perl's expression syntax corresponds closely to C expression syntax. Perl does not limit the size of your data; the only restriction is the size of your memory. Recursion is of unlimited depth. Tables used by hashes (previously called associative arrays) grow as necessary to prevent degraded performance. In Perl, you can use sophisticated pattern-matching techniques to scan large amounts of data quickly. Although optimized for scanning text, perl can also deal with binary data, and can make dbm files look like hashes. Setuid and setgid perl scripts are safer than C programs through a data-flow-tracing mechanism (taint mode) that closes a number of security holes.
Perl is an interpreted language, which means that there is no explicit compilation step. The perl processor reads its input file, converts it to an internal form and executes it immediately. Internally, however, perl compiles the program and only executes it if the compilation was successful.
A number of interesting features are:
The type of a variable (scalar, array or hash) is indicated by a prefix character.
No need to define the size of strings or of arrays before using them.
A very rich set of pattern-matching operations that make it very easy to locate and process patterns of text.
A complete set of arithmetic capabilities.
A very complete set of built-in functions.
Perl is a very powerful and rich language. It goes far beyond
the scope of this book to try to teach you all about it. The objectives require that
you know just enough about Perl to enable you to write
simple Perl scripts. Therefore,
we will cover the basics and leave it up to you to explore further. It is
assumed the reader has at least a nodding acquaintance with at least one programming
language, preferably C or awk, and therefore
knows concepts like variables, arrays, branching, looping, control statements
and the like. After all, you are an LPIC-1 alumnus
;-)
perl will use standard input by default and look there for statements it should execute. Thus, to execute Perl statements, you could start up the perl program just by typing in its name[18]:
$ perl
The perl interpreter starts up, and silently waits for input. It is possible to type in a number of Perl statements next. Every command in Perl should end in a semicolon; to forget one is a common error, both for experienced and novice users. Thus, a Perl program could look like this:
print "hello world\n";
exit 0;
To make perl interpret these statements, terminate the input by typing Ctrl+D on the next line. Now perl will check the statements you typed, check for errors and execute them. If all went well, you will see the following on your screen:
hello world
Most of the time, you will want to execute a program that consists of statements in a file. It is possible to start up perl, and tell it to execute statements from within a file:
$ perl perlfile
All script files (shell, Perl) are handled the same way:
statements are contained in files which have the execution bit set
(chmod +x). If the first line of a script contains a
hash and an exclamation mark -- #! -- as its first two
characters, this denotes that the location of an interpreter follows. The
shell will start that interpreter and feed the script to it. For example:
#!/usr/bin/perl
perl statements
Everything after the first line is Perl speak. Lines beginning with a
hash (#) are comments. Remember that statements in
Perl should end in a semicolon. Thus, a
Perl script could look something like this:
#!/usr/bin/perl
statement1;
# a comment line
statement2;
statement3; # end of line comment
Perl statements can also be put in a block. A block consists of an opening curly brace, some statements and a closing curly brace. For instance:
if ($x > 5)
{
statement5;
statement6;
}
The block is used in several places: each if, for
example, must be followed
by a block and statements in a subroutine definition will be contained in
a block.
Once you have written a Perl program this way, and have set the execution bit on it, you can simply run it by typing the name of the file. Again, perl will check your code first and generate error messages if it finds mistakes.
In many of our examples, we will use the built-in print function.
print takes a list containing one or more arguments
and prints
them. If a file handle is placed between the print
keyword and the arguments,
the arguments are sent to the corresponding file. By default, standard
output will be used. The print function does not add
newlines or other formatting.
Formatting can be added by including escape characters, or you can use the
printf function. C programmers will feel at ease
with this function right away. The first string argument to
printf is a
format string, which is a mixture of normal characters and format directives.
Normal characters are simply output as is. Format
directives begin with a “%” and
are instructions on how to output the next argument from the list of arguments
following the format string. Format directives may contain width and precision
specification, instructions for left or right justification, and type of data.
Please see the Perl documentation (start with perldoc
perl).
An example:
$v = 10;
$h = 33.6;
printf( "The area of a %f x %f rectangle is %10.3f\n", $v, $h, $v * $h );
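will print (by default, %f shows six digits after the decimal point, and %10.3f right-justifies the result in a field of ten characters):
The area of a 10.000000 x 33.600000 rectangle is    336.000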
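To illustrate the width, precision and justification parts of a format directive, here is a minimal sketch. It uses sprintf, which formats like printf but returns the string instead of printing it. The variable names and sample values are made up for this illustration only.

```perl
use strict;
use warnings;

# Hypothetical sample values, chosen only to illustrate the directives.
my ($item, $count, $ratio) = ('widgets', 42, 0.125);

# %-10s : string, left-justified in a field of 10 characters
# %5d   : integer, right-justified in a field of 5 characters
# %8.3f : floating point, field of 8 characters, 3 digits after the point
my $line = sprintf("%-10s|%5d|%8.3f|", $item, $count, $ratio);

print "$line\n";    # prints: widgets   |   42|   0.125|
```

The vertical bars are ordinary characters; they are only there to make the field boundaries visible.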
You can store all types of data in variables. An example:
#!/usr/bin/perl
$x = 1;
$y = 2;
$z = 3;
print "$x + $y = $z\n";
Any variables in double quotes will be interpolated (replaced by their values) before
print prints out the results. So, running this
program results in the following output:
1 + 2 = 3
By default, all variables in Perl are global
variables; that is, they are accessible from every part of the
program. You can create private variables called lexical variables at any
time with the my operator. Their scope is restricted
to the block or file they are declared within.
A block is denoted by a set of curly braces
(the block that defines a subroutine or an “if” statement or a
specially created block):
{
my($x, $y); # private variables for this block
}
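The scoping rule can be made visible with a small sketch. The variable name and the strings are arbitrary; the point is only that the $x declared inside the block is a separate variable that disappears when the block ends.

```perl
use strict;
use warnings;

my $x = 'file level';          # lexical variable, visible in the rest of the file

{
    my $x = 'inside block';    # a new, separate $x that exists only in this block
    print "in the block:    $x\n";
}

print "after the block: $x\n"; # the file-level $x is unchanged
```

This should print "inside block" for the first line and "file level" for the second.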
You can force yourself to use only lexical variables with the following pragma:
use strict;
When using this pragma, unwanted use of a global variable will be caught. For instance:
#!/usr/bin/perl -w
use strict;
$verse = "Some text";
This results in a compile-time error. To fix this, use
my $verse to make $verse a lexical
variable (this time at file level)
or use the use vars ... pragma to define
$verse as a known global variable.
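In Perl, the first character of a variable name or variable reference denotes what kind of value the variable may hold. Here are the most common kinds of variables:
$x    a scalar: a single number or string
@x    an array: an ordered list of scalars, indexed by number
%x    a hash: a collection of scalars, indexed by string
The above table names three different variables. In
other words, variable $x is a different variable than
@x.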
A Perl scalar variable can be an integer, floating point or a character string. Integer literals are written without a decimal point or an “E” exponent. Examples of integer literals are:
12354
-179
0x32 # hexadecimal
027 # octal
The third number in the list is a hexadecimal (base 16) constant; its value is 50 decimal. The fourth number in the list is an octal (base 8) constant because it begins with a “0”; its value is 23 decimal.
Floating point literal numbers are written with a single decimal point and/or an “E” exponent. Example floating point literals are:
12.3 -6.23E27 3E7 .1
String literals are enclosed in either single or double quotes:
"The world\n"
Another example:
'no expansions'
Single-quoted strings.
In a single-quoted string no expansion will be done. In the following the
$ is just a dollar sign
and \n are the two characters backslash and
n:
print 'Some $verse here \n';
The output will be:
Some $verse here \n
Commands between backticks (back quotes) and escape sequences will not be expanded either.
Double-quoted strings.
Material in double-quoted strings is subject to expansion.
Among these are variables and escape sequences. An escape sequence
is a combination of a backslash and character(s). The combination will
have a special meaning. The newline character, for example, will be
written as the \n combination in a double-quoted
string:
my $verse = 'hallo there';
print "Some $verse here\n";
This will result in
Some hallo there here
The last character of the output is a newline.
If you want $, @ or
% to appear literally in a string
delimited by double quotes, escape them with a
single backslash.
Another example is the double-quote character itself: to get a
double quote in a string surrounded by double quotes, use the
\ for escaping the double-quote.
Table 13.10, “Escape characters in Perl” shows a list of well-known escape
sequences.
For example, suppose we had assigned $t =
'test'.
Then:
$x = '$t'
would assign the literal string $t to $x,
whereas
$x = "$t"
would assign the string test (the value of $t) to $x.
Some example string literals are:
"Hello World!" 'A bit longer but still a nonsensical string' '' ""
When a variable contains a scalar literal, its name always begins with a “$”. An example:
$length = 12.3;
$width = 17;
$area = $length * $width; # $area should be 209.1
In the above, the value of $length was a
floating-point number, the value of $width was an
integer. The result of the calculation would be floating point.
An example using strings:
$colour = "blue";
$shade = "dark";
$description = $shade . " " . $colour;
Here, the scalar variable $description will have the
value “dark blue” ('.' is the string-concatenation operator).
An array is a singly dimensioned vector of scalar quantities. Individual elements
of the array are accessed by a small integer index (or subscript). Normally, an
array containing “n” elements has indices starting at
0 and going to n - 1, inclusive.
In contrast to many other programming languages, it is possible to have array
literals (sometimes called lists). Examples of array literals are:
() # the empty list, or array
(1,2,3) # an array of three integers
(1,'fred',27.1,3*5)
The last example is an array containing 4 elements. The second element (index==1) is a string and the fourth element (index==3) is an integer of value 15.
When a variable references an array of scalars, its name always begins with a “@”. For example:
@one = (1, 2, 3);
@two = (4, 5, 6);
@both = (@one, @two);
At the end of this, @both will be an array of six
elements. The expression “(@one, @two)”,
in this context, effectively joins the two smaller arrays end-to-end.
Note that when you reference a single element of an array whose name starts with “@”, you are referring to a scalar datum, and thus the name must begin with “$”.
You also must enclose the subscript or index in square brackets following the name. For example:
@vec = (3,4,5);
$sum = $vec[0] + $vec[2];
$vec[1] = $sum;
At the end of this, $sum will have the value of 8,
and the array @vec will have the value (3,8,5).
For every array @foo, there is a scalar variable
$#foo that gives the index-id of the last element
of the array. So, after execution of:
@vec = (5,10,15);
The value of $#vec is 2.
A more common way to find the size of an array (which is one more than the index of its last element) is to use the automatic conversion from array to scalar. When an array is assigned to a scalar, the scalar will get the number of elements in the array. Each of the two following lines does exactly this:
my $size = @somearray;
my $size = scalar @somearray;
The scalar command makes the conversion even more
visible.
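The relation between the last index and the number of elements can be checked with a short sketch. The array contents are the ones used above; the extra assignment to element 5 is our own addition to show that arrays grow on demand.

```perl
use strict;
use warnings;

my @vec = (5, 10, 15);

my $last_index = $#vec;         # index of the last element: 2
my $size       = scalar @vec;   # number of elements: 3

print "last index: $last_index\n";
print "size:       $size\n";
print "last value: ", $vec[$#vec], "\n";    # 15

# Assigning to an element beyond the end makes the array grow;
# the elements in between (index 3 and 4) are undefined.
$vec[5] = 30;
print "new size:   ", scalar @vec, "\n";    # 6
```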
Perl does not support multi-dimensional arrays directly. From a technical point of view, there is no need for multi-dimensional arrays (memory can be regarded as a one-dimensional array of bytes), but often the multi-dimensional notation more closely matches reality. Think, for example, of programming a board game where it would be much more convenient to think in terms of “rows” and “columns”, instead of terms like “offset counting from square one”.
There are a number of ways to simulate multi-dimensional arrays using
Perl. One of them uses references.
A reference is a scalar that refers to an entire array, list or
other variable type. To create a reference to a variable, you put a backslash
in front of it. So, \@a denotes a reference to the
array a. A reference is a scalar and therefore can be
stored as an element of an array or list. Thus, it is possible to store
a
reference to an entire array in one element of another array. In the
ASCII art below, an example is given.
Each line represents an array (from left to right @a,
from top to bottom @b
and from front to back @c).
A dash denotes an element in the array. The “X”, for example, denotes
element 4 in array @c
(remember, we start counting at zero).
0 - - 3 - - - - - - a
      |     c
      |    /
      |  (X)       $c[4]
      |  /         $b[7]->[4]
      | /          $a[3]->[7]->[4]
      |/
      7
      |
      |
      b
To find the element (X) you can use three methods:
The direct notation $c[4]: Denotes that (X)
is element #4 of @c, or
Using the notation $b[7]->[4]: Start at array @b,
take element #7, which is a reference to array @c,
and take element #4 of that array to find (X), or
Using the notation $a[3]->[7]->[4]:
Start at array @a, take element #3, which is a
reference to array @b, of which element #7 is a reference
to array @c, of which you take element #4.
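The three notations can be tried out with the following sketch, which builds the chain of references from the drawing. The second half applies the same idea to a tiny two-row “board”; the names @row0, @row1 and @board and their contents are made up for this illustration.

```perl
use strict;
use warnings;

my (@a, @b, @c);

$c[4] = 'X';     # the element we want to find
$b[7] = \@c;     # element #7 of @b is a reference to @c
$a[3] = \@b;     # element #3 of @a is a reference to @b

print "direct:      ", $c[4], "\n";
print "starting at b: ", $b[7]->[4], "\n";
print "starting at a: ", $a[3]->[7]->[4], "\n";

# The same technique simulates rows and columns (hypothetical data).
my @row0  = ('a', 'b', 'c');
my @row1  = ('d', 'e', 'f');
my @board = (\@row0, \@row1);

print "row 1, column 2: ", $board[1]->[2], "\n";    # f
```

All three of the first print statements should show the same X, since they reach the same scalar by different routes.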
When a variable references a hash of scalars, its name
always begins with a “%”. For example:
%box = ("len", 100, "height", 40, "width", 20, "colour", "blue");
This can also be written as follows:
%box = ("len" => 100, "height" => 40, "width" => 20, "colour" => "blue");
Both assign a hash with four elements to variable %box.
The indices (or keys) of the elements are the strings "len", "height",
"width" and "colour" and the values of those elements are 100, 40, 20,
and "blue", respectively.
When we refer to elements, we are referring to scalar data and thus the variable
reference must start with “$”.
The index delimiters of '{}' denote that we are accessing a hash (just as the index delimiters '[]' denote access to an array). In our example, to get the height, use:
print "height is ", $box{'height'};
The output will be
height is 40
New elements can be added by simply assigning to them:
$box{'volume'} = $box{'width'} * $box{'len'} * $box{'height'};
This assigns a fifth element to hash %box.
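To see what a hash contains, you can walk through its keys. The sketch below uses the built-in functions keys, sort and exists, which are not discussed further in this text; the key 'weight' is only there to show a lookup of a key that was never stored.

```perl
use strict;
use warnings;

my %box = ("len" => 100, "height" => 40, "width" => 20, "colour" => "blue");

# Add a fifth element, computed from three existing ones.
$box{'volume'} = $box{'len'} * $box{'height'} * $box{'width'};

# keys returns the keys in no particular order, so we sort them
# to get a predictable listing.
foreach my $key (sort keys %box) {
    print "$key: $box{$key}\n";
}

# exists tests whether a key is present in the hash.
print "no weight stored\n" unless exists $box{'weight'};
```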
Perl has the ability to use user-defined subroutines or functions. The
name of such a subroutine consists of letters, digits and
underscores, but can't start with a digit. It is defined by the
sub keyword, followed by a name and a block:
my $n = 0; # file-wide variable $n
...
sub marine
{
$n += 1;
print "Hello, sailor number $n!\n";
}
Subroutine definitions can be anywhere in your program text. Subroutine definitions are global; without some powerful trickiness, there are no private subroutines. If you have two subroutine definitions with the same name, the later one overwrites the earlier one.
Invoke a subroutine from within any expression by using the subroutine name, sometimes preceded with an ampersand:
&marine; # says Hello, sailor number 1!
&marine; # says Hello, sailor number 2!
&marine; # says Hello, sailor number 3!
&marine; # says Hello, sailor number 4!
The initial ampersand is often optional. It is mandatory, however, if you are defining a function of your own that has the same name as one of the built-in functions of Perl (this is a bad idea anyway):
sub chomp
{
print "bit more than I could chew";
}
&chomp; # prevents calling the internal chomp function
You may omit the ampersand if there is no built-in with the same name and if the definition of the function precedes the calling of the function:
sub multiply
{
$_[0] * $_[1];
}
...
my $mult = multiply 355, 113;
A special array @_ contains the arguments passed to
the subroutine on entrance. It is a good idea to save the arguments in
a local variable:
# show elements of array by index
sub showarray
{
my @list = @_;
for (my $i = 0; $i < @list; $i++)
{
print "index $i value $list[$i]\n";
}
}
This is how the subroutine is called:
my @rows = (24,30,12,34);
...
showarray @rows;
The output is:
index 0 value 24
index 1 value 30
index 2 value 12
index 3 value 34
When passing arguments to a subroutine, all of them are flattened into a single list, which the subroutine receives as the one array @_. When you want to pass separate entities (e.g., two arrays), you must pass references to these entities instead and dereference these in the subroutine itself. This is left as an exercise to the reader.
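One possible approach to passing two arrays is sketched below. The subroutine name pairwise_sum is hypothetical, and the sketch assumes that both arrays have the same length; it is meant as a starting point, not as the only way to do it.

```perl
use strict;
use warnings;

# Hypothetical helper: adds two arrays element by element.
# Both arguments are array references; equal length is assumed.
sub pairwise_sum
{
    my ($first, $second) = @_;      # two scalars, each a reference to an array
    my @sums;
    for (my $i = 0; $i < @{$first}; $i++)
    {
        push @sums, $first->[$i] + $second->[$i];
    }
    return @sums;
}

my @one = (1, 2, 3);
my @two = (4, 5, 6);

# The backslash creates a reference, so the two arrays stay separate.
my @result = pairwise_sum(\@one, \@two);
print "@result\n";    # prints: 5 7 9
```

Had we called pairwise_sum(@one, @two) instead, the subroutine would have received one flat list of six numbers and could not tell where the first array ends.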
The value of the last expression evaluated in a subroutine will be the return value of the subroutine. This subroutine will return the sum of the first two arguments:
sub add
{
$_[0] + $_[1];
}
You can also make the subroutine return by using the
return keyword:
sub divide
{
if ($_[1] == 0)
{
return 0;
}
# otherwise: return the division
$_[0] / $_[1];
}
Operators allow a combination of constants and variables. There are many operators; we will only discuss the most important ones. The precedence of the operators is largely intuitive and can be overridden by the use of parentheses. When in doubt, check the Perl documentation (perldoc perlop) or use parentheses to prevent ambiguity.
Numerical operators.
Numerical operators expect numbers as arguments and return numbers as
results. The
basic numeric operators are: + - * / **,
(the last one means exponentiation). Additionally, these operators may
be combined with the “=” equal sign, to form C-style operators such as
+= -= *= **=:
$n = 8;
$n **= 3; # same as: $n = $n ** 3;
print $n; # 512 (8 * 8 * 8)
Among the built-in numeric functions are: cos,
sin, exp,
log and sqrt.
Additionally, the increment-by-one (++) and
decrement-by-one (--) operators are used in Perl in
the same way as they are used in C.
As in the C language, their placement is of importance:
$m = 1;
print $m++ . " "; # print value first, then increment..
print $m . " "; # print incremented value ..
print ++$m . " "; # increment first, then print new value..
print $m . " \n"; # print value;
This would result in the printing of the range: 1 2 3 3
Numeric operators and functions expect numeric arguments. If they are given a string argument, they convert it to a number automatically.
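A small sketch of this automatic conversion follows. The values are arbitrary; note that the conversion depends on the operator, not on how the value was written down.

```perl
use strict;
use warnings;

my $sum     = "10" + 5;     # the string "10" is converted to the number 10
my $product = "3.5" * 2;    # the string "3.5" is converted to 3.5
my $joined  = 10 . 5;       # the dot converts both numbers to strings

print "sum:     $sum\n";        # 15
print "product: $product\n";    # 7
print "joined:  $joined\n";     # 105
```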
String concatenation operator.
To add strings, use the string concatenation operator:
. (a dot).
It denotes string concatenation:
$i = 'hi';
$o = 'ho';
$song = $i . $o;
Variable $song will now contain
hiho.
Boolean operators. There is no special data type that signifies Boolean values. There are just values that mean false: the zero or null values. More specifically, they are:
"" null string
"0" string
0 integer
0.0 float
() empty array or list
A variable can be set to undef, meaning that it
is completely undefined.
Everything that is not zero, null, empty or undef will mean true.
There are a number of operators that are typically used in Boolean expressions.
The not operator produces 1 to indicate true and the empty string to indicate false;
|| and && return the value of the last operand they evaluated. The most
common are:
|| or
&& and
! not
The first two are short-circuit operators; the right
operand is not
evaluated if the answer can be determined solely from the left operand.
The “not” operator ! reverses the value
of its only operand: true will become false, and vice versa.
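The short-circuit behaviour can be observed with the sketch below. The subroutine names greeting and check are hypothetical. The first part relies on the fact that || yields the value of the last operand it evaluated, which makes it handy for supplying a default; the second part counts how often the right operand is actually evaluated.

```perl
use strict;
use warnings;

# Hypothetical helper: falls back to a default when no name is given.
sub greeting
{
    my ($name) = @_;
    # If $name is false (undef, "" or "0"), || yields the right operand.
    my $who = $name || 'stranger';
    return "Hello, $who!";
}

print greeting('Alice'), "\n";    # Hello, Alice!
print greeting(''), "\n";         # Hello, stranger!

# Count how often the right-hand side is evaluated.
my $checked = 0;
sub check { $checked++; return 1; }

my ($no, $yes) = (0, 1);
$no  && check();    # left operand is false: check() is not called
$yes || check();    # left operand is true:  check() is not called

print "check() was called $checked times\n";    # 0 times
```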
Numeric Comparisons. For comparing numbers, you can use:
< less
<= less or equal
== equals
!= not equal
>= bigger or equal
> bigger
If you compare strings using these numeric comparison operators, the strings are first converted to numeric quantities and the numeric quantities compared.
String Comparisons. The commonly used ones:
lt less than
le less or equal
eq equals (identical)
ne not equal
ge greater or equal
gt greater than
The operators eq and ne
are used for direct string comparisons. The others are used mainly in
sorting.
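The difference between the two families of comparison operators shows up as soon as strings contain numbers. The following sketch uses arbitrary values to compare the same data once numerically and once as strings.

```perl
use strict;
use warnings;

my ($p, $q) = ("10", "9");

# Numerically, 10 is larger than 9.
my $numeric_greater = ($p > $q) ? 1 : 0;

# As strings, "10" comes before "9", because "1" sorts before "9".
my $string_less = ($p lt $q) ? 1 : 0;

print "numeric: 10 > 9\n"           if $numeric_greater;
print "string:  \"10\" lt \"9\"\n"  if $string_less;

# 1.0 and 1 are the same number, but different strings.
print "\"1.0\" == \"1\"\n" if "1.0" == "1";
print "\"1.0\" ne \"1\"\n" if "1.0" ne "1";
```

All four lines are printed, which is why choosing the right operator for the kind of data matters.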
To perform more sophisticated string comparisons in Perl,
use pattern matching.
Perl permits the use of powerful pattern-matching
operators: =~ and
!~ for does match
and does not match respectively.
Regular Expressions are probably the most important feature of Perl. Regular Expressions in general are described in the section called “Regular Expressions”. Be sure to read the part about Perl Regular Expression extensions (see the section called “Perl Regular Expressions”). This section shows how to apply Regular Expressions in Perl.
To match a Regular Expression against a variable, a statement must be created consisting of
the variable: $some
the binding operator =~
a Regular Expression enclosed in slashes: /^[A-Z]/
A comparison statement can look as follows:
$some =~ /^[A-Z]/
This is not complete: the result of the comparison must be handled as
well.
In the following example the Boolean result of the above statement is
evaluated (with an if statement). The result is
either true or
false.
If true (if the contents of
$some starts with an
uppercase letter), the block is entered:
if ($some =~ /^[A-Z]/)
{
# match: do something
}
If both variable and binding operator are omitted,
the comparison is done against the contents of the so-called
default variable $_:
if (/^[A-Z]/)
{
# RE matches on $_: do something
}
Of course, if the Regular Expression matches, pseudo variables
like $&, $1, etc. can be
inspected, as described in the section called “Perl Regular Expressions”.
There is also the reverse of the binding operator:
the !~. Its result is
true if the Regular Expression does
not match the contents of the variable.
In the example below, the block is entered
when the contents of $some does not start with an
uppercase letter:
if ($some !~ /^[A-Z]/)
{
# if no match: do something
}
When matching against $_ the not operator
! can be used instead:
if (!/^[A-Z]/)
{
# RE does not match on $_
}
Another way to check if a variable matches is to inspect the results of the match. This will only work if grouping (see the section called “Perl Regular Expressions”) is used. The following is an example:
my @results = $input =~ /^([A-Z])([a-z]+)/;
if (@results)
{
# RE matched!
print "1st part: $results[0], 2nd part $results[1]\n";
}
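As a slightly larger illustration, the sketch below pulls two fields out of a line of text using grouping. The input format ("user=... uid=...") and the subroutine name parse_user are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: extracts name and uid from a "user=NAME uid=NUMBER" line.
# Returns an empty list if the line does not match.
sub parse_user
{
    my ($line) = @_;
    if ($line =~ /^user=(\w+)\s+uid=(\d+)$/)
    {
        return ($1, $2);    # contents of the first and second group
    }
    return ();
}

my ($user, $uid) = parse_user('user=alice uid=1001');
print "user $user has uid $uid\n";    # user alice has uid 1001

my @nothing = parse_user('this line does not match');
print "no match\n" unless @nothing;
```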
Perl programs may need to refer to specific input
or output streams or files. The variables that name them are, by
convention, always rendered in uppercase. Output files can be explicitly
created by the open function. For example, to
open a file named prog.dat you
would use:
open(OUT, ">prog.dat") || die "$0: open prog.dat for writing failed: $!";
This associates the file handle OUT with the
file. The leading “>” signifies that the file will
be emptied (if it did exist) or created. Use
“>>”
to instead append to the file.
You must check if the open succeeds and catch possible errors.
One of the possibilities is the use of die as
shown above; $! will contain the error.
Now, you can use the file handle to write something to the file:
print OUT ("A test");
When using print or printf
with a handle, do not put a comma after the
handle.
The statements above will open and empty the file and will write
“A test”
to it.
Standard output is opened automatically and can be referenced via the
predefined handle called
“STDOUT”. STDOUT will
be used by
printf and print if no
file handle was specified. Another automatically opened handle is
STDERR (standard error).
A file can be closed using the close keyword:
close(OUT);
File handles can also be used when reading from a file.
my $file = 'listOfNames';
open(IN, "<$file") || die "$0: unable to open $file for reading: $!";
The initial < can be omitted.
To read from the file, use the file handle surrounded by
< and >. There is, however,
something to watch out for: scalar versus array context.
If the handle is read in scalar context, exactly one line is read:
my $line = <IN>;
When this is used again, the next line is read and so on. In array context however, all lines will be read at once. The following, for example, will do exactly that:
my @allLines = <IN>;
The method of reading input line-by-line is often used in combination
with a while keyword:
while (<IN>)
{
# most recent line read in $_
print $_;
}
This example reads each line and prints it, until no more lines can be read.
Don't forget to close the handles again:
close(IN);
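Writing and reading can be combined into one small round trip, as in the following sketch. To avoid touching any existing file, it creates a temporary directory with the standard File::Temp module; that module, the file name and the three names written to the file are our own choices for this illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a temporary directory that is removed when the program ends.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/listOfNames";

# Write three lines to the file.
open(OUT, ">$file") || die "$0: open $file for writing failed: $!";
print OUT "alice\n", "bob\n", "carol\n";
close(OUT);

# Read the file back line by line.
open(IN, "<$file") || die "$0: unable to open $file for reading: $!";
my $count = 0;
while (<IN>)
{
    chomp;              # remove the trailing newline from $_
    $count++;
    print "$count: $_\n";
}
close(IN);
```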
A predefined file handle is STDIN for
standard input. It is opened for reading.
It can be used like any other handle.
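A commonly used construction is <>.
It operates as follows: if file names were given as command-line arguments, it reads from those files, one after the other; if no arguments were given, it reads from standard input.
In applications it might be used like this: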
while (<>)
{
...statement(s)...
}
This will read either from named files (given as arguments) or from
standard input.
While inside the while loop, a special predefined
variable $ARGV contains the name of the file that
is being read (standard input is indicated by a
dash (-)).
A very powerful Perl facility is the process handle. It works exactly like a file handle but refers to a process instead. Process handles can be opened for reading or for writing.
The following code will open a WHO handle
that is associated with a running who program.
open (WHO, "who|") || die "cannot start who: $!";
my @whoLines = <WHO>;
close (WHO);
print "There are currently ", scalar @whoLines, " users logged in\n";
The | symbol indicates that this is a process to be
opened. By placing the | at the end, we tell Perl that
we want to read from the process.
Of course, there are also writable process handles. A typical application might be writing a message (collected earlier) to a mail program:
open (MAIL, "|mutt -s \"log report\" $mailto") ||
die "cannot start mutt: $!";
print MAIL @messageLines;
close (MAIL);
This time the | is at the beginning, indicating that
the handle should be open for writing.
The message is sent after the close statement
has been processed.
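Since the output of who differs from system to system, the following sketch uses the echo command instead to show a readable process handle with predictable output. It assumes a Unix-like system where echo is available; the handle name CMD is arbitrary.

```perl
use strict;
use warnings;

# The trailing | means: run the command and read its output.
open(CMD, "echo one two three|") || die "cannot start echo: $!";
my $output = <CMD>;     # scalar context: read one line
close(CMD);

chomp $output;          # remove the trailing newline
my @words = split(' ', $output);

print "read ", scalar @words, " words: @words\n";
```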
Perl supports a number of loop constructs. Let's study the following example, in which we also recapitulate a number of the other things we've learned so far. The numbers in the left margin are for reference purposes only and do not belong to the code.
/----------------------------------------------------------------
|
1 | open ( PW, "</etc/passwd") || die "Eh..? Can't read /etc/passwd?";
2 | while ( @a = split (':', <PW>) ) {
3 | foreach $x ( @a ) { print $x . " "; }
4 | print "\n";
5 | }
6 | close( PW );
|
In line 1, the main logic is determined by a
boolean expression: the || ('or'
operator) with two operands.
The operands are function calls in this case, which is
perfectly legal in the Perl language. Remember that the
boolean operators || and && are short-circuit operators: the right operand is not
evaluated if the answer can be determined solely from the left operand.
Therefore, if the open succeeds, the die
function will never be called. When open fails, the
first part of the boolean expression is false and, therefore, the second
part of the expression must be evaluated (the die
function will be called) and the script will terminate. For now, let's
assume that the function opened the password file
successfully. The file handle PW is now associated with it.
Line 2 contains a “while” loop. A
“while”
statement uses a test expression as its argument and executes its block
every time the expression evaluates to true. A test expression is evaluated
before every iteration, so the block may get executed
0 times. The core of the test expression is the split
function. The split will take a string (its second
argument) and split it into chunks, using its first argument as the
separator pattern (in our case, a colon). In our example,
split fetches the input string directly from the file
using the file handle “PW”. The test
expression evaluates to true as long as the array
“@a” is filled by the
split,
which is the case as long as split is able to read
from the file.
Line 3 gives an example of the foreach loop. The
foreach loop provides a handy way to cycle through
all elements of an array in order. It takes 2 arguments: the variable
where an element of the array should be put in and the name of the
array within parentheses. The loop cycles through all values in the array
(effectively: all fields in the line from the password file) and stashes
each field in $x, one field per iteration. The value
is printed next.
Line 4 just prints a newline after each line that was read and parsed from the password file.
Line 5 terminates the “while” loop and
Line 6 closes the file associated with the file handle “PW”.
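To look at split in isolation, the sketch below applies it to a single, made-up password entry instead of the real /etc/passwd. The variable names for the seven fields are our own; only the number and order of the fields follow the usual password file layout.

```perl
use strict;
use warnings;

# Hypothetical password file entry, used instead of reading /etc/passwd.
my $entry = 'root:x:0:0:root:/root:/bin/bash';

# split returns a list, which can be assigned to a list of scalars.
my ($login, $passwd, $uid, $gid, $gecos, $home, $shell) = split(/:/, $entry);

print "$login has uid $uid, home $home and shell $shell\n";

# In scalar context, the array gives the number of fields.
my @fields = split(/:/, $entry);
print "number of fields: ", scalar @fields, "\n";    # 7
```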
Another example further clarifies the use of the foreach
loop and introduces the range operator (two dots in
a row) and the concept of lists:
@a = ("a".."z","A".."Z");
foreach $n (@a) {
print $n;
}
In the first line, the array @a is filled with
the elements of a list. A list is simply a listing of values. In this
case, the list is formed by specifying two ranges,
namely the range of all letters from "a" to "z" and the second range
of all letters from "A" to "Z". The array @a will
contain 52 elements. The foreach loops through
all values in the array, one per iteration, and prints the resulting
value to standard output (since we did not specify a
file handle).
The output is:
abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
The for loop is the third type of loop. It is
identical to the C “for” construct. The general structure is:
for(init;test;incr)block
A for statement can easily be rewritten using
while:
my $i;
for($i = 0; $i < 20; $i++)
{
print "\$i: $i\n";
}
is equivalent to:
my $i = 0;
while ($i < 20)
{
print "\$i: $i\n";
$i++;
}
The test is performed before every iteration, so the block may be executed 0 times. Here, for example, is a way to print a table of squares, using the printf function:
for($n = 1; $n <= 10; ++$n)
{
printf("%d squared is %d\n", $n, $n * $n);
}
Perl supports the if statement:
if( $le < 10.0 )
{
print("Length $le is too small\n");
}
Each test expression must be followed by a statement block. Blocks do not have to end with a semicolon. The following is functionally equivalent to our example above:
print( "Length $le is too small!\n" ) if ($le < 10.0);
There are “else” and “elsif” keywords too:
if( $le < 10.0 )
{
print( "Length $le is too small!\n" );
}
elsif( $le > 100.0 )
{
print( "Length $le is too big!\n" );
}
else
{
print( "Length is just right!\n" );
}
Note that if, elsif and
else must be followed by a block,
even for one statement.
You can even use short-circuit boolean operators, as in:
$le < 10.0 && print("Length $le is too small!\n");
This kind of expression is particularly useful when you have to take an exceptional action after something (such as opening an input file) fails:
open(IN,"<foo.dat") || die("Unable to open foo.dat: $!");
This can be written equivalently, but less readably, as:
die("Unable to open foo.dat: $!") if !open(IN,"<foo.dat");
perl automatically enables a set of special security checks, called taint mode, when it detects its program running with differing real and effective user or group IDs. In other words: when a program has either its setuid or setgid bit set (or both)[19].
You can also enable taint mode explicitly by using the
-T command line flag:
#!/usr/bin/perl -w -T
The -T flag is strongly
suggested for server programs and any program run on behalf of someone
else, such as CGI scripts. Once taint mode is on, it's on for the
remainder of your script.
While in this mode, perl takes special precautions, called taint checks, to prevent both obvious and subtle traps. Some of these checks are reasonably simple, such as verifying that path directories aren't writable by others; careful programmers have always used checks like these. Other checks, however, are best supported by the language itself. As a result of these checks, setuid and setgid perl programs are more secure than similar C programs.
Using taint mode does not take the responsibility away from the programmer; it only helps. For instance, if a program accepts user input from a web page, perl verifies only that the program checks this data before using it, not that the check is any good.
You may not use data derived from outside your program to affect
something else outside your program -- at least, not by accident.
All command-line arguments, environment variables, locale information
(see the perllocale manual page), results of certain system
calls (readdir(), readlink(),
the variable of shmread(), the messages returned by
msgrcv(), the password,
gcos and shell fields returned by the
getpwxxx() calls), and all file input are marked as
“tainted”.
Tainted data may not be used directly or indirectly in any command that invokes a subshell, nor in any command that modifies files, directories or processes.
If you pass a list of arguments to either system or
exec, the elements of that list are NOT checked for
taintedness. For example, assuming $arg is tainted, this
is allowed (alas):
exec 'sh', '-c', $arg; # Considered secure... :-(
The programmer knows whether or not to trust $arg
and, so, whether to take appropriate action.
Any variable set to a value derived from tainted data will itself be tainted, even if it is logically impossible for the tainted data to alter the variable. Because taintedness can be associated with particular scalar values, some elements of an array can be tainted and others not.
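Tainted data is usually cleaned by matching it against a regular expression and using the captured part, which perl regards as checked. The following is a minimal sketch of that technique; the helper name untaint_filename and the set of accepted characters are assumptions made for this example, and perl does not judge whether the pattern is strict enough.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an untainted copy of $name if it consists
# only of word characters, dots and dashes; returns undef otherwise.
sub untaint_filename
{
    my ($name) = @_;
    return undef unless defined $name;
    # Text taken from a capture group of a successful match
    # is no longer marked as tainted.
    if ($name =~ /\A([\w.-]+)\z/)
    {
        return $1;
    }
    return undef;
}

# When run with "perl -T", $ARGV[0] is tainted because it comes
# from outside the program.
my $file = untaint_filename($ARGV[0]);
print defined $file ? "accepted: $file\n" : "rejected\n";
```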
Perl scripts often use Perl modules. A great many Perl modules are freely available. A module provides a way to package Perl code for reuse. Many modules support object-oriented concepts. Modules are, in fact, packages that follow some strict conventions. Therefore, packages are explained first.
A package is a set of related Perl subroutines in its own namespace. Perl modules are often implemented in an Object Oriented way, thereby hiding the nitty-gritty details of the implementation of the module and presenting the user of the module with a small, clear interface.
A package starts with the package statement. The
following example is the first line of CGI.pm:
package CGI;
This sets the namespace in CGI.pm to
CGI.
When you write a Perl script, you will work in the default
namespace called main. You can
switch to
another namespace by using another package statement.
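The effect of the package statement can be seen in a short sketch. The package name Greeter and its contents are hypothetical and serve only to show how names in another namespace are reached.

```perl
use strict;
use warnings;

# Hypothetical package, used only for this illustration.
package Greeter;

our $greeting = 'Hello';

sub greet
{
    my ($name) = @_;
    return "$greeting, $name!";
}

# Switch back to the default namespace.
package main;

# Names from another package are reached with the Package:: prefix.
print Greeter::greet('world'), "\n";
print "$Greeter::greeting from package ", __PACKAGE__, "\n";
```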
A module is a package designed for reuse. Modules are contained
in files ending in the .pm extension (Perl module)
and are located in one of the Perl library
directories. To make use of a module, the use keyword
is used with the name of the module as argument:
use CGI;
If the module name contains two adjacent colons, the first part indicates
a subdirectory. Thus, File::Basename refers to a module
file called Basename.pm in the subdirectory named
File in (one of) the library
directories.
The predefined array @INC lists the directories
in which perl looks for modules.
For instance, if directory /usr/lib/perl5
is in @INC, then perl will look for
/usr/lib/perl5/File/Basename.pm when looking for the
File::Basename module.
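The lookup described above can be imitated in a few lines. This is only a sketch of the idea, not perl's actual implementation; the helper names module_to_path and find_module are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: turns a module name into a relative file name,
# so that File::Basename becomes File/Basename.pm.
sub module_to_path
{
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

# Hypothetical helper: returns the first matching file found in the
# directories listed in @INC, or undef if there is none.
sub find_module
{
    my ($module) = @_;
    my $relative = module_to_path($module);
    foreach my $dir (@INC)
    {
        next if ref $dir;    # @INC may also contain code references
        my $candidate = "$dir/$relative";
        return $candidate if -f $candidate;
    }
    return undef;
}

my $found = find_module('File::Basename');
print defined $found ? "$found\n" : "not found\n";
```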
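The @INC array can easily be extended. Because use statements are
processed at compile time, @INC must be changed at compile time too,
for example in a BEGIN block. For instance,
to use a locally developed module in the directory
/home/you/lib, use:
BEGIN { push(@INC,'/home/you/lib'); }
The lib pragma (use lib '/home/you/lib';) has a similar effect, but
places the directory at the front of @INC.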
The perl -I flag can also be used:
#!/usr/bin/perl -w -I/home/you/lib
To make perl use a Perl module use the
use keyword:
use File::Basename; # Look for File/Basename.pm.
You can nest deeper than a single directory. Just replace each double-colon with a directory separator.
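Once a module has been loaded, the functions it exports can be called like built-in ones. As a brief illustration, here is a sketch that uses File::Basename; the path is just the example path from the text and does not have to exist.

```perl
use strict;
use warnings;
use File::Basename;    # exports basename, dirname and fileparse

my $path = '/usr/lib/perl5/File/Basename.pm';

print basename($path), "\n";    # the file name part
print dirname($path), "\n";     # the directory part

# fileparse also splits off a suffix that matches the given pattern.
my ($name, $dir, $suffix) = fileparse($path, qr/\.pm/);
print "name=$name dir=$dir suffix=$suffix\n";
```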
There are hundreds of free modules available on the Comprehensive Perl Archive Network, or CPAN, a set of Internet servers located throughout the world. It consists of about 100 sites that archive the same content. Many countries have at least one CPAN mirror, and their number is growing. Available modules include support for access to Oracle and other databases (DBI), for networking protocols such as HTTP, POP3 and FTP, and for CGI.
CPAN offers various options to search for modules by author(s), category, name or date. Once you have found something interesting, you can download the module. You will have a compressed tarfile that contains the archive. You'll need to decompress the tarfile and untar it. Next, you can issue the commands:
# perl Makefile.PL
# make
# make test
If the test is successful, you can issue a make install
to install the module. Make sure you have the appropriate permissions to install the
module in your Perl 5 library directory. Often, you'll need to be root.
That's all you need to do on Unix systems with dynamic linking[20].
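After an installation, one way to verify that perl can find the module is to try loading it. The following sketch does so with require inside an eval; the helper name module_installed and the module name No::Such::Module are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded
# from one of the directories in @INC, 0 otherwise.
sub module_installed
{
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    # require dies when the file cannot be found or compiled;
    # eval catches that and yields a false value.
    return eval { require $file; 1 } ? 1 : 0;
}

foreach my $module ('File::Basename', 'No::Such::Module')
{
    printf("%-20s %s\n", $module,
        module_installed($module) ? 'installed' : 'missing');
}
```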
Andreas König's CPAN
module (CPAN.pm) is designed to automate the make
and install of Perl modules and extensions. It
includes some searching capabilities and knows a number of methods to fetch the raw
data from the Net. Modules are fetched from one or more of the mirrored
CPAN sites and unpacked in a dedicated directory.
Most Perl installations already have this module pre-installed, but
if not, you may want to download and install it first. Additionally, to allow you
to perform extended searches for modules, there is another module available:
CPAN::WAIT. It's a full-text search engine that indexes all
documents available in CPAN authors' directories.
CPAN::WAIT uses a special protocol that resembles NNTP. CPAN::WAIT tries
to access so-called “wait servers” and
can do so using either a direct connection or over an http proxy.
The CPAN module normally is operated from within an interactive
shell. That shell will allow additional search commands if the
CPAN::WAIT module has been installed.
The initial configuration will be started automatically when the shell is started for the first time. To start the CPAN shell, type:
# perl -MCPAN -e shell;
The first time you start up this command, it will create some administrative files in which your preferences are kept. You can either choose to configure manually or let the module guess at the proper values (which often works out fine). Manual configuration will ask questions about the amount of disk space to use, the locations of some external programs the module wants to use, whether or not module dependencies should be resolved automatically, the location of your proxy servers (if any) and possible extra parameters you'd want to specify for make, etc. Also, the module will try several methods (if necessary) of gaining access to a CPAN site and may ask you for your location and your favorite wait server.
Perl can be easily invoked from the command line.
perl -e 'perl commands'
An easy way to find the hexadecimal value of a decimal number, for example, is:
perl -e 'printf "%x\n", 26;'
The output will be:
1a
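The same conversion works for other number bases. As a short sketch, the following lines format the number 26 in several ways and convert it back with the built-in hex and oct functions.

```perl
use strict;
use warnings;

my $number = 26;

printf("hexadecimal: %x\n", $number);
printf("octal:       %o\n", $number);
printf("binary:      %b\n", $number);

# hex() and oct() convert in the opposite direction:
# both lines below print the decimal value again.
print hex('1a'), "\n";
print oct('32'), "\n";
```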
The -n flag is very helpful on the command line.
Suppose you want to use something like this:
while (<>)
{
print "    :$_";
}
Everything but the print statement can be
eliminated by using the -n option:
perl -ne 'print "    :$_";' inputs
This example will read lines from the file
inputs.
For each line that was read, output will be printed
consisting of four spaces, a colon (:) and
the original input line (expanded from $_).
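To try the loop without creating a file, the same body can be applied to an in-memory filehandle. This is a sketch only: the helper name prefix_lines is made up, and the string $text stands in for the file inputs.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the loop body from the text to every
# line read from a filehandle, much as -n wraps the code given with -e.
sub prefix_lines
{
    my ($in) = @_;
    my @out;
    local $_;    # keep the caller's $_ intact
    while (<$in>)
    {
        push(@out, "    :$_");
    }
    return @out;
}

# An in-memory filehandle stands in for the file "inputs".
my $text = "first\nsecond\n";
open(my $in, '<', \$text) or die("Cannot open in-memory file: $!");
print prefix_lines($in);
close($in);
```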
[18] assuming perl was properly installed
and the command occurs in your PATH
[19] The setuid bit in Unix permissions is mode 04000, the setgid bit mode 02000
[20] On systems that have a statically-linked perl (and the module requires compilation), you'll need to build a new Perl binary that includes the module