Unix Shell Scripting: Simple Filters

Head: Displaying the beginning of a file

head prints the first N lines (or bytes) of its input. By default, it prints the first 10 lines of each given file.

Syntax and Options

head [OPTION]... [FILE]...

Short Option    Long Option         Description
-c              --bytes             Print the first N bytes of each input file.
-n              --lines             Print the first N lines of each input file.
-q              --quiet, --silent   Never print the header that gives the file name.
-v              --verbose           Always print the header that gives the file name.

1. Print the first N number of lines

To view the first N lines, pass the file name as an argument together with the -n option, as shown below.

$ head -n 5 flavours.txt
Ubuntu
Debian
Redhat
Gentoo
Fedora core

Note: When you simply pass the file name as an argument to head, it prints out the first 10 lines of the file.
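
To see the default and the -n option side by side, here is a small sketch. The file numbers.txt and its contents are made up for illustration; they are not part of the flavours.txt example.

```shell
# Work in a scratch directory so nothing existing is overwritten.
tmpdir=$(mktemp -d)
# numbers.txt is a hypothetical sample file holding the lines 1..12.
printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12 > "$tmpdir/numbers.txt"

# No option: head stops after the first 10 lines.
default_count=$(head "$tmpdir/numbers.txt" | wc -l)
# -n 5: only the first 5 lines.
first_five=$(head -n 5 "$tmpdir/numbers.txt")

echo "lines printed by default: $default_count"
echo "$first_five"
```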

2. Print N lines by specifying -N

You do not even need the -n option: simply give the number of lines preceded by '-', as shown below. GNU head documents this shorthand as an obsolete syntax, so -n N is the safer choice in scripts.

$ head -4 flavours.txt
Ubuntu
Debian
Redhat
Gentoo

3. Print all but the last N lines

Placing '-' in front of the number given to the -n option prints all the lines of each file except the last N, as shown below.

$ head -n -5 flavours.txt
Ubuntu
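
A negative line count is a GNU extension rather than something POSIX head guarantees, so the following is only a sketch that assumes GNU coreutils. The file sample.txt and its six lines are invented for the example.

```shell
tmpdir=$(mktemp -d)
# Hypothetical six-line sample file.
printf '%s\n' one two three four five six > "$tmpdir/sample.txt"

# GNU head: print everything except the last 2 lines.
all_but_last_two=$(head -n -2 "$tmpdir/sample.txt")
echo "$all_but_last_two"
```

With six input lines and -n -2, four lines remain: one, two, three and four.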

4. Print the N number of bytes

You can use the -c option to print the first N bytes of a file.

$ head -c 5 flavours.txt
Ubunt

Note: As with the -n option, you can place '-' in front of the number to print all bytes except the last N.
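
Byte counts include newline characters, which is easy to overlook. The sketch below assumes a head that supports -c (GNU and BSD do; the negative form is GNU only) and uses a made-up two-line file.

```shell
tmpdir=$(mktemp -d)
# Assumed sample content: two 6-letter names, each followed by a newline (14 bytes).
printf 'Ubuntu\nDebian\n' > "$tmpdir/sample.txt"

# First 5 bytes: stops in the middle of the first line.
first_bytes=$(head -c 5 "$tmpdir/sample.txt")
# First 9 bytes: "Ubuntu", the newline (1 byte), then "De".
nine_bytes=$(head -c 9 "$tmpdir/sample.txt")
# GNU extension: everything except the last 7 bytes ("Debian" plus its newline).
trimmed=$(head -c -7 "$tmpdir/sample.txt")

echo "$first_bytes"
echo "$nine_bytes"
echo "$trimmed"
```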

5. Pass the output of another command to head

You can pipe the output of another command into head, as shown below.

$ ls | head
bin
boot
cdrom
dev
etc
home
initrd.img
lib
lost+found
media
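
head is often the last stage of a longer pipeline. As a small sketch with made-up input, this sorts three words and keeps only the first two lines of the result:

```shell
# Sort three invented words, then keep only the first two lines of the sorted output.
top_two=$(printf '%s\n' pear apple fig | sort | head -n 2)
echo "$top_two"
```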

TAIL:
tail outputs the last part, or "tail", of files.

Syntax

tail [OPTION]... [FILE]...

Description

tail prints the last 10 lines of each FILE to standard output. With more than one FILE, it precedes each set of output with a header giving the file name. If no FILE is specified, or if FILE is specified as a dash ("-"), tail reads from standard input.
Options

In the options listed below, arguments that are mandatory for long options are mandatory for short options as well:
-c, --bytes=K Output the last K bytes; alternatively, use "-c +K" to output bytes starting with the Kth byte of each file.
-f, --follow[={name|descriptor}] Output appended data as the file grows; -f, --follow, and --follow=descriptor are equivalent. If name is specified, the file with filename name will be followed, regardless of its file descriptor.
-F Same as "--follow=name --retry".
-n, --lines=K Output the last K lines, instead of the default of the last 10; alternatively, use "-n +K" to output lines starting with the Kth.
--max-unchanged-stats=N With --follow=name, reopen a FILE which has not changed size after N (default 5) iterations to see if it has been unlink'ed or renamed (this is the usual case of rotated log files).
--pid=PID With -f, terminate operation after process ID PID dies.
-q, --quiet, --silent Never output headers giving file names.
--retry Keep trying to open a file even when it is, or becomes, inaccessible; useful when following by name, i.e., with --follow=name.
-s, --sleep-interval=N With -f, sleep for approximately N seconds (default 1.0) between iterations. With --pid=P, check process P at least once every N seconds.
-v, --verbose Always output headers giving file names.
--help Display a help message, and exit.
--version Display version information, and exit.
Notes

If the first character of K (the number of bytes or lines) is a "+", tail prints beginning with the Kth item from the start of each file; otherwise, tail prints the last K items in the file. K may have a multiplier suffix: b (512), kB (1000), K (1024), MB (1000*1000), M (1024*1024), GB (1000*1000*1000), G (1024*1024*1024), and so on for T (terabyte), P (petabyte), E (exabyte), Z (zettabyte), Y (yottabyte).
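
The difference between K and +K is easiest to see on a tiny file. This sketch uses an invented five-line file named sample.txt:

```shell
tmpdir=$(mktemp -d)
# Hypothetical five-line sample file.
printf '%s\n' line1 line2 line3 line4 line5 > "$tmpdir/sample.txt"

# Last 2 lines.
last_two=$(tail -n 2 "$tmpdir/sample.txt")
# Starting with the 3rd line, through the end of the file.
from_third=$(tail -n +3 "$tmpdir/sample.txt")
# Last 6 bytes: "line5" plus its trailing newline.
last_bytes=$(tail -c 6 "$tmpdir/sample.txt")

echo "$last_two"
echo "$from_third"
echo "$last_bytes"
```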

With --follow (-f), tail defaults to following the file descriptor, which means that even if a tail'ed file is renamed, tail will continue to track its end. This default behavior is not desirable when you really want to track the actual name of the file, not the file descriptor (for example, in a log rotation). Use --follow=name in that case. That causes tail to track the named file in a way that accommodates renaming, removal and creation.

Examples

tail myfile.txt

Outputs the last 10 lines of the file myfile.txt.

tail -n 100 myfile.txt

Outputs the last 100 lines of the file myfile.txt.

tail -f myfile.txt

Outputs the last 10 lines of myfile.txt, and monitors myfile.txt for updates; tail then continues to output any new lines that are added to myfile.txt.
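
Because tail -f never exits on its own, scripts usually pair it with --pid. The following is a sketch that assumes GNU tail; the log file name and the background writer are invented for the example.

```shell
tmpdir=$(mktemp -d)
log="$tmpdir/app.log"          # hypothetical log file
printf 'old entry\n' > "$log"

# Background writer: waits two seconds, appends one line, then exits.
( sleep 2; printf 'new entry\n' >> "$log" ) &
writer=$!

# -n 0 skips the existing content; --pid makes tail stop once the writer has exited.
followed=$(tail -n 0 -f --pid="$writer" "$log")
echo "$followed"
```

Only the line appended after tail started is captured; the existing "old entry" line is skipped because of -n 0.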

CUT: Extract Fields and Columns from a file

Unix Cut Command Example

The examples below show how to use the cut command on the following text file.

> cat file.txt
unix or linux os
is unix good os
is linux good os

1. Write a unix/linux cut command to print characters by position?

The cut command can be used to print characters in a line by specifying the position of the characters. To print the characters in a line, use the -c option in cut command

cut -c4 file.txt
x
u
l

The above cut command prints the fourth character in each line of the file. You can print more than one character at a time by specifying the character positions in a comma separated list as shown in the below example

cut -c4,6 file.txt
xo
ui
ln

This command prints the fourth and sixth character in each line.

2. Write a unix/linux cut command to print characters by range?

You can print a range of characters in a line by specifying the start and end position of the characters.

cut -c4-7 file.txt
x or
unix
linu

The above cut command prints the characters from fourth position to the seventh position in each line. To print the first six characters in a line, omit the start position and specify only the end position.

cut -c-6 file.txt
unix o
is uni
is lin

To print the characters from tenth position to the end, specify only the start position and omit the end position.

cut -c10- file.txt
inux os
ood os
good os

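To print the entire line, give a range that starts at the first position and has no end position. Note that omitting both positions (a bare '-') is rejected by current versions of GNU cut as an invalid range.

cut -c1- file.txt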

3. Write a unix/linux cut command to print the fields using the delimiter?

Like awk, the cut command can extract fields from a file using a delimiter. The -d option specifies the delimiter and the -f option specifies the field positions.

cut -d' ' -f2 file.txt
or
unix
linux

This command prints the second field in each line by treating the space as delimiter. You can print more than one field by specifying the position of the fields in a comma delimited list.

cut -d' ' -f2,3 file.txt
or linux
unix good
linux good

The above command prints the second and third field in each line.

Note: If the delimiter you specified does not exist in a line, then the cut command prints the entire line. To suppress such lines, use the -s option of the cut command.
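
Here is a short sketch of the effect of -s. The file mixed.txt is made up; its second line deliberately contains no colon.

```shell
tmpdir=$(mktemp -d)
# Invented input: the second line contains no delimiter at all.
printf '%s\n' 'root:x:0' 'no delimiter here' 'daemon:x:1' > "$tmpdir/mixed.txt"

# Without -s the delimiter-free line is passed through unchanged.
without_s=$(cut -d: -f1 "$tmpdir/mixed.txt")
# With -s the delimiter-free line is dropped.
with_s=$(cut -s -d: -f1 "$tmpdir/mixed.txt")

echo "$without_s"
echo "---"
echo "$with_s"
```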

4. Write a unix/linux cut command to display range of fields?

You can print a range of fields by specifying the start and end position.

cut -d' ' -f1-3 file.txt

The above command prints the first, second and third fields. To print the first three fields, you can also omit the start position and specify only the end position.

cut -d' ' -f-3 file.txt

To print the fields from the second field to the last field, you can omit the end position.

cut -d' ' -f2- file.txt
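
The three range forms above can be tried on the same file.txt used earlier in this section. The sketch recreates that file in a scratch directory so it is self-contained:

```shell
tmpdir=$(mktemp -d)
# Same three lines as file.txt in the examples above.
printf '%s\n' 'unix or linux os' 'is unix good os' 'is linux good os' > "$tmpdir/file.txt"

first_three=$(cut -d' ' -f1-3 "$tmpdir/file.txt")   # explicit start and end
same_three=$(cut -d' ' -f-3 "$tmpdir/file.txt")     # start omitted
from_second=$(cut -d' ' -f2- "$tmpdir/file.txt")    # end omitted

echo "$first_three"
echo "$from_second"
```

The first two commands give identical output, since an omitted start position means "from the first field".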

5. Write a unix/linux cut command to display the first field from /etc/passwd file?

The /etc/passwd file is a delimited file whose delimiter is a colon (:). The cut command to display the first field of /etc/passwd is

cut -d':' -f1 /etc/passwd

6. The input file contains the below text

> cat filenames.txt
logfile.dat
sum.pl
add_int.sh

Using the cut command, extract the portion after the dot.

First reverse the text in each line, cut the first field, and then reverse the result again to restore the original character order.

rev filenames.txt | cut -d'.' -f1 | rev
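
The reason for reversing twice is that cut counts fields from the left, so the last field of a line is only easy to reach once the line is reversed. A sketch with the same three file names, assuming the rev utility (part of util-linux on most Linux systems) is installed:

```shell
tmpdir=$(mktemp -d)
# Same three names as filenames.txt above.
printf '%s\n' logfile.dat sum.pl add_int.sh > "$tmpdir/filenames.txt"

# After one rev the extension is the first field, but its letters are still reversed.
reversed_ext=$(rev "$tmpdir/filenames.txt" | cut -d'.' -f1)
# A second rev restores the normal character order.
extensions=$(rev "$tmpdir/filenames.txt" | cut -d'.' -f1 | rev)

echo "$reversed_ext"
echo "$extensions"
```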

Paste
The paste command displays the corresponding lines of multiple files side-by-side.
Syntax

paste [OPTION]... [FILE]...
Description

paste writes lines consisting of the sequentially corresponding lines from each FILE, separated by tabs, to the standard output. With no FILE, or when FILE is a dash ("-"), paste reads from standard input.
Options

-d, --delimiters=LIST reuse characters from LIST instead of tabs.
-s, --serial paste one file at a time instead of in parallel.
--help Display a help message, and exit.
--version Display version information, and exit.
Examples

paste file1.txt file2.txt

Displays the corresponding lines of file1.txt and file2.txt side by side, separated by a tab.
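
The -d and -s options change how the lines are joined. This sketch uses two invented three-line files to show the default, a custom delimiter, and serial mode:

```shell
tmpdir=$(mktemp -d)
# Two hypothetical input files with three lines each.
printf '%s\n' a b c > "$tmpdir/file1.txt"
printf '%s\n' 1 2 3 > "$tmpdir/file2.txt"

# Default: corresponding lines joined with a tab.
side_by_side=$(paste "$tmpdir/file1.txt" "$tmpdir/file2.txt")
# -d, : use a comma instead of a tab.
with_comma=$(paste -d, "$tmpdir/file1.txt" "$tmpdir/file2.txt")
# -s : join all lines of one file into a single line.
serial=$(paste -s -d, "$tmpdir/file1.txt")

echo "$side_by_side"
echo "$with_comma"
echo "$serial"
```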

Sort:

The sort command sorts the lines of text files. You can display the sorted output on the screen or redirect it to a file, and several command line options control how the lines are compared.

Sort Command Syntax:

$ sort [OPTION]... [FILE]...

For example, here is a test file:

$ cat test
zzz
sss
qqq
aaa
BBB
ddd
AAA
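
Here is what you get when the sort command is run on this file without any options. It sorts the lines of the test file and displays the sorted output. The order shown is typical of a UTF-8 locale such as en_US.UTF-8; the exact result depends on the collation rules of the current locale.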

$ sort test
aaa
AAA
BBB
ddd
qqq
sss
zzz
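
The order above depends on the locale's collation rules. As a sketch, forcing the C locale with LC_ALL=C makes sort compare raw bytes, so all uppercase lines come before the lowercase ones:

```shell
# Same seven lines as the test file above, sorted in the C locale.
c_order=$(printf '%s\n' zzz sss qqq aaa BBB ddd AAA | LC_ALL=C sort)
echo "$c_order"
```

Setting LC_ALL=C is also a common way to make sort output reproducible across machines.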

1. Perform Numeric Sort using -n option

If we want to sort on numeric value, then we can use the -n or --numeric-sort option.

Create the following test file for this example:

$ cat test
22 zzz
33 sss
11 qqq
77 aaa
55 BBB

The following sort command sorts the lines of the test file by the numeric value of the first word of each line and displays the sorted output.

$ sort -n test
11 qqq
22 zzz
33 sss
55 BBB
77 aaa

2. Sort Human Readable Numbers using -h option

If we want to sort on human readable numbers (e.g., 2K 1M 1G), then we can use the -h or --human-numeric-sort option.

Create the following test file for this example:

$ cat test
2K
2G
1K
6T
1T
1G
2M

The following sort command sorts the human readable numbers in the test file (K = kilo, M = mega, G = giga, T = tera) and displays the sorted output.

$ sort -h test
1K
2K
2M
1G
2G
1T
6T

3. Sort Months of a Year using -M option

If we want to sort in the order of the months of the year, then we can use the -M or --month-sort option.

Create the following test file for this example:

$ cat test
sept
aug
jan
oct
apr
feb
mar11

The following sort command sorts the lines of the test file in month order. Note that each line must begin with at least the three-letter abbreviation of a month name (e.g. jan, feb, mar). If we give ja for January or au for August, then sort does not treat it as a month name.

$ sort -M test
jan
feb
mar11
apr
aug
sept
oct

4. Check if Content is Already Sorted using -c option

If we want to check whether the data in a text file is already sorted, then we can use the -c, --check, or --check=diagnose-first option.

Create the following test file for this example:

$ cat test
2
5
1
6

The following sort command checks whether the data in the text file is sorted. If it is not, the command reports the first out-of-order line, giving its line number and value.

$ sort -c test
sort: test:3: disorder: 1

5. Reverse the Output and Check for Uniqueness using -r and -u options

If we want the sorted output in reverse order, then we can use the -r or --reverse option. If the file contains duplicate lines, then the -u option keeps only one copy of each line in the sorted output.

Create the following test file for this example:

$ cat test
5
2
2
1
4
4
The following sort command sorts lines in test file in reverse order and displays sorted output.

$ sort -r test
5
4
4
2
2
1
The following sort command sorts lines in test file in reverse order and removes duplicate lines from sorted output.

$ sort -r -u test
5
4
2
1

6. Selectively Sort the Content, Customize delimiter, Write output to a file using -k, -t, -o options

If we want to sort on a column or word position in the lines of a text file, then the -k option can be used. If the words in each line are separated by a delimiter other than whitespace, then we can specify that delimiter using the -t option. With the -o option, the sorted output is written to the specified file instead of standard output.

Create the following test file for this example:

$ cat test
aa aa zz
aa aa ff
aa aa tt
aa aa kk
The following sort command sorts lines in test file on the 3rd word of each line and displays sorted output.

$ sort -k3 test
aa aa ff
aa aa kk
aa aa tt
aa aa zz

Next, create the following test file, in which the words are separated by '|':

$ cat test
aa|5a|zz
aa|2a|ff
aa|1a|tt
aa|3a|kk

Here, several options are used together. In the test file, the words in each line are separated by the delimiter '|'. The command sorts the lines on the numeric value of the 2nd word of each line and stores the sorted output in the specified output file.

$ sort -n -t'|' -k2 -o outfile test

The contents of the output file are shown below.

$ cat outfile
aa|1a|tt
aa|2a|ff
aa|3a|kk
aa|5a|zz
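
A key written as -k2 runs from field 2 to the end of the line. To compare field 2 alone, the key can be given a start and an end, as in -k2,2. The sketch below reuses the four lines above and adds one invented line (aa|10a|bb) to show why the numeric modifier matters:

```shell
tmpdir=$(mktemp -d)
# The four lines from the example above, plus one extra line added for illustration.
printf '%s\n' 'aa|5a|zz' 'aa|2a|ff' 'aa|1a|tt' 'aa|3a|kk' 'aa|10a|bb' > "$tmpdir/test"

# -k2,2n: the key is field 2 only, compared numerically; -o writes the result to a file.
sort -t'|' -k2,2n -o "$tmpdir/outfile" "$tmpdir/test"
numeric=$(cat "$tmpdir/outfile")

# Without n, field 2 is compared as text, so "10a" sorts before "1a" (C locale forced here).
lexical_first=$(LC_ALL=C sort -t'|' -k2,2 "$tmpdir/test" | head -n 1)

echo "$numeric"
echo "first line of the text sort: $lexical_first"
```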

Uniq
uniq reports or filters out repeated lines in a file.
Syntax

uniq [OPTION]... [INPUT [OUTPUT]]
Description

uniq filters out adjacent, matching lines from input file INPUT, writing the filtered data to output file OUTPUT.

If INPUT is not specified, uniq reads from the standard input.

If OUTPUT is not specified, uniq writes to the standard output.

If no options are specified, matching lines are merged to the first occurrence.
Options

-c, --count Prefix lines with a number representing how many times they occurred.
-d, --repeated Only print duplicated lines.
-D, --all-repeated[=delimit-method] Print all duplicate lines. delimit-method may be one of the following:

none Do not delimit duplicate lines at all. This is the default.
prepend Insert a blank line before each set of duplicated lines.
separate Insert a blank line between each set of duplicated lines.
The -D option is the same as specifying --all-repeated=none.
-f N, --skip-fields=N Avoid comparing the first N fields of a line before determining uniqueness. A field is a group of characters, delimited by whitespace.

This option is useful, for instance, if your document's lines are numbered, and you want to compare everything in the line except the line number. If the option -f 1 were specified, the adjacent lines

1 This is a line.
2 This is a line.
would be considered identical. If no -f option were specified, they would be considered unique.
-i, --ignore-case Normally, comparisons are case-sensitive. This option performs case-insensitive comparisons instead.
-s N, --skip-chars=N Avoid comparing the first N characters of each line when determining uniqueness. This is like the -f option, but it skips individual characters rather than fields.
-u, --unique Only print unique lines.
-z, --zero-terminated End lines with a 0 byte (NUL), instead of a newline.
-w N, --check-chars=N Compare no more than N characters in lines.
--help Display a help message and exit.
--version Output version information and exit.
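
A few of the comparison options are easier to grasp with input. This sketch feeds made-up lines to uniq on standard input; -f is specified by POSIX, while -i and -w are assumed here to be available as in GNU uniq.

```shell
# -f 1: skip the leading line number before comparing, so both lines count as equal.
numbered=$(printf '%s\n' '1 This is a line.' '2 This is a line.' | uniq -f 1)
# -i: compare without regard to case; the first line of each group is kept.
folded=$(printf '%s\n' Apple apple APPLE pear | uniq -i)
# -w 3: compare only the first three characters of each line.
prefix=$(printf '%s\n' abc1 abc2 abd1 | uniq -w 3)

echo "$numbered"
echo "$folded"
echo "$prefix"
```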
Notes

uniq does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use sort -u instead of uniq.
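
To illustrate the note above, here is a sketch with invented, unsorted input. Run directly, uniq changes nothing because no duplicates are adjacent; after sort, the duplicates collapse and can be counted.

```shell
# Made-up input in which the duplicates are never adjacent.
words='pear
apple
pear
fig
apple
pear'

# uniq alone removes nothing here.
unsorted_count=$(printf '%s\n' "$words" | uniq | wc -l)
# sort -u gives the distinct lines.
distinct=$(printf '%s\n' "$words" | sort -u)
# sort | uniq -c | sort -rn gives a frequency table, most frequent first.
counts=$(printf '%s\n' "$words" | sort | uniq -c | sort -rn)

echo "lines left by plain uniq: $unsorted_count"
echo "$distinct"
echo "$counts"
```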
Examples

Let's say we have an eight-line text file, myfile.txt, which contains the following text:
This is a line.
This is a line.
This is a line.

This is also a line.
This is also a line.

This is also also a line.

Here are several ways to run uniq on this file, and the output each one produces:

uniq myfile.txt

This is a line.

This is also a line.

This is also also a line.
uniq -c myfile.txt

3 This is a line.
1
2 This is also a line.
1
1 This is also also a line.
uniq -d myfile.txt

This is a line.
This is also a line.
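uniq -u myfile.txt



This is also also a line.

The two blank lines of the file are printed as well, because neither of them is adjacent to an identical line.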

Friday, 22 January 2016
