This episode focuses on basic regex examples and POSIX character classes. We include a simple table of POSIX character classes so that you can practice them more easily in this episode. Both will make every GNU sed job much easier. This is the sixth episode, so if you don't want to miss anything, we recommend reading episodes one through five first. Happy practicing!
POSIX Character Classes
Regular expressions are standardized by POSIX, and POSIX defines character classes: names for certain sets of characters. By using character classes, you avoid confusing backslashes and long lists of characters, and your expressions become easier to understand.
These are some of the POSIX character classes:
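`[:alpha:]` = uppercase and lowercase letters, a-z and A-Z. Same as [[:upper:][:lower:]] or [a-zA-Z].
`[:alnum:]` = digits 0-9 and uppercase and lowercase letters. Same as [[:alpha:][:digit:]] or [a-zA-Z0-9].
`[:digit:]` = digits 0-9. Same as [0-9].
`[:punct:]` = punctuation such as ,.:;!?-
`[:upper:]` = uppercase letters. Same as [A-Z].
`[:lower:]` = lowercase letters. Same as [a-z].
`[:space:]` = whitespace characters: space, tab, newline, vertical tab, form feed, and carriage return.
`[:blank:]` = space and tab only.
Two notes: a class name is only valid inside a bracket expression, so in a sed command you write [[:digit:]], not [:digit:]. Also, the range equivalents such as [a-z] hold exactly in the C (POSIX) locale; in other locales the classes may also match letters outside plain ASCII.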
See https://en.wikibooks.org/wiki/Regular_Expressions/POSIX-Extended_Regular_Expressions for more detailed information.
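To try two classes from the table that the numbered examples below do not cover, `[:alnum:]` and `[:space:]`, here is a minimal sketch. It assumes GNU sed, uses a made-up sample line instead of the episode's text files, and sets `LC_ALL=C` so that the classes match exactly the ASCII sets listed above.

```shell
#!/bin/sh
# Minimal sketch, assuming GNU sed. The sample line is made up for illustration.
# LC_ALL=C pins the locale so the classes match exactly the ASCII sets in the table.
export LC_ALL=C
line='Hello, World 42'

# [[:alnum:]]: every letter and digit becomes '#'
echo "$line" | sed 's/[[:alnum:]]/#/g'   # prints: #####, ##### ##

# [[:space:]]: every whitespace character becomes '_'
echo "$line" | sed 's/[[:space:]]/_/g'   # prints: Hello,_World_42

# The outer brackets are required: [[:digit:]] is a bracket expression
# that contains the class [:digit:].
echo "$line" | sed 's/[[:digit:]]/#/g'   # prints: Hello, World ##
```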
Text Examples
text14.txt:
text15.txt:
51. Print Only Lines Between Two Patterns
Command Examples:
- sed -n '/gnu/,/dunix/p' text15.txt
- sed -n '/dunix/,/sunos/Ip' text15.txt
Output Examples:
(1)
master@master:/tmp$ sed -n '/gnu/,/dunix/p' text15.txt
gnu ,.
ULTRIX : ;
1 xenix
2 minix
3 dunix
master@master:/tmp$
(2)
master@master:/tmp$ sed -n '/dunix/,/sunos/Ip' text15.txt
3 dunix
4 5 6 7 8
this is tab
yes, this is tab
HP-UX ; OS X ; SUNOS
master@master:/tmp$
Explanation:
The first command prints only the lines from the line containing the `gnu` string through the line containing the `dunix` string. Both boundary lines are included. This is a range address built from two regular expressions, applied to the ‘p’ command. The `-n` option turns off sed's automatic printing, so only the lines selected by ‘p’ are printed.
The second command works the same way: it prints only the lines from the line containing “dunix” through the line containing “sunos”. But it also uses the ‘I’ modifier (case-insensitive matching) on the second address, so `sunos` (lowercase) matches the “SUNOS” string (uppercase). Think of how Google Search ignores letter case, for example.
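The following sketch runs both range commands on a few sample lines modeled on the text15.txt output above, plus one extra hypothetical line, `aix`, at the end. It assumes GNU sed, because the `I` modifier is a GNU extension. The extra line makes the effect of `I` visible: without it, `/sunos/` never matches “SUNOS”, so the range simply runs to the end of the input.

```shell
#!/bin/sh
# Sketch, assuming GNU sed (the I address modifier is a GNU extension).
# Sample lines are modeled on text15.txt; the last line "aix" is hypothetical.
sample=$(printf '%s\n' 'gnu ,.' 'ULTRIX : ;' '1 xenix' '2 minix' '3 dunix' \
  'HP-UX ; OS X ; SUNOS' 'aix')

# Range from the first /gnu/ line through the next /dunix/ line (inclusive).
printf '%s\n' "$sample" | sed -n '/gnu/,/dunix/p'

# Case-sensitive end address: "sunos" never matches "SUNOS",
# so the range runs to the last line (prints through "aix").
printf '%s\n' "$sample" | sed -n '/dunix/,/sunos/p'

# Case-insensitive end address: the range stops at the "SUNOS" line.
printf '%s\n' "$sample" | sed -n '/dunix/,/sunos/Ip'
```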
52. Delete Only Comment Lines Within an Address Range
Command Examples:
- sed '10,19{/\/\*/,/\*\//d}' text14.txt
- sed '10,19{/\/\//d}' text14.txt
Output Examples:
(1)
master@master:/tmp$ sed '10,19{/\/\*/,/\*\//d}' text14.txt
/* this is a free
software
this software is
licensed as GNU General Public License v2
*/
int main()
{
printf("hello\n");
// this is another style of comment line to be deleted
printf("new hello\n");
return 0;
}
master@master:/tmp$
(2)
master@master:/tmp$ sed '10,19{/\/\//d}' text14.txt
/* this is a free
software
this software is
licensed as GNU General Public License v2
*/
int main()
{
printf("hello\n");
/*
this is a new
block of comment lines
to be deleted by
sed
with the address
range and delete commands
*/
printf("new hello\n");
return 0;
}
master@master:/tmp$
Explanation:
This example builds on examples 41, 42, and 43 in Episode 5, but refines them by limiting the deletion to a specific line range. Notice the pair of braces (`{}`) that groups the inner regex address and the ‘d’ command, and notice the `[begin_number],[end_number]` address range in front of the braces. This kind of addressing is extremely useful. The general syntax is
‘[begin_number],[end_number]{[sed_command]}’
One benefit of this line addressing is that you can apply a complex ‘d’ command to a specific range of lines only.
The first command deletes slash-asterisk style comments only in lines 10 through 19. That's why the slash-asterisk comment in lines 1 through 5 is not deleted.
The second command works on the same range of lines (10 through 19), but deletes double-slash style comments instead.
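Here is a small sketch of the same two commands on a hypothetical ten-line C snippet (not the real text14.txt), assuming GNU sed. The line range `4,9` excludes the license comment on lines 1 and 2, so only the comments inside `main` are affected.

```shell
#!/bin/sh
# Sketch, assuming GNU sed. The ten-line C snippet is hypothetical:
#   lines 1-2  license comment  (outside the 4,9 range)
#   lines 5-7  block comment    (inside the range)
#   line  8    line comment     (inside the range)
code=$(printf '%s\n' '/* license' '   text */' 'int main()' '{' \
  '/*' ' block comment' '*/' '// line comment' 'return 0;' '}')

# Within lines 4-9 only, delete from a line containing /* through the
# next line containing */. Lines 1-2 survive because they are out of range.
printf '%s\n' "$code" | sed '4,9{/\/\*/,/\*\//d}'

# Within lines 4-9 only, delete lines containing //.
printf '%s\n' "$code" | sed '4,9{/\/\//d}'
```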
53. Delete Only Comment Lines Between Two Patterns
Command Examples:
- sed '/printf/,/printf/{/\/\*/,/\*\//d}' text14.txt
- sed '/^/,/int main()/{/\/\*/,/\*\//d}' text14.txt
Output Examples:
(1)
master@master:/tmp$ sed '/printf/,/printf/{/\/\*/,/\*\//d}' text14.txt
/* this is a free software
this software licensed as GNU General Public License v2
*/
int main()
{
printf("hello\n");
// this is another style of comment line to be deleted
printf("new hello\n");
return 0;
}
master@master:/tmp$
(2)
master@master:/tmp$ sed '/^/,/int main()/{/\/\*/,/\*\//d}' text14.txt
int main()
{
printf("hello\n");
// this is another style of comment line to be deleted
printf("new hello\n");
return 0;
}
master@master:/tmp$
Explanation:
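This is a demonstration of regex address ranges. It is basically the same as example 52, except that the outer range is given by regular expressions instead of line numbers. Writing /regex1/,/regex2/ before the {} block runs the commands inside the block only on the lines within that range.
The first command uses the range from the first “printf” line through the next “printf” line. So only the slash-asterisk comment between the two printf statements is deleted; the license comment at the top of the file is kept.
The second command uses the range from `/^/` (which matches every line) through the line containing “int main()”, so it deletes the license comment at the top. Note, however, that the output shows the slash-asterisk block between the two printf lines deleted as well. After a range ends, sed tests the start address again on the following line, and since `/^/` matches every line, a new range begins right after “int main()” and, with no later “int main()” to close it, lasts until the end of the file. To limit the deletion to the lines up to “int main()”, use the line number 1 as the start address instead: sed '1,/int main()/{/\/\*/,/\*\//d}' text14.txt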
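This sketch compares a `/^/` start address with a line-number start address on a hypothetical eight-line snippet, assuming GNU sed. Because `/^/` matches every line, sed starts a new range on the line right after the previous range ends; the line number 1 can only match once, so that range never restarts.

```shell
#!/bin/sh
# Sketch, assuming GNU sed. The eight-line C snippet is hypothetical.
code=$(printf '%s\n' '/* license' '*/' 'int main()' '{' \
  '/* inner' '*/' 'return 0;' '}')

# /^/ matches every line, so after the range ends at "int main()" a new
# range starts on the next line and runs to the end: both comments go.
printf '%s\n' "$code" | sed '/^/,/int main()/{/\/\*/,/\*\//d}'

# A line-number start address cannot match again, so only the
# license comment above "int main()" is deleted.
printf '%s\n' "$code" | sed '1,/int main()/{/\/\*/,/\*\//d}'
```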
54. Edit Only Matched Lines Within an Address Range
Command Example:
sed '4,7{s/x/[X]/Ig}' text15.txt
Output Example:
master@master:/tmp$ sed '4,7{s/x/[X]/Ig}' text15.txt
unix ?
bsd !
gnu ,.
ULTRIX : ;
1 [X]eni[X]
2 mini[X]
3 duni[X]
4 5 6 7 8
this is tab
yes, this is tab
HP-UX ; OS X ; SUNOS
master@master:/tmp$
Explanation:
This follows the same idea as examples 51, 52, and 53, except that the command inside the braces is ‘s’ (substitute). The substitution runs only on the given address range, lines 4 through 7. The ‘I’ flag makes `x` match both “x” and “X”, and the ‘g’ flag replaces every match on a line, not just the first one. Look at the result: the letter “x” in the strings “unix”, “ULTRIX”, “HP-UX”, and “OS X” doesn't change. Only the selected lines have changed.
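A minimal sketch of the same substitution on four hypothetical lines, assuming GNU sed (the `I` flag of the ‘s’ command is a GNU extension). It also runs the command without `I` to show the difference, and shows that the braces are optional when only one command follows the range.

```shell
#!/bin/sh
# Sketch, assuming GNU sed (the I flag of 's' is a GNU extension).
# The four sample lines are hypothetical.
sample=$(printf '%s\n' 'unix' 'xenix' 'Xenix X' 'minix')

# Lines 2-3 only; I matches "x" and "X", g replaces every match on the line.
printf '%s\n' "$sample" | sed '2,3{s/x/[X]/Ig}'

# Same range without I: the uppercase X characters are left alone.
# Braces are optional when the range applies to a single command.
printf '%s\n' "$sample" | sed '2,3s/x/[X]/g'
```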
55. Edit All Uppercase Characters (POSIX Character Class)
Command Example:
sed 's/[[:upper:]]/[X]/g' text15.txt
Output Example:
master@master:/tmp$ sed 's/[[:upper:]]/[X]/g' text15.txt
unix ?
bsd !
gnu ,.
[X][X][X][X][X][X] : ;
1 xenix
2 minix
3 dunix
4 5 6 7 8
this is tab
yes, this is tab
[X][X]-[X][X] ; [X][X] [X] ; [X][X][X][X][X]
master@master:/tmp$
Explanation:
This command makes use of the `[:upper:]` character class, so it replaces only the uppercase letters in the whole text.
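As a quick check of the table's claim that `[:upper:]` is the same as [A-Z], this sketch runs both forms on the last line of text15.txt as it appears in the outputs above. It assumes GNU sed and sets `LC_ALL=C`, the locale in which the two sets are identical.

```shell
#!/bin/sh
# Sketch, assuming GNU sed. In the C locale [[:upper:]] and [A-Z] are the same set.
export LC_ALL=C
line='HP-UX ; OS X ; SUNOS'

echo "$line" | sed 's/[[:upper:]]/[X]/g'
echo "$line" | sed 's/[A-Z]/[X]/g'
# Both print: [X][X]-[X][X] ; [X][X] [X] ; [X][X][X][X][X]
```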
56. Edit All Lowercase Characters (POSIX Character Class)
Command Example:
sed 's/[[:lower:]]/[X]/g' text15.txt
Output Example:
master@master:/tmp$ sed 's/[[:lower:]]/[X]/g' text15.txt
[X][X][X][X] ?
[X][X][X] !
[X][X][X] ,.
ULTRIX : ;
1 [X][X][X][X][X]
2 [X][X][X][X][X]
3 [X][X][X][X][X]
4 5 6 7 8
[X][X][X][X] [X][X] [X][X][X]
[X][X][X], [X][X][X][X] [X][X] [X][X][X]
HP-UX ; OS X ; SUNOS
master@master:/tmp$
Explanation:
This command is the reverse of example 55: it replaces only the lowercase letters in the whole text.
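A minimal sketch on a made-up mixed-case word, assuming GNU sed and the C locale: only the lowercase letters change, while the uppercase letters and the slash are kept, and [a-z] gives the same result in this locale.

```shell
#!/bin/sh
# Sketch, assuming GNU sed; the sample word is made up for illustration.
export LC_ALL=C

# Only the lowercase letters change; "GNU", "/" and "L" are kept.
echo 'GNU/Linux' | sed 's/[[:lower:]]/[X]/g'   # prints: GNU/L[X][X][X][X]

# Same output with a plain range in the C locale.
echo 'GNU/Linux' | sed 's/[a-z]/[X]/g'         # prints: GNU/L[X][X][X][X]
```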
57. Edit All Punctuation Characters (POSIX Character Class)
Command Example:
sed 's/[[:punct:]]/[X]/g' text15.txt
Output Example:
master@master:/tmp$ sed 's/[[:punct:]]/[X]/g' text15.txt
unix [X]
bsd [X]
gnu [X][X]
ULTRIX [X] [X]
1 xenix
2 minix
3 dunix
4 5 6 7 8
this is tab
yes[X] this is tab
HP[X]UX [X] OS X [X] SUNOS
master@master:/tmp$
Explanation:
This example makes use of the `[:punct:]` character class, so it replaces every punctuation character in the whole text. Because this is a substitution command, the output shows the `[X]` sequence in place of every comma, dot, colon, semicolon, dash, exclamation mark, and question mark.
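The class covers more than the characters listed in the table. In the C locale, `[:punct:]` is every printable ASCII character that is neither a letter, a digit, nor a space. This sketch, assuming GNU sed and a made-up sample string, shows underscores, parentheses, double quotes, and `#` being matched too.

```shell
#!/bin/sh
# Sketch, assuming GNU sed; the sample string is made up for illustration.
# In the C locale [[:punct:]] covers every printable ASCII character that is
# neither a letter, a digit, nor a space, so _ ( ) " # all match.
export LC_ALL=C
echo 'a_b (c) "d" #1' | sed 's/[[:punct:]]/[X]/g'
# prints: a[X]b [X]c[X] [X]d[X] [X]1
```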
58. Edit All Uppercase & Lowercase Characters (POSIX Character Class)
Command Example:
sed 's/[[:alpha:]]/[X]/g' text15.txt
Output Example:
master@master:/tmp$ sed 's/[[:alpha:]]/[X]/g' text15.txt
[X][X][X][X] ?
[X][X][X] !
[X][X][X] ,.
[X][X][X][X][X][X] : ;
1 [X][X][X][X][X]
2 [X][X][X][X][X]
3 [X][X][X][X][X]
4 5 6 7 8
[X][X][X][X] [X][X] [X][X][X]
[X][X][X], [X][X][X][X] [X][X] [X][X][X]
[X][X]-[X][X] ; [X][X] [X] ; [X][X][X][X][X]
master@master:/tmp$
Explanation:
This command makes use of the `[:alpha:]` character class, which means it replaces both uppercase and lowercase letters. So the only characters left unchanged are digits, punctuation, and whitespace.
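The table says `[:alpha:]` is the same as `[[:upper:][:lower:]]`. This sketch checks that on one line taken from the text15.txt outputs above, assuming GNU sed and the C locale.

```shell
#!/bin/sh
# Sketch, assuming GNU sed. [[:alpha:]] and [[:upper:][:lower:]] select the same letters.
export LC_ALL=C
line='ULTRIX : ; 1 xenix'

echo "$line" | sed 's/[[:alpha:]]/[X]/g'
echo "$line" | sed 's/[[:upper:][:lower:]]/[X]/g'
# Both print: [X][X][X][X][X][X] : ; 1 [X][X][X][X][X]
```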
59. Edit All Numeric Characters (POSIX Character Class)
Command Example:
sed 's/[[:digit:]]/[X]/g' text15.txt
Output Example:
master@master:/tmp$ sed 's/[[:digit:]]/[X]/g' text15.txt
unix ?
bsd !
gnu ,.
ULTRIX : ;
[X] xenix
[X] minix
[X] dunix
[X] [X] [X] [X] [X]
this is tab
yes, this is tab
HP-UX ; OS X ; SUNOS
master@master:/tmp$
Explanation:
This example makes use of the `[:digit:]` character class. It replaces every digit in the whole text, one `[X]` per digit.
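Because the class matches one character at a time, a multi-digit number becomes several `[X]` sequences. This sketch, assuming GNU sed and a made-up sample line, contrasts that with `[[:digit:]][[:digit:]]*` (one digit followed by zero or more digits), which replaces each whole number at once.

```shell
#!/bin/sh
# Sketch, assuming GNU sed; the sample line is made up for illustration.
line='port 8080, pid 42'

# One [X] per digit, as in the example above.
echo "$line" | sed 's/[[:digit:]]/[X]/g'              # prints: port [X][X][X][X], pid [X][X]

# One digit, then zero or more digits: each whole number is replaced.
echo "$line" | sed 's/[[:digit:]][[:digit:]]*/N/g'    # prints: port N, pid N
```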
60. Edit All Space & Tab Characters (POSIX Character Class)
Command Example:
sed 's/[[:blank:]]/[X]/g' text15.txt
Output Example:
master@master:/tmp$ sed 's/[[:blank:]]/[X]/g' text15.txt
unix[X]?
bsd[X]!
gnu[X],.
ULTRIX[X]:[X];
1[X]xenix
2[X]minix
3[X]dunix
4[X]5[X]6[X]7[X]8
[X]
[X]this[X]is[X]tab
[X]yes,[X]this[X]is[X]tab
HP-UX[X];[X]OS[X]X[X];[X]SUNOS
master@master:/tmp$
Explanation:
This example makes use of the `[:blank:]` character class, which matches only spaces and tabs. Every space and tab character has been replaced with the `[X]` sequence.
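To see how `[:blank:]` differs from `[:space:]`, this sketch uses made-up input and assumes GNU sed. The ‘N’ command appends a newline and the next input line to the pattern space, so the pattern space then contains a newline character: `[:blank:]` leaves it alone, while `[:space:]` replaces it.

```shell
#!/bin/sh
# Sketch, assuming GNU sed. The sample input is made up for illustration.

# [[:blank:]] matches both the tab and the space.
printf 'a\tb c\n' | sed 's/[[:blank:]]/[X]/g'       # prints: a[X]b[X]c

# N appends the next line, so the pattern space holds "a b\nc<TAB>d".
# [[:blank:]] leaves the embedded newline alone ...
printf 'a b\nc\td\n' | sed 'N;s/[[:blank:]]/_/g'    # prints: a_b and c_d on two lines
# ... while [[:space:]] replaces it too.
printf 'a b\nc\td\n' | sed 'N;s/[[:space:]]/_/g'    # prints: a_b_c_d
```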
61. Combine Multiple POSIX Character Classes
Command Examples:
- sed 's/[[:upper:][:digit:]]/[X]/g' text15.txt
- sed 's/[[:lower:][:punct:]]/[X]/g' text15.txt
Output Examples:
(1)
master@master:/tmp$ sed 's/[[:upper:][:digit:]]/[X]/g' text15.txt
unix ?
bsd !
gnu ,.
[X][X][X][X][X][X] : ;
[X] xenix
[X] minix
[X] dunix
[X] [X] [X] [X] [X]
this is tab
yes, this is tab
[X][X]-[X][X] ; [X][X] [X] ; [X][X][X][X][X]
master@master:/tmp$
(2)
master@master:/tmp$ sed 's/[[:lower:][:punct:]]/[X]/g' text15.txt
[X][X][X][X] [X]
[X][X][X] [X]
[X][X][X] [X][X]
ULTRIX [X] [X]
1 [X][X][X][X][X]
2 [X][X][X][X][X]
3 [X][X][X][X][X]
4 5 6 7 8
[X][X][X][X] [X][X] [X][X][X]
[X][X][X][X] [X][X][X][X] [X][X] [X][X][X]
HP[X]UX [X] OS X [X] SUNOS
master@master:/tmp$
Explanation:
This example demonstrates how to use more than one character class. Put them together inside a single pair of square brackets (`[]`), which forms one bracket expression. The combination `[[:upper:][:digit:]]` matches any uppercase letter or digit, and the combination `[[:lower:][:punct:]]` matches any lowercase letter or punctuation character.
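A short sketch on made-up strings, assuming GNU sed and the C locale. It combines two classes in one bracket expression, and also shows that a class can be mixed with ordinary characters inside the same brackets; inside a bracket expression the dot is a literal character.

```shell
#!/bin/sh
# Sketch, assuming GNU sed; the sample strings are made up for illustration.
export LC_ALL=C

# Two classes in one bracket expression: uppercase letters or digits.
echo 'HP-UX 11i' | sed 's/[[:upper:][:digit:]]/[X]/g'   # prints: [X][X]-[X][X] [X][X]i

# A class mixed with an ordinary character, here a literal dot.
echo 'v1.2-beta' | sed 's/[[:digit:].]/#/g'             # prints: v###-beta
```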