Ruby Regular Expressions

A regular expression is a special sequence of characters that matches or finds a set of strings by using a pattern written in a specialized syntax.

A regular expression combines predefined special characters with ordinary characters to form a "rule string" that expresses filtering logic for strings.

Syntax

A regular expression literal is a pattern written between slashes, or between arbitrary delimiters following %r, as follows:

/pattern/
/pattern/im    # Options can be specified after the closing delimiter
%r!/usr/local! # A regular expression written with %r and custom delimiters
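The literal forms above can be compared side by side. The following is a minimal sketch; SAMPLE_PATH and the other constant names are made up for this illustration and are not part of the tutorial.

```ruby
# Hypothetical sample data and names, chosen only for this illustration.
SAMPLE_PATH = "/usr/local/bin"

SLASH_FORM   = /\/usr\/local/            # every slash must be escaped
PERCENT_FORM = %r!/usr/local!            # no escaping needed with %r
OBJECT_FORM  = Regexp.new("/usr/local")  # built from a string at run time

# =~ returns the index where the match starts, or nil when nothing matches.
[SLASH_FORM, PERCENT_FORM, OBJECT_FORM].each do |pattern|
  puts(SAMPLE_PATH =~ pattern)           # prints 0 each time
end

# Options such as i follow the closing delimiter.
puts(SAMPLE_PATH =~ /LOCAL/i)            # prints 5
```

All three forms find the same text; %r mainly saves escaping when the pattern itself contains slashes.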
...

Example

#!/usr/bin/ruby

line1 = "Cats are smarter than dogs"
line2 = "Dogs also like meat"

# =~ returns the index of the first match, or nil if there is none
if line1 =~ /Cats(.*)/
  puts "Line1 contains Cats"
end
if line2 =~ /Cats(.*)/
  puts "Line2 contains Cats"
end

Running the example above produces:

Line1 contains Cats
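The parentheses in /Cats(.*)/ also capture the text that follows "Cats". A small sketch of reading that capture; text_after_cats is a hypothetical helper name introduced here, not something the tutorial defines.

```ruby
# Hypothetical helper: returns the text captured by (.*), or nil if
# the line does not contain "Cats".
def text_after_cats(line)
  match = /Cats(.*)/.match(line)  # MatchData on success, nil otherwise
  match && match[1]               # match[1] is the first capture group
end

p text_after_cats("Cats are smarter than dogs")  # " are smarter than dogs"
p text_after_cats("Dogs also like meat")         # nil
```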

Regular Expression Modifiers

A regular expression literal may include optional modifiers that control various aspects of the match. Modifiers are written after the closing slash, as shown in the example above. The following table lists the available modifiers:

Modifier    Description
i           Ignores case when matching text.
o           Performs #{} interpolation only once, the first time the regular expression literal is evaluated.
x           Ignores whitespace in the pattern, allowing whitespace and comments to be placed throughout the expression.
m           Matches across multiple lines: . also matches newline characters.
u, e, s, n  Interprets the regular expression as Unicode (UTF-8), EUC, SJIS, or ASCII. If no modifier is specified, the regular expression is assumed to use the source encoding.
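The effect of the i, m, and x modifiers can be checked directly. This is a sketch with made-up sample strings; String#match? assumes Ruby 2.4 or later, and MULTILINE_TEXT and YEAR_MONTH are names invented for the illustration.

```ruby
# Hypothetical sample data for this illustration.
MULTILINE_TEXT = "Ruby\nrocks"

# i: ignore case
puts "RUBY".match?(/ruby/)                # false
puts "RUBY".match?(/ruby/i)               # true

# m: let . match a newline as well
puts MULTILINE_TEXT.match?(/Ruby.rocks/)  # false
puts MULTILINE_TEXT.match?(/Ruby.rocks/m) # true

# x: whitespace and comments inside the pattern are ignored
YEAR_MONTH = /
  (\d{4})  # year
  -
  (\d{2})  # month
/x
puts "2019-05"[YEAR_MONTH, 1]             # 2019
puts "2019-05"[YEAR_MONTH, 2]             # 05
```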

Just as string literals can be delimited with %Q, Ruby lets you write a regular expression as %r followed by any delimiter. This is useful when the pattern contains many slash characters that you do not want to escape.

# The following matches a single slash character, with no escaping needed
%r|/|
# Modifiers can be added with the following syntax
%r[</(.*)>]i
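As a quick check of the two %r patterns above, the sketch below applies them to made-up strings; SLASH and CLOSING_TAG are hypothetical constant names.

```ruby
# The two %r patterns from the text, bound to hypothetical names.
SLASH       = %r|/|
CLOSING_TAG = %r[</(.*)>]i

# Count the slashes in a path without escaping anything.
puts "/usr/local/bin".scan(SLASH).length  # 3

# String#[] with a capture index returns the text of that group.
puts "<b>bold</B>"[CLOSING_TAG, 1]        # B
```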

Regular Expression Patterns

Except for the metacharacters (+ ? . * ^ $ ( ) [ ] { } | \), every character matches itself. To match a metacharacter literally, escape it by placing a backslash in front of it.
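When the text to match comes from a variable, escaping by hand is error-prone. A minimal sketch using Regexp.escape from the standard library; contains_literal? is a hypothetical helper name, and String#match? assumes Ruby 2.4 or later.

```ruby
# Hypothetical helper: true if `text` contains `literal` exactly as written,
# with every metacharacter in `literal` treated as an ordinary character.
def contains_literal?(text, literal)
  text.match?(Regexp.new(Regexp.escape(literal)))
end

puts Regexp.escape("$5.00")                                 # \$5\.00
puts contains_literal?("Total: $5.00 (approx.)", "$5.00")   # true

# Unescaped, the dot matches any character.
puts "5500".match?(/5.00/)                                  # true
puts "5500".match?(/5\.00/)                                 # false
```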

The following table lists the regular expression syntax available in Ruby.

Pattern        Description
^              Matches the beginning of a line.
$              Matches the end of a line.
.              Matches any single character except a newline. With the m option it also matches newline characters.
[...]          Matches any single character listed in the brackets.
[^...]         Matches any single character not listed in the brackets.
re*            Matches the preceding subexpression zero or more times.
re+            Matches the preceding subexpression one or more times.
re?            Matches the preceding subexpression zero or one time.
re{n}          Matches the preceding subexpression exactly n times.
re{n,}         Matches the preceding subexpression n or more times.
re{n,m}        Matches the preceding subexpression at least n and at most m times.
a|b            Matches either a or b.
(re)           Groups the expression and remembers the matched text.
(?imx)         Turns on the i, m, or x option for the rest of the expression. Inside parentheses, it affects only that group.
(?-imx)        Turns off the i, m, or x option for the rest of the expression. Inside parentheses, it affects only that group.
(?:re)         Groups the expression without remembering the matched text.
(?imx:re)      Turns on the i, m, or x option within the parentheses.
(?-imx:re)     Turns off the i, m, or x option within the parentheses.
(?#...)        Comment.
(?=re)         Positive lookahead: asserts that re matches at this position without consuming any text.
(?!re)         Negative lookahead: asserts that re does not match at this position without consuming any text.
(?>re)         Atomic group: matches re independently, without backtracking into it.
\w             Matches a word character.
\W             Matches a non-word character.
\s             Matches a whitespace character. Equivalent to [ \t\r\n\f\v].
\S             Matches a non-whitespace character.
\d             Matches a digit. Equivalent to [0-9].
\D             Matches a non-digit.
\A             Matches the beginning of the string.
\Z             Matches the end of the string. If the string ends with a newline, it matches just before that newline.
\z             Matches the end of the string.
\G             Matches the point where the previous match finished.
\b             Matches a word boundary when used outside brackets; inside brackets it matches a backspace (0x08).
\B             Matches a non-word boundary.
\n, \t, etc.   Match newlines, tabs, and other escaped characters.
\1...\9        Matches the text matched by the nth group.
\10            Matches the text matched by the 10th group if that group exists; otherwise it is read as the octal code of a character.
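Several rows of the table are easier to tell apart when run against the same input. The sketch below contrasts the line and string anchors, then shows a lookahead and a backreference; TWO_LINES and the sample strings are made up for the illustration.

```ruby
# Hypothetical sample data for this illustration.
TWO_LINES = "first\nsecond\n"

# ^ anchors at the start of every line, \A only at the start of the string.
p TWO_LINES.scan(/^\w+/)        # ["first", "second"]
p TWO_LINES.scan(/\A\w+/)       # ["first"]

# \Z tolerates one trailing newline, \z does not.
p(TWO_LINES =~ /second\Z/)      # 6
p(TWO_LINES =~ /second\z/)      # nil

# (?=re) checks what follows without including it in the match.
p "100USD, 200EUR".scan(/\d+(?=USD)/)   # ["100"]

# \1 must match the same text that group 1 matched.
p "hello hello world"[/\b(\w+) \1\b/]   # "hello hello"
```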

Regular Expression Examples

Characters

Example        Description
/ruby/         Matches "ruby".
/₹/            Matches the rupee sign. Ruby 1.9 and Ruby 1.8 support multibyte characters.

Character classes

Example        Description
/[Rr]uby/      Matches "Ruby" or "ruby".
/rub[ye]/      Matches "ruby" or "rube".
/[aeiou]/      Matches any lowercase vowel.
/[0-9]/        Matches any digit; same as /[0123456789]/.
/[a-z]/        Matches any lowercase ASCII letter.
/[A-Z]/        Matches any uppercase ASCII letter.
/[a-zA-Z0-9]/  Matches any ASCII letter or digit.
/[^aeiou]/     Matches any character other than a lowercase vowel.
/[^0-9]/       Matches any character other than a digit.
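The character classes above can be tried with String#scan, which returns every match as an array. A short sketch on made-up input; vowels_in is a hypothetical helper name.

```ruby
# Hypothetical helper: every lowercase vowel in `text`, in order.
def vowels_in(text)
  text.scan(/[aeiou]/)
end

p vowels_in("Ruby rube")                         # ["u", "u", "e"]
p "Ruby ruby rube rubx".scan(/[Rr]ub[ye]/)       # ["Ruby", "ruby", "rube"]
p "R2-D2".scan(/[^0-9]/)                         # ["R", "-", "D"]
```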

Special character classes

Example        Description
/./            Matches any character except a newline.
/./m           In multiline mode, also matches newline characters.
/\d/           Matches a digit; equivalent to /[0-9]/.
/\D/           Matches a non-digit; equivalent to /[^0-9]/.
/\s/           Matches a whitespace character; equivalent to /[ \t\r\n\f\v]/.
/\S/           Matches a non-whitespace character; equivalent to /[^ \t\r\n\f\v]/.
/\w/           Matches a word character; equivalent to /[A-Za-z0-9_]/.
/\W/           Matches a non-word character; equivalent to /[^A-Za-z0-9_]/.
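A brief sketch of the shorthand classes on one made-up string; LOG_ENTRY is a hypothetical name used only here.

```ruby
# Hypothetical sample data for this illustration.
LOG_ENTRY = "id_42:\tok\n"

p LOG_ENTRY.scan(/\d/)      # ["4", "2"]
p LOG_ENTRY.scan(/\w+/)     # ["id_42", "ok"]
p LOG_ENTRY.scan(/\s/)      # ["\t", "\n"]

# . skips the newline unless the m option is given.
p "a\nb".scan(/./).length   # 2
p "a\nb".scan(/./m).length  # 3
```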

Repetition

Example        Description
/ruby?/        Matches "rub" or "ruby"; the y is optional.
/ruby*/        Matches "rub" followed by zero or more y characters.
/ruby+/        Matches "rub" followed by one or more y characters.
/\d{3}/        Matches exactly 3 digits.
/\d{3,}/       Matches 3 or more digits.
/\d{3,5}/      Matches 3, 4, or 5 digits.
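The quantifiers in the table apply only to the item immediately before them (here, the letter y or the class \d). A sketch on made-up input; RUBY_WORDS and NUMBERS are hypothetical names.

```ruby
# Hypothetical sample data for this illustration.
RUBY_WORDS = "rub ruby rubyyy"
NUMBERS    = "12 345 67890"

p RUBY_WORDS.scan(/ruby?/)       # ["rub", "ruby", "ruby"]
p RUBY_WORDS.scan(/ruby*/)       # ["rub", "ruby", "rubyyy"]
p RUBY_WORDS.scan(/ruby+/)       # ["ruby", "rubyyy"]

p NUMBERS.scan(/\d{3,5}/)        # ["345", "67890"]
# Without anchors or boundaries, \d{3} would also match inside "67890";
# \b restricts it to runs of exactly three digits.
p NUMBERS.scan(/\b\d{3}\b/)      # ["345"]
```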
...

Example

#!/usr/bin/ruby
# -*- coding: UTF-8 -*-

phone = "138-3453-1111 #This is a phone number"

# Delete the Ruby-style comment. sub is used instead of sub!
# because sub! returns nil when nothing is replaced.
phone = phone.sub(/#.*$/, "")
puts "telephone number : #{phone}"

# Remove every character that is not a digit
phone = phone.gsub(/\D/, "")
puts "telephone number : #{phone}"

Running the example above produces:

telephone number : 138-3453-1111
telephone number : 13834531111
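Assigning the result of sub! or gsub! back to a variable is risky, because both return nil when no substitution takes place. The sketch below wraps the same cleanup in a hypothetical helper, digits_only, that uses the non-destructive forms.

```ruby
# Hypothetical helper: strips a trailing "#..." comment, then every
# non-digit character. sub and gsub always return a String.
def digits_only(raw)
  raw.sub(/#.*$/, "").gsub(/\D/, "")
end

puts digits_only("138-3453-1111 #This is a phone number")  # 13834531111
puts digits_only("13834531111")                            # 13834531111

# The destructive form returns nil when there is nothing to replace.
p "13834531111".dup.sub!(/#.*$/, "")                       # nil
```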

Example

#!/usr/bin/ruby
# -*- coding: UTF-8 -*-

text = "rails Yes rails, Ruby on Rails Very good Ruby framework"

# Change every occurrence of "rails" to "Rails"
text.gsub!("rails", "Rails")

# Capitalize "rails" only where it appears as a whole word
# (a no-op here, because the previous call already replaced every occurrence)
text.gsub!(/\brails\b/, "Rails")

puts text

Running the example above produces:

Rails Yes Rails, Ruby on Rails Very good Ruby framework
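The difference between the plain-string replacement and the \b version only shows up when "rails" occurs inside a longer word. A sketch with a made-up sentence; capitalize_rails is a hypothetical helper name.

```ruby
# Hypothetical helper: capitalizes "rails" only when it is a whole word.
def capitalize_rails(text)
  text.gsub(/\brails\b/, "Rails")
end

puts capitalize_rails("rails and derails")        # Rails and derails
puts "rails and derails".gsub("rails", "Rails")   # Rails and deRails
```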
...




