In this article, we will discuss regular expressions and pattern matching in Java.
Regular expressions
A regular expression is a string of characters that describes a pattern to be matched against character sequences.
Rules for Regular expressions
A regular expression can contain normal characters, character classes (sets of characters), and quantifiers.
Normal characters
– Normal characters are matched as is. For example, the pattern “Java” matches only the input sequence “Java”.
– The wildcard dot (.) matches any single character (by default, any character except a line terminator). e.g. the pattern “.” will match the characters “A”, “a”, etc.
Character classes
– A character class matches any character in the set. e.g. [ABCD] will match A, B, C or D.
– An inverted (negated) set matches any character that is not in the set. e.g. [^ABCD] will match any character other than A, B, C or D.
– A range can be specified using a hyphen (-). e.g. [1-9] matches any single digit from 1 to 9.
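The rules above can be checked with a small sketch. It assumes the static Pattern.matches() convenience method, which tests whether the whole input matches the pattern; the class name CharacterClassDemo and the helper fullMatch() are made up for this illustration.

```java
import java.util.regex.Pattern;

public class CharacterClassDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("Java", "Java"));  // true: normal characters match as is
        System.out.println(fullMatch("Java", "java"));  // false: matching is case-sensitive by default
        System.out.println(fullMatch(".", "A"));        // true: dot matches a single character
        System.out.println(fullMatch("[ABCD]", "C"));   // true: C is in the set
        System.out.println(fullMatch("[^ABCD]", "C"));  // false: C is excluded by the inverted set
        System.out.println(fullMatch("[1-9]", "0"));    // false: 0 is outside the range 1-9
    }
}
```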
Quantifiers
Quantifiers determine how many times an expression is matched.
+ quantifier matches one or more occurrences of the preceding element.
* quantifier matches zero or more occurrences of the preceding element.
? quantifier matches zero or one occurrence of the preceding element.
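As a quick illustration of the three quantifiers, the sketch below applies each one to the single character “A” and tests the whole input with Pattern.matches(). The class name QuantifierDemo and the helper fullMatch() are assumptions made for this example.

```java
import java.util.regex.Pattern;

public class QuantifierDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("A+", "AAA")); // true: one or more A
        System.out.println(fullMatch("A+", ""));    // false: + needs at least one A
        System.out.println(fullMatch("A*", ""));    // true: * also accepts zero A
        System.out.println(fullMatch("A?", "A"));   // true: zero or one A
        System.out.println(fullMatch("A?", "AA"));  // false: ? allows at most one A
    }
}
```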
Pattern matching in Java
In Java, the following classes in the java.util.regex package support regular expression processing:
Pattern
Matcher
Pattern
The Pattern class represents a compiled regular expression.
The Pattern class has no public constructors. Instead, a pattern is created by calling the static compile() factory method:
Pattern pattern = Pattern.compile("Java");
Matcher
A Matcher is used to match the pattern against a character sequence.
The Matcher class has no public constructors. Instead, a Matcher is created by calling the matcher() method defined in the Pattern class:
Matcher matcher = pattern.matcher("TopJavaTutorial");
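Since a compiled Pattern is immutable, one common arrangement is to compile it once and create a new Matcher for each input. The following is only a sketch of that idea; the class name PatternReuseDemo, the constant JAVA and the helper containsJava() are invented for the example, and the find() method it calls is described in the next section.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternReuseDemo {
    // Compiled once; the same Pattern can create any number of Matchers.
    private static final Pattern JAVA = Pattern.compile("Java");

    // Hypothetical helper: true when the input contains "Java" anywhere.
    static boolean containsJava(String input) {
        Matcher matcher = JAVA.matcher(input); // one Matcher per input
        return matcher.find();
    }

    public static void main(String[] args) {
        System.out.println(containsJava("TopJavaTutorial")); // true
        System.out.println(containsJava("Python notes"));    // false
    }
}
```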
Pattern Matching methods
boolean matches()
It returns true if the entire sequence matches the pattern, and false otherwise.
Pattern pattern = Pattern.compile("Java");
Matcher matcher = pattern.matcher("TopJavaTutorial");
// "Java" is only part of the input, so the entire sequence does not match
System.out.println(matcher.matches());
Output :
false
boolean find()
It returns true if the sequence contains a matching subsequence for the pattern, and false otherwise.
Pattern pattern = Pattern.compile("Java");
Matcher matcher = pattern.matcher("TopJavaTutorial");
// find() succeeds because "Java" occurs inside the input
System.out.println(matcher.find());
Output :
true
String group()
This method returns the part of the input sequence matched by the most recent successful match.
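A minimal sketch of group() is shown below. It calls find() first, because group() throws an IllegalStateException when no successful match has been made yet. The class name GroupDemo, the helper firstMatch() and the sample sentence are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Hypothetical helper: returns the first matching subsequence, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        // group() is only valid after a successful find() or matches()
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("[0-9]+", "Java 8 was released in 2014")); // 8
        System.out.println(firstMatch("[0-9]+", "no digits here"));              // null
    }
}
```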
start() and end()
start() returns the index of the first character of the current match in the sequence.
end() returns the index of the character just after the end of the current match.
Pattern pattern = Pattern.compile("T");
Matcher matcher = pattern.matcher("TopJavaTutorial");
// Each call to find() advances to the next match
while (matcher.find()) {
    System.out.println("T found at index " + matcher.start());
}
Output :
T found at index 0
T found at index 7
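The example above only uses start(). As a sketch of how start() and end() work together, the following code reports both indexes for the first match; the pair can be passed directly to String.substring() to recover the matched text. The class name StartEndDemo and the helper describeFirst() are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StartEndDemo {
    // Hypothetical helper: describes the first match as "text [start, end)".
    static String describeFirst(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.find()) {
            return "no match";
        }
        // end() is exclusive, so substring(start, end) equals group()
        String text = input.substring(matcher.start(), matcher.end());
        return text + " [" + matcher.start() + ", " + matcher.end() + ")";
    }

    public static void main(String[] args) {
        System.out.println(describeFirst("Java", "TopJavaTutorial")); // Java [3, 7)
    }
}
```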
String replaceAll()
This method replaces every subsequence of the input that matches the pattern with the given replacement string and returns the resulting string.
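Here is a short sketch of replaceAll() on the same input used earlier. The class name ReplaceAllDemo and the helper replace() are assumptions for this example. Note that “$” and “\” have a special meaning in the replacement string, so plain text is used here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllDemo {
    // Hypothetical helper: replaces every match of regex in input with replacement.
    static String replace(String regex, String input, String replacement) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.replaceAll(replacement);
    }

    public static void main(String[] args) {
        // Both occurrences of "T" are replaced
        System.out.println(replace("T", "TopJavaTutorial", "t"));         // topJavatutorial
        // A quantifier lets one replacement cover a run of spaces
        System.out.println(replace(" +", "Top   Java    Tutorial", " ")); // Top Java Tutorial
    }
}
```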
Pattern Matching using Wildcard characters and quantifiers
As discussed, we can use the following quantifiers:
+ quantifier matches one or more occurrences of the preceding element.
* quantifier matches zero or more occurrences of the preceding element.
? quantifier matches zero or one occurrence of the preceding element.
If we are trying to find a pattern with repetitions of a character like “A”, we can write the pattern as “A+”.
Let’s see an example :
Pattern pattern = Pattern.compile("A+");
Matcher matcher = pattern.matcher("AA AAA A");
// Print each run of one or more A characters
while (matcher.find()) {
    System.out.println(matcher.group());
}
Output :
AA
AAA
A
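Similarly, if we are looking for a repetition of any characters, we can combine . and +. In the following pattern, the ? after + makes the quantifier reluctant, so each match stops at the first “a” it can reach instead of extending to the last one: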
Pattern pattern = Pattern.compile("t.+?a");
Matcher matcher = pattern.matcher("topjavatutorial");
// Print each shortest run that starts with t and ends with a
while (matcher.find()) {
    System.out.println(matcher.group());
}
Output :
topja
tutoria
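To see what the ? after + changes, the sketch below runs the same input through “t.+a” (greedy) and “t.+?a” (reluctant) and collects all matches. The class name GreedyVsReluctantDemo and the helper findAll() are invented for this comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctantDemo {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Greedy: .+ takes as much as possible, so the match runs to the last "a"
        System.out.println(findAll("t.+a", "topjavatutorial"));  // [topjavatutoria]
        // Reluctant: .+? takes as little as possible, so each match ends at the nearest "a"
        System.out.println(findAll("t.+?a", "topjavatutorial")); // [topja, tutoria]
    }
}
```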
Pattern Matching using Character sets
We can use character sets and ranges to match any sequence of letters.
For example, the following pattern matches any sequence of one or more lowercase letters:
Pattern pattern = Pattern.compile("[a-z]+");
Matcher matcher = pattern.matcher("top java tutorial");
// Print each run of lowercase letters
while (matcher.find()) {
    System.out.println(matcher.group());
}
Output :
top
java
tutorial
Similarly, if we need to match both lowercase and uppercase letters, we can use the pattern:
“[a-zA-Z]+”
Example :
Pattern pattern = Pattern.compile("[a-zA-Z]+");
Matcher matcher = pattern.matcher("Top Java Tutorial");
// Print each run of letters, regardless of case
while (matcher.find()) {
    System.out.println(matcher.group());
}
Output :
Top
Java
Tutorial
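One detail worth checking when writing ranges is the case of the end points. A range written as A-z (with a lowercase z) also covers the ASCII characters that sit between “Z” and “a”, such as “_” and “^”, whereas A-Z covers only the uppercase letters. The sketch below compares the two; the class name RangePitfallDemo and the helper fullMatch() are assumptions for this illustration.

```java
import java.util.regex.Pattern;

public class RangePitfallDemo {
    // Hypothetical helper: true when the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // A-z spans '[', '\', ']', '^', '_' and '`' as well as the letters
        System.out.println(fullMatch("[a-zA-z]+", "top_java")); // true
        // A-Z covers only uppercase letters, so the underscore is rejected
        System.out.println(fullMatch("[a-zA-Z]+", "top_java")); // false
        System.out.println(fullMatch("[a-zA-Z]+", "TopJava"));  // true
    }
}
```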
© 2016 – 2018, www.topjavatutorial.com. All rights reserved. On republishing this post, you must provide link to original post