FAQ about Regular expressions with examples

Regular expressions are both a method of processing text and a language for building masks. A mask is a sequence of ordinary and special characters that describes a pattern; the pattern is used to capture segments of text for further work (replacing, extracting, checking conditions).
Regular expressions are available in many popular programming languages: C#, JavaScript, Python, PHP and so on.

Note: the rules for building masks are similar in almost all of these languages.

How to use regular expressions in C#?

To use a regular expression, perform three major steps:

  1. Create a pattern string
  2. Create an instance of the Regex class
  3. Process the text with the Regex instance
...
   string pattern = @"\sHello{3}";  //mask for catching "Hello" ending in three o's ("Hellooo"), preceded by one whitespace character
...

Note: a pattern, or mask, is a rule that describes which text to capture. You can try out any mask in an online regex tester.

...
   Regex reg = new Regex(pattern);  //Regular expression
   
   string text="Hi, John. Hellooo, Mary. Hello, Gary.";  // textual data to analyze

   MatchCollection matches = reg.Matches(text);  //returns a collection of Match objects; here the only match is " Hellooo"
...

It is possible to get a match from matches by index. The Match class has some useful properties:

  • Value – the string that matches the pattern
  • Index – the position of the first character of the captured string in the original string
  • Length – the number of characters in the captured string
...
   string firstmatch=matches[0].Value; //firstmatch equals " Hellooo"
...
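
The three steps and the Match properties above can be put together in one small program. This is a minimal sketch written with top-level statements (an assumption: C# 9 or later); it reuses the pattern and text from the snippets above.

```csharp
using System;
using System.Text.RegularExpressions;

// Step 1: the pattern. \s is one whitespace character, o{3} is exactly three "o".
string pattern = @"\sHello{3}";

// Step 2: a Regex instance built from the pattern.
Regex reg = new Regex(pattern);

// Step 3: process the text.
string text = "Hi, John. Hellooo, Mary. Hello, Gary.";
MatchCollection matches = reg.Matches(text);

foreach (Match m in matches)
{
    Console.WriteLine($"Value='{m.Value}' Index={m.Index} Length={m.Length}");
}
// prints: Value=' Hellooo' Index=9 Length=8
```

"Hello, Gary" is not captured because it ends in a single "o", while the pattern requires three.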

How to create a list of substrings from text with Regex?

...
   MatchCollection matches = reg.Matches(text);  //match collection from text

   List<string> resultlist = matches.Cast<Match>().Select(m => m.Value).ToList();  //using Linq (needs System.Linq and System.Collections.Generic)

   string[] resultarray= new string[matches.Count];  //not using Linq
   for(int i=0; i<matches.Count; i++)
   {
      resultarray[i] = matches[i].Value;
   }
...

How to ignore letter case in regular expressions?

...
   Regex regex = new Regex(pattern, RegexOptions.IgnoreCase); //ignores letter case

   Regex regex2 = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.RightToLeft); //ignores letter case and scans from right to left
...

Note: more RegexOptions can be combined with the | operator (e.g. RegexOptions.IgnoreCase | RegexOptions.RightToLeft | RegexOptions.Compiled).
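
To see what these options change, the following sketch counts matches with and without IgnoreCase and then asks for the first match found when scanning from the right. The sample text and the pattern "hello" are assumptions made for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample data, made up for this example.
string text = "HELLO, Mary. hello, Gary.";
string pattern = "hello";

// Case-sensitive search finds only the lowercase word.
int caseSensitive = Regex.Matches(text, pattern).Count;

// IgnoreCase finds both words.
int ignoreCase = Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;

// RightToLeft starts at the end of the text, so the first match
// it reports is the last occurrence in the text.
Match last = Regex.Match(text, pattern, RegexOptions.IgnoreCase | RegexOptions.RightToLeft);

Console.WriteLine($"{caseSensitive} {ignoreCase} {last.Index}");
// prints: 1 2 13
```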

How to replace text with Regex?

...
   //replaces every match of pattern (here " Hellooo") with "Howdy"
   string newtext = Regex.Replace(text, pattern, "Howdy");
...
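
One detail is easy to miss: the leading whitespace is part of the match, so it is replaced as well. The sketch below shows the effect and one possible way around it, assuming you want to keep the space: capture the whitespace in a group and put it back with the ${1} substitution.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "Hi, John. Hellooo, Mary. Hello, Gary.";

// The whole match " Hellooo" (including the space) is replaced.
string newtext = Regex.Replace(text, @"\sHello{3}", "Howdy");
Console.WriteLine(newtext);
// prints: Hi, John.Howdy, Mary. Hello, Gary.

// Capturing the whitespace in group 1 lets the replacement restore it.
string kept = Regex.Replace(text, @"(\s)Hello{3}", "${1}Howdy");
Console.WriteLine(kept);
// prints: Hi, John. Howdy, Mary. Hello, Gary.
```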

How to set a timeout for Regex?

...
   Regex regEx = new Regex(pattern, RegexOptions.None, TimeSpan.FromSeconds(10));  //a match operation that runs longer than 10 sec throws RegexMatchTimeoutException
...
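
When the timeout expires, the match operation throws RegexMatchTimeoutException, so the calling code has to catch it. Below is a minimal sketch of one way to do that; TryIsMatch is a hypothetical helper written for this example, not part of .NET.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns false if matching did not finish within the timeout.
bool TryIsMatch(string input, string pattern, TimeSpan timeout, out bool isMatch)
{
    try
    {
        isMatch = new Regex(pattern, RegexOptions.None, timeout).IsMatch(input);
        return true;
    }
    catch (RegexMatchTimeoutException)
    {
        // Matching was abandoned, so there is no reliable result.
        isMatch = false;
        return false;
    }
}

bool finished = TryIsMatch("Hi, John. Hellooo, Mary.", @"\sHello{3}", TimeSpan.FromSeconds(10), out bool found);
Console.WriteLine(finished ? $"found={found}" : "timed out");
// prints: found=True
```

A timeout is most useful when the pattern or the text comes from users, because some patterns backtrack for a very long time on unlucky input.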

How to make Regex work faster?

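When processing large amounts of text, regular expression matching can become a bottleneck. Setting RegexOptions.Compiled usually makes matching faster, at the cost of a longer construction time, so it pays off when the same Regex instance is reused many times.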

...
   Regex regEx = new Regex(pattern, RegexOptions.Compiled);  //takes a bit more time for compilation, but works faster
...
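
Because the extra cost of RegexOptions.Compiled is paid up front, a common approach is to build the Regex once and reuse it for every input. The sketch below illustrates that usage only; the input lines are made up, and it does not measure speed.

```csharp
using System;
using System.Text.RegularExpressions;

// Built once, outside the loop.
Regex greeting = new Regex(@"\sHello{3}", RegexOptions.Compiled);

// Sample input, made up for this example.
string[] lines =
{
    "Hi, John. Hellooo, Mary.",
    "Hello, Gary.",
    "Well, Hellooo there."
};

int hits = 0;
foreach (string line in lines)
{
    // The same compiled instance is reused for every line.
    if (greeting.IsMatch(line))
    {
        hits++;
    }
}

Console.WriteLine(hits);
// prints: 2
```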

Pattern for extracting emails from text

The following mask catches ordinary email addresses, addresses that contain dots, hyphens or underscores, and addresses protected from spam, in which "@" is written as [at] or "at" and the last dot is written as dot, [dot] or "dot" (e.g. email[at]email.com):

...
   string emailpattern =  @"([a-zA-Z0-9_\-\.]+)(@|(\s*\[\s*at\s*\]\s*)|(\s*" + "\"" + @"\s*(at)\s*" + "\"" + @"\s*))([a-zA-Z0-9_\-\.]+)((\.)|(\s+dot\s+)|(\s*\[\s*dot\s*\]\s*)|(\s*" + "\"" + @"\s*(dot)\s*" + "\"" + @"\s*))([a-zA-Z]{2,5})";
...
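
The mask can be applied with Regex.Matches like any other pattern. The sketch below also rebuilds each address in its normal form from the capture groups: counting the opening parentheses gives group 1 for the name, group 6 for the domain and group 13 for the top-level domain. The sample text and both addresses are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The mask from above, split over several lines for readability.
string emailpattern =
    @"([a-zA-Z0-9_\-\.]+)(@|(\s*\[\s*at\s*\]\s*)|(\s*" + "\"" + @"\s*(at)\s*" + "\"" + @"\s*))" +
    @"([a-zA-Z0-9_\-\.]+)((\.)|(\s+dot\s+)|(\s*\[\s*dot\s*\]\s*)|(\s*" + "\"" + @"\s*(dot)\s*" + "\"" + @"\s*))" +
    @"([a-zA-Z]{2,5})";

// Sample text; both addresses are invented.
string text = "Write to john.smith@example.com or to mary[at]example.org.";

List<string> emails = new List<string>();
foreach (Match m in Regex.Matches(text, emailpattern))
{
    // Group 1 = name, group 6 = domain, group 13 = top-level domain.
    emails.Add(m.Groups[1].Value + "@" + m.Groups[6].Value + "." + m.Groups[13].Value);
}

Console.WriteLine(string.Join(", ", emails));
// prints: john.smith@example.com, mary@example.org
```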

Patterns for extracting links from a page

...
   string tagPat = @"<\s*a\b[^>]*>(.*?)<\s*/\s*a\s*>";  //pattern for A tags; group 1 captures the link text

   string hrefPat = "href *= *(\"(?<url>.+?)\")"; //pattern with group url for href attribute
...

Groups in regular expressions

You can divide a Regex match into one or more groups and read each group separately:

... 
   string hrefPat = "href *= *(\"(?<url>.+?)\")"; //pattern with group url for href attribute
   
   string text = "<a href=\"http://somedomain\"> Domain </a>";

   Regex regex = new Regex(hrefPat, RegexOptions.IgnoreCase);

   string result=regex.Match(text).Groups["url"].Value;  //result equals http://somedomain
...
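
The tag pattern and the href pattern can also be combined to list every link on a page. This is a minimal sketch under stated assumptions: the HTML fragment is made up, and the tag pattern is written here without stray quantifiers and with \b so that it matches only A tags. For real pages a dedicated HTML parser is usually more reliable than a regular expression.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample HTML, made up for this example.
string html = "<p><a href=\"http://somedomain\"> Domain </a> and " +
              "<a class=\"x\" href = \"http://other\">Other</a></p>";

string tagPat = @"<\s*a\b[^>]*>(.*?)<\s*/\s*a\s*>";   // group 1 = link text
string hrefPat = "href *= *(\"(?<url>.+?)\")";        // named group url

List<string> urls = new List<string>();
foreach (Match tag in Regex.Matches(html, tagPat, RegexOptions.IgnoreCase | RegexOptions.Singleline))
{
    // Search for the href attribute only inside the matched tag.
    string url = Regex.Match(tag.Value, hrefPat, RegexOptions.IgnoreCase).Groups["url"].Value;
    string linkText = tag.Groups[1].Value.Trim();
    urls.Add(url);
    Console.WriteLine($"{linkText} -> {url}");
}
// prints:
// Domain -> http://somedomain
// Other -> http://other
```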
