Strings, Regular Expressions and Template Literals
Strings are undoubtedly one of the most important data types in programming.
Strings exist in almost every programming language, and learning to use them effectively is a basic necessity for every developer. To work with strings effectively, a developer also needs to understand regular expressions, because they provide the power to search and manipulate strings. With ECMAScript 6, strings and regular expressions gain new features, including functionality that other programming languages have long had.
In this post I will cover a few of the new string features and methods from ES6:
- Identify Substrings
UTF-16 Code Points
JavaScript strings are built on UTF-16, in which each 16-bit sequence is a code unit. Before ECMAScript 6, all string properties and methods, such as the length property and the charAt() method, were based on these 16-bit code units. Sixteen bits used to be enough to contain any character, but that is no longer true now that Unicode has expanded its character set, and ES6 adds support for working with the full range.
The first 2^16 code points in UTF-16 are represented as single 16-bit code units. This range is called the Basic Multilingual Plane (BMP). Everything after that is considered to be in one of the supplementary planes, where the code points cannot be represented in just 16 bits. To solve this problem, UTF-16 introduced surrogate pairs, in which a single code point is represented by two 16-bit code units. That means any single character in a string can be either one code unit for BMP characters, giving a total of 16 bits, or two code units for supplementary plane characters, giving a total of 32 bits.
This means that in ECMAScript 5 all string operations work on 16-bit code units, so you can get unexpected results from strings that contain surrogate pairs:
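The surrogate pair arithmetic described above can be sketched in a few lines. This is only an illustration: the function name toSurrogatePair is hypothetical and not part of any JavaScript API, while the constants come from the standard UTF-16 encoding rules.

```javascript
// Hypothetical helper (not a built-in): split a supplementary-plane
// code point into its UTF-16 surrogate pair of two 16-bit code units.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("Not a supplementary-plane code point");
  }
  var offset = codePoint - 0x10000;    // leaves a 20-bit value
  var high = 0xD800 + (offset >> 10);  // top 10 bits -> high surrogate
  var low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits -> low surrogate
  return [high, low];
}

var pair = toSurrogatePair(0x20BB7); // U+20BB7 is the character "𠮷"
console.log(pair[0], pair[1]);       // 55362 57271
console.log(String.fromCharCode(pair[0], pair[1]) === "𠮷"); // true
```

Each of the two numbers fits in 16 bits, which is why a string holding this one character reports two code units.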
    var text = "𠮷";

    console.log(text.length);         // 2
    console.log(/^.$/.test(text));    // false
    console.log(text.charAt(0));      // lone high surrogate, not a printable character
    console.log(text.charAt(1));      // lone low surrogate, not a printable character
    console.log(text.charCodeAt(0));  // 55362
    console.log(text.charCodeAt(1));  // 57271
The single Unicode character "𠮷" is represented using a surrogate pair, so the string operations above treat the string as having two 16-bit characters. That means:
- The length of text is 2, when it should be 1.
- A regular expression that tries to match a single character fails, because it thinks there are two characters.
- The charAt() method is unable to return a valid character string, because neither set of 16 bits corresponds to a printable character.
- The charCodeAt() method also can't identify the character properly; it returns the 16-bit number for each code unit rather than a single value for the whole character.
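To make the mismatch between code units and characters concrete, here is a minimal sketch that counts characters by pairing up surrogates by hand, using only the ES5-era charCodeAt() method. The name countCodePoints is a hypothetical helper invented for this example, and the sketch assumes that an unpaired surrogate should simply count as one character.

```javascript
// Hypothetical helper (not a built-in): count code points by walking
// the 16-bit code units and treating a high surrogate followed by a
// low surrogate as a single character.
function countCodePoints(str) {
  var count = 0;
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i);
    var isHighSurrogate = unit >= 0xD800 && unit <= 0xDBFF;
    if (isHighSurrogate && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) {
        i++; // skip the low surrogate: the pair is one code point
      }
    }
    count++; // assumption: a lone surrogate still counts as one
  }
  return count;
}

var sample = "𠮷";
console.log(sample.length);             // 2
console.log(countCodePoints(sample));   // 1
console.log(countCodePoints("a𠮷b"));   // 3
```

Having to write this kind of loop for something as basic as a character count is exactly the gap in ECMAScript 5 that the points above describe.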