Go: Unicode & Strings

Unicode is a standard that assigns a unique number, called a code point, to every character and symbol used in written language, including letters, digits, punctuation marks, and emojis. Go supports Unicode natively: Go source files are UTF-8 encoded, and standard library packages such as unicode, unicode/utf8, and strings make it straightforward to write programs that handle text from all over the world.

Character

Go has no dedicated char type. A single character is usually represented by a rune, which holds a Unicode code point and is 4 bytes (32 bits) wide, while the individual bytes of encoded text use the byte type, an alias for uint8. Go strings are sequences of bytes that, by convention, contain UTF-8 encoded text. UTF-8 is widely used and backward compatible with ASCII: every ASCII character is encoded as the same single byte.
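To make the byte/rune distinction concrete, here is a minimal sketch; the helper name views is hypothetical, introduced only for this example. It converts the same one-character string to a []byte and to a []rune:

```go
package main

import "fmt"

// views returns the UTF-8 bytes and the decoded code points of s.
// Hypothetical helper, written only for this example.
func views(s string) ([]byte, []rune) {
	return []byte(s), []rune(s)
}

func main() {
	b, r := views("世")
	fmt.Println(b) // [228 184 150]: three UTF-8 bytes
	fmt.Println(r) // [19990]: one rune, the code point U+4E16
}
```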


Rune

In Go, rune is an alias for the int32 type. It represents a Unicode code point and is used to represent a single Unicode character.

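Go strings are sequences of bytes, not runes, but a for ... range loop over a string decodes its UTF-8 content one rune at a time, and the conversion []rune(s) yields all of its code points. The rune type is therefore what Go programs use to work with individual Unicode characters.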
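The following sketch shows how a range loop walks a string. The helper runeStarts is hypothetical; it collects the byte offset at which each rune begins, which shows that the loop index advances by the byte length of each character:

```go
package main

import "fmt"

// runeStarts returns the byte offset at which each rune of s begins.
// Hypothetical helper, written only for this example.
func runeStarts(s string) []int {
	var offsets []int
	for i := range s { // i is a byte offset; each step covers one whole rune
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "Go, 世界"
	for i, r := range s {
		fmt.Printf("byte %d: %c (%U)\n", i, r, r)
	}
	fmt.Println(runeStarts(s)) // [0 1 2 3 4 7]
}
```

Note how the offset jumps from 4 to 7: 世 occupies three bytes.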

Rune literals are represented using single quotes, like character literals in C or Java. For example:

var r rune = 'A'
fmt.Println(r) // Output: 65

Here, we declare a variable r of type rune and initialize it with the Unicode code point of the character A (65) using a rune literal. Printing a rune with fmt.Println prints this number, not the character.
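Rune literals can also be written with escape sequences. This short sketch (the variable names are arbitrary) shows a few equivalent spellings; each literal is simply an int32 value:

```go
package main

import "fmt"

func main() {
	// Equivalent ways to write rune literals; each one is an int32 value.
	a := 'A'              // the character itself
	hex := '\x41'         // \x followed by two hex digits
	u := '\u4E16'         // \u followed by four hex digits: 世
	emoji := '\U0001F600' // \U followed by eight hex digits
	fmt.Println(a == hex)               // true
	fmt.Printf("%c %d\n", u, u)         // 世 19990
	fmt.Printf("%c %U\n", emoji, emoji) // prints the emoji and U+1F600
}
```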

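Go's standard library provides functions for working with runes. Note that the built-in len() returns the length of a string in bytes, not runes; to count runes, use utf8.RuneCountInString() from the unicode/utf8 package. The unicode package provides helpers such as unicode.IsDigit() to check whether a rune represents a digit and unicode.ToTitle() to convert a rune to title case (for a plain letter such as 'a', this is its uppercase form).

// uses the fmt, unicode, and unicode/utf8 packages
str := "Hello, 世界"
fmt.Println(len(str))                    // Output: 13 (bytes)
fmt.Println(utf8.RuneCountInString(str)) // Output: 9 (runes)
fmt.Println(unicode.IsDigit('3'))        // Output: true
fmt.Printf("%c\n", unicode.ToTitle('a')) // Output: A

Here, len() returns the length of the string in bytes (世 and 界 take three bytes each in UTF-8), utf8.RuneCountInString() counts its runes, unicode.IsDigit() checks whether a rune represents a digit, and unicode.ToTitle() converts a rune to title case. ToTitle returns a rune, so we print it with the %c verb; fmt.Println would print its numeric value, 65.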

In summary, rune is a data type in Go that represents a Unicode code point and is used to manipulate individual characters in a string.


Strings

In Go, the string data type represents a sequence of bytes that encodes a Unicode character sequence. Strings in Go are immutable, which means once a string is created, it cannot be modified. However, you can create a new string by combining one or more existing strings.
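As a sketch of working with immutable strings, assuming we want to change the first character, we copy the string into a mutable []rune, modify the copy, and build a new string. The helper withFirstRune is hypothetical, not a standard library function:

```go
package main

import "fmt"

// withFirstRune returns a copy of s whose first character is replaced by r.
// Hypothetical helper; s itself is never modified.
func withFirstRune(s string, r rune) string {
	rs := []rune(s) // the conversion allocates a new, mutable copy
	if len(rs) == 0 {
		return s
	}
	rs[0] = r
	return string(rs) // builds a new string from the modified copy
}

func main() {
	s := "hello"
	// s[0] = 'H' // does not compile: string elements cannot be assigned
	t := withFirstRune(s, 'J')
	fmt.Println(s, t) // hello Jello
}
```

Converting to []rune rather than []byte keeps multi-byte characters such as 世 intact; a []byte copy is only safe for single-byte (ASCII) edits.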

Here is an example of creating a string literal in Go:

greeting := "Hello, world!"

String literals written in double quotes are interpreted: escape sequences such as \n are processed. Backticks create raw string literals, which can span multiple lines and in which backslashes have no special meaning.

rawString := `This is a raw string
that includes line breaks
and backslashes: \n`

Because strings are sequences of bytes, you can index and slice a string much like a slice of bytes, using the square bracket notation []. Keep in mind that indexing returns a single byte, not a character. String literals in Go source code are UTF-8 encoded, and UTF-8 is a variable-length encoding, so a single character may take up several bytes in memory.

Here is an example of accessing the individual bytes of a string:

str := "Hello, 世界"
fmt.Println(str[0])     // Output: 72 (the ASCII code of 'H')
fmt.Println(str[7:10])  // Output: 世 (bytes 7, 8, and 9, which together encode one character)

In summary, the string data type in Go represents a sequence of bytes, usually holding UTF-8 encoded text. Strings in Go are immutable, but you can perform operations on them to create new strings. You can index and slice a string with square brackets, but indexing yields bytes, and a single character may occupy several bytes because UTF-8 is a variable-length encoding.
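Slicing at an arbitrary byte offset can cut a character in half. The sketch below, assuming you want the first n characters of a string, uses utf8.ValidString to detect a broken slice and a hypothetical helper firstNRunes that only cuts at rune boundaries:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstNRunes returns the prefix of s containing at most n runes,
// cutting only at rune boundaries. Hypothetical helper for illustration.
func firstNRunes(s string, n int) string {
	count := 0
	for i := range s { // i is the byte offset where each rune starts
		if count == n {
			return s[:i]
		}
		count++
	}
	return s
}

func main() {
	str := "Hello, 世界"
	fmt.Println(utf8.ValidString(str[7:10])) // true: one whole character
	fmt.Println(utf8.ValidString(str[7:8]))  // false: only the first byte of 世
	fmt.Println(firstNRunes(str, 8))         // Hello, 世
}
```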


Functions

Go's standard library includes a strings package that provides a rich set of functions for manipulating strings. Here are some of the most important functions in the strings package, along with examples:

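  1. strings.Replace: This function returns a copy of a string in which occurrences of one substring are replaced by another. The last argument limits the number of replacements; -1 means replace all of them (strings.ReplaceAll does the same).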
package main

import (
    "fmt"
    "strings"
)

func main() {
    str := "The quick brown fox jumps over the lazy dog."
    newStr := strings.Replace(str, "fox", "panda", -1)
    fmt.Println(newStr) // Output: The quick brown panda jumps over the lazy dog.
}
  2. strings.Split: This function splits a string into substrings using a specified separator.
package main

import (
    "fmt"
    "strings"
)

func main() {
    str := "apple,banana,pear,orange"
    arr := strings.Split(str, ",")
    fmt.Println(arr) // Output: [apple banana pear orange]
}
  3. strings.Contains: This function returns a boolean indicating whether a given string contains a specified substring.
package main

import (
    "fmt"
    "strings"
)

func main() {
    str := "The quick brown fox jumps over the lazy dog."
    fmt.Println(strings.Contains(str, "fox")) // Output: true
    fmt.Println(strings.Contains(str, "elephant")) // Output: false
}
  4. strings.ToLower and strings.ToUpper: These functions return a new string where all characters in the original string are converted to lowercase or uppercase, respectively.
package main

import (
    "fmt"
    "strings"
)

func main() {
    str := "The quick brown Fox jumps Over the lazy Dog."
    fmt.Println(strings.ToLower(str)) // Output: the quick brown fox jumps over the lazy dog.
    fmt.Println(strings.ToUpper(str)) // Output: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.
}
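  5. strings.Trim: This function removes all leading and trailing characters that appear in a given set of characters (the cutset). To remove only surrounding whitespace, strings.TrimSpace is a common alternative.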
package main

import (
    "fmt"
    "strings"
)

func main() {
    str := "  apple  "
    newStr := strings.Trim(str, " ")
    fmt.Printf("[%s]\n", newStr) // Output: [apple]
}
  6. strings.Join: This function concatenates a slice of strings into a single string using a specified separator.
package main

import (
    "fmt"
    "strings"
)

func main() {
    arr := []string{"apple", "banana", "pear", "orange"}
    str := strings.Join(arr, ", ")
    fmt.Println(str) // Output: apple, banana, pear, orange
}

These are just a few examples of the many useful string manipulation functions provided by the strings package in Go.
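One Unicode detail applies to many functions in the strings package: positions such as the result of strings.Index are byte offsets, not character positions. A minimal sketch, with a hypothetical helper runeIndex that converts the byte offset into a rune count:

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// runeIndex converts the byte offset returned by strings.Index into a
// rune (character) position. Hypothetical helper, not part of package strings.
func runeIndex(s, substr string) int {
	i := strings.Index(s, substr)
	if i < 0 {
		return -1
	}
	return utf8.RuneCountInString(s[:i])
}

func main() {
	s := "Hello, 世界"
	fmt.Println(strings.Index(s, "界")) // 10 (bytes)
	fmt.Println(runeIndex(s, "界"))     // 8 (runes)
}
```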


Concatenation

String concatenation is the process of combining two or more strings to create a single string. In Go, there are multiple ways to concatenate strings:

  1. Using the + operator:
package main

import "fmt"

func main() {
  str1 := "Hello"
  str2 := " World"
  result := str1 + str2
  fmt.Println(result) // Output: Hello World
}
  2. Using the fmt.Sprintf function:
package main

import "fmt"

func main() {
  str1 := "Hello"
  str2 := " World"
  result := fmt.Sprintf("%s%s", str1, str2)
  fmt.Println(result) // Output: Hello World
}
  3. Using the strings.Join function:
package main

import (
  "fmt"
  "strings"
)

func main() {
  strArr := []string{"Hello", " World"}
  result := strings.Join(strArr, "")
  fmt.Println(result) // Output: Hello World
}

All three examples produce the same output. However, because strings are immutable, each + creates a new string and copies both operands into it, so building a long string with + inside a loop does a lot of repeated copying. When concatenating many strings, the strings.Builder type (or bytes.Buffer) is usually the more efficient choice.
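A minimal sketch of the strings.Builder approach follows. The helper concatAll is hypothetical; it appends each piece to a single growing buffer instead of producing a new string on every step:

```go
package main

import (
	"fmt"
	"strings"
)

// concatAll joins parts using one strings.Builder.
// Hypothetical helper written for this example.
func concatAll(parts []string) string {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	var b strings.Builder
	b.Grow(n) // optional: reserve the final size up front
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	fmt.Println(concatAll([]string{"Hello", ", ", "世界", "!"})) // Hello, 世界!
}
```

For a slice that already exists, strings.Join(parts, "") works just as well; a Builder is most useful when the pieces are produced one at a time, for example inside a loop.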


Comparison

Comparing elements of a set is a fundamental problem in logic. For strings, consider the universe of all characters and all strings: what relation holds between two characters, and between two strings? Understanding the fundamentals of comparison will help you later to understand conditionals and sorting algorithms.

Rune comparison

In Go, a rune is a numeric representation of a Unicode code point. Comparing two runes in Go is simple, as we can directly use the == and != operators to compare them as we would do with integers.

For example:

package main

import "fmt"

func main() {
    r1 := 'a'
    r2 := 'b'
    if r1 == r2 {
        fmt.Println("Runes are equal")
    } else {
        fmt.Println("Runes are not equal")
    }
}

In this example, we are comparing two runes r1 and r2. If r1 and r2 are equal, the program will print "Runes are equal". If they are not equal, the program will print "Runes are not equal".
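Because runes are integers, the ordering operators <, <=, > and >= work on them too, comparing code point values. This is the basis for sorting characters. A minimal sketch, where sortRunes is a hypothetical helper:

```go
package main

import (
	"fmt"
	"sort"
)

// sortRunes returns the characters of s in ascending code point order.
// Hypothetical helper: it relies on runes comparing like integers.
func sortRunes(s string) string {
	rs := []rune(s)
	sort.Slice(rs, func(i, j int) bool { return rs[i] < rs[j] })
	return string(rs)
}

func main() {
	fmt.Println('a' < 'b')           // true: 97 < 98
	fmt.Println('Z' < 'a')           // true: 90 < 97, uppercase ASCII sorts first
	fmt.Println(sortRunes("gopher")) // eghopr
}
```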

It's important to note that a rune literal uses single quotes ('a'), while double quotes ("a") create a string. Go never compares values of different types implicitly, so comparing a rune directly with a string is a compile-time error, not a comparison that silently gives a wrong result.

For example:

package main

import "fmt"

func main() {
    r := 'a'
    s := "a"
    // if r == s { ... } // does not compile: mismatched types rune and string
    if string(r) == s { // convert the rune to a one-character string first
        fmt.Println("Rune and string are equal")
    } else {
        fmt.Println("Rune and string are not equal")
    }
}

In this example, the direct comparison r == s is commented out because the compiler rejects it. Converting the rune with string(r) produces the one-character string "a", so the comparison compiles and the program prints "Rune and string are equal".
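When the string may hold more than one character, another option is to decode its first rune and compare runes directly. A sketch using utf8.DecodeRuneInString; the helper startsWithRune is hypothetical:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// startsWithRune reports whether the first character of s is r.
// Hypothetical helper: it decodes one rune instead of converting r to a string.
func startsWithRune(s string, r rune) bool {
	first, size := utf8.DecodeRuneInString(s)
	return size > 0 && first == r // size is 0 only when s is empty
}

func main() {
	fmt.Println(startsWithRune("世界", '世')) // true
	fmt.Println(startsWithRune("", 'a'))    // false
}
```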

String comparison

In Go, we can compare two strings using the == and != operators. These operators compare the contents of the strings rather than their memory addresses.

Here's an example:

package main

import "fmt"

func main() {
    str1 := "Hello, World!"
    str2 := "Hello, Gophers!"
    if str1 == str2 {
        fmt.Println("Strings are equal")
    } else {
        fmt.Println("Strings are not equal")
    }
}

In this example, we are comparing two strings str1 and str2 using the == operator. If str1 and str2 contain the same characters in the same order, the program will print "Strings are equal". Otherwise, it will print "Strings are not equal".
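Besides == and !=, strings also support the ordering operators <, <=, > and >=, which compare them byte by byte (lexicographically). For valid UTF-8 text this matches ordering by code point, so uppercase ASCII letters sort before lowercase ones. A sketch, with a hypothetical helper sortedCopy:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sortedCopy returns a sorted copy of xs, leaving xs unchanged.
// Hypothetical helper written for this example.
func sortedCopy(xs []string) []string {
	out := append([]string(nil), xs...)
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println("apple" < "banana")        // true
	fmt.Println("Banana" < "apple")        // true: 'B' (66) < 'a' (97)
	fmt.Println(strings.Compare("a", "b")) // -1
	fmt.Println(sortedCopy([]string{"banana", "apple", "Cherry"})) // [Cherry apple banana]
}
```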

It's important to note that string comparison in Go is case-sensitive, so two strings that differ only in case are considered different. To compare strings case-insensitively, you can convert both to a common case (for example with strings.ToLower) before comparing them, or use strings.EqualFold, which compares them under Unicode case folding.

Here's an example:

package main

import (
    "fmt"
    "strings"
)

func main() {
    str1 := "Hello, World!"
    str2 := "hELLo, wORLD!"
    if strings.EqualFold(str1, str2) {
        fmt.Println("Strings are equal")
    } else {
        fmt.Println("Strings are not equal")
    }
}

In this example, we use the EqualFold function from the strings package to compare str1 and str2. EqualFold performs a case-insensitive comparison based on Unicode case folding. If str1 and str2 contain the same characters in the same order, ignoring case, the program prints "Strings are equal". Otherwise, it prints "Strings are not equal".


Coercion

In Go, we can convert a string to a number using the strconv package, and we can convert a number to a string using the fmt package.

Here's an example of converting a string to a number using the strconv package:

package main

import (
    "fmt"
    "strconv"
)

func main() {
    str := "123"
    num, err := strconv.Atoi(str)
    if err != nil {
        fmt.Println("Invalid number")
    } else {
        fmt.Println("The number is", num)
    }
}

In this example, we are using the Atoi function from the strconv package to convert a string str to an integer num. The Atoi function returns two values: the converted number and an error. If the conversion is successful, the converted number is stored in the num variable, and the error is set to nil. Otherwise, the error value will be set, and we can handle it accordingly.

Here's an example of converting a number to a string using the fmt package:

package main

import "fmt"

func main() {
    num := 123
    str := fmt.Sprintf("%d", num)
    fmt.Println("The string is", str)
}

In this example, we are using the Sprintf function from the fmt package to convert a number num to a string str. The %d verb in the format string specifies that we want to format an integer value. The Sprintf function returns a formatted string that we can assign to a variable or use directly.
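The strconv package offers an alternative to fmt.Sprintf for integers. A minimal sketch (formats is a hypothetical helper) that also shows a common pitfall: converting an integer with string(...) treats it as a code point rather than producing decimal digits:

```go
package main

import (
	"fmt"
	"strconv"
)

// formats renders n in decimal, hexadecimal, and binary.
// Hypothetical helper built on strconv.FormatInt.
func formats(n int64) []string {
	return []string{
		strconv.FormatInt(n, 10),
		strconv.FormatInt(n, 16),
		strconv.FormatInt(n, 2),
	}
}

func main() {
	num := 123
	fmt.Println(strconv.Itoa(num)) // 123
	fmt.Println(formats(255))      // [255 ff 11111111]
	// A plain conversion does NOT produce decimal text: the integer is
	// treated as a code point, so this prints "{" (U+007B).
	fmt.Println(string(rune(num)))
}
```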

It's important to note that string-to-number conversion can fail: if the value does not fit in the target type, Atoi and the strconv.Parse functions return a range error, and if the string contains characters that do not form a valid number, they return a syntax error. Always check the returned error when parsing numbers. Converting a number to a string with fmt.Sprintf or strconv.Itoa, by contrast, cannot fail.


Interpolation

In Go, we can perform string interpolation using the fmt.Sprintf function. String interpolation is a method of creating a new string by inserting one or more variables into a string template.

Here's an example of string interpolation using fmt.Sprintf:

package main

import "fmt"

func main() {
    name := "Alice"
    age := 30
    fmtMsg := fmt.Sprintf("%s is %d years old.", name, age)
    fmt.Println(fmtMsg)
}

In this example, we are using the fmt.Sprintf function to create a new string called fmtMsg. The first argument to fmt.Sprintf is a string template that contains placeholders like %s and %d for the name and age variables respectively. The second and subsequent arguments are the values that will be inserted into the placeholders.

After calling fmt.Sprintf, the resulting string is stored in fmtMsg, which we can then print out to the console using fmt.Println.

It's important to note that string interpolation using fmt.Sprintf is not the same as concatenating strings using the + operator. String interpolation provides a more flexible and readable way to create a new string that includes variable substitutions.
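fmt.Sprintf supports many verbs beyond %s and %d. A minimal sketch with a hypothetical helper describeRune, using verbs that are handy for Unicode text: %c (the character), %U (Unicode notation) and %q (a quoted, escaped literal):

```go
package main

import "fmt"

// describeRune formats one rune in several ways using fmt verbs.
// Hypothetical helper written for this example.
func describeRune(r rune) string {
	return fmt.Sprintf("%c %U %d %q", r, r, r, r)
}

func main() {
	fmt.Println(describeRune('世'))   // 世 U+4E16 19990 '世'
	fmt.Printf("%q\n", "tab\there") // "tab\there"
}
```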

Raw string literals (in backticks) are sometimes mentioned in this context, but they are not a form of interpolation: they allow multi-line strings and backslashes without escaping, and they perform no variable substitution. They can, however, serve as the template string passed to fmt.Sprintf.
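A minimal sketch of a raw string literal used as a multi-line template; the helper report is hypothetical. The backquoted text supplies the layout, and fmt.Sprintf performs the substitution:

```go
package main

import "fmt"

// report fills a multi-line raw-string template with fmt.Sprintf.
// Hypothetical helper; the raw literal is only the template.
func report(name string, age int) string {
	return fmt.Sprintf(`name: %s
age:  %d`, name, age)
}

func main() {
	fmt.Println(report("Alice", 30))
	fmt.Println(len(`\n`), len("\n")) // 2 1: raw literals process no escapes
}
```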


Conclusion

I have asked ChatGPT several relevant questions about string representation in Go. I hope you find this article useful. If you find errors, comment below. Thanks for reading.


Learn and prosper. 🍀🖖🏼