module ActiveSupport::Multibyte::Unicode

Constants

UNICODE_VERSION

The Unicode version that is supported by the implementation.
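Because compose and decompose delegate to Ruby's String#unicode_normalize, the Unicode data they use comes from the Ruby interpreter. The sketch below reads the Unicode version of that data in plain Ruby, without loading Rails, through RbConfig's UNICODE_VERSION entry (available since Ruby 2.4). Whether the Rails constant is defined from exactly this value is an assumption here; check the Rails source for your version.

```ruby
require "rbconfig"

# Assumption: this plain-Ruby lookup stands in for the Rails constant.
# It reports the Unicode version of the tables behind String#unicode_normalize.
ruby_unicode_version = RbConfig::CONFIG["UNICODE_VERSION"]
puts ruby_unicode_version
```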

Public Instance Methods

compose(codepoints)

Composes decomposed characters into the composed form (Unicode Normalization Form C, NFC).

# File lib/active_support/multibyte/unicode.rb, line 33
def compose(codepoints)
  codepoints.pack("U*").unicode_normalize(:nfc).codepoints
end
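The packing round trip above can be tried without Rails. This is a minimal standalone sketch; compose_codepoints is a hypothetical name that mirrors the method body using only core Ruby. It composes "e" followed by a combining acute accent into the single precomposed codepoint for "é".

```ruby
# Hypothetical standalone mirror of compose: pack codepoints into a UTF-8
# string, normalize to NFC, and unpack the result back into codepoints.
def compose_codepoints(codepoints)
  codepoints.pack("U*").unicode_normalize(:nfc).codepoints
end

# "e" (U+0065) followed by COMBINING ACUTE ACCENT (U+0301)
decomposed = [0x65, 0x301]
composed = compose_codepoints(decomposed)
p composed # => [233], i.e. U+00E9 "é"
```

Codepoints that have no composed form, such as plain ASCII letters, come back unchanged.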
decompose(type, codepoints)

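Decomposes composed characters into the decomposed form. When type is :compatibility, compatibility decomposition (NFKD) is used; any other value selects canonical decomposition (NFD).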

# File lib/active_support/multibyte/unicode.rb, line 24
def decompose(type, codepoints)
  if type == :compatibility
    codepoints.pack("U*").unicode_normalize(:nfkd).codepoints
  else
    codepoints.pack("U*").unicode_normalize(:nfd).codepoints
  end
end
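The difference between the two branches shows up with characters that only have a compatibility mapping. In this standalone sketch, decompose_codepoints is a hypothetical name mirroring the method in core Ruby: the "ﬁ" ligature survives canonical decomposition but is split into "f" and "i" by compatibility decomposition, while "é" is split under either form.

```ruby
# Hypothetical standalone mirror of decompose: :compatibility selects NFKD,
# any other type selects canonical decomposition (NFD).
def decompose_codepoints(type, codepoints)
  form = type == :compatibility ? :nfkd : :nfd
  codepoints.pack("U*").unicode_normalize(form).codepoints
end

e_acute = [0xE9]        # "é" as a single precomposed codepoint
fi_ligature = [0xFB01]  # LATIN SMALL LIGATURE FI

canonical_e = decompose_codepoints(:canonical, e_acute)        # => [101, 769]
canonical_fi = decompose_codepoints(:canonical, fi_ligature)   # => [64257]
compat_fi = decompose_codepoints(:compatibility, fi_ligature)  # => [102, 105]
```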
default_normalization_form()

Deprecated. Calling it only emits a deprecation warning; the method will be removed in Rails 7.0.

# File lib/active_support/multibyte/unicode.rb, line 11
def default_normalization_form
  ActiveSupport::Deprecation.warn(
    "ActiveSupport::Multibyte::Unicode.default_normalization_form is deprecated and will be removed in Rails 7.0."
  )
end
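default_normalization_form=(_)

Deprecated. The argument is ignored and calling it only emits a deprecation warning; the method will be removed in Rails 7.0.
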
# File lib/active_support/multibyte/unicode.rb, line 17
def default_normalization_form=(_)
  ActiveSupport::Deprecation.warn(
    "ActiveSupport::Multibyte::Unicode.default_normalization_form= is deprecated and will be removed in Rails 7.0."
  )
end
tidy_bytes(string, force = false)

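Replaces all ISO-8859-1 or CP1252 characters with their UTF-8 equivalents, resulting in a valid UTF-8 string. Empty and ASCII-only strings are returned unchanged. Without force, only byte sequences that are invalid UTF-8 are recoded, so characters that are already valid UTF-8 are left intact.

Passing true as force will forcibly tidy all bytes, assuming that the string's encoding is entirely CP1252 or ISO-8859-1.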

# File lib/active_support/multibyte/unicode.rb, line 44
def tidy_bytes(string, force = false)
  return string if string.empty? || string.ascii_only?
  return recode_windows1252_chars(string) if force
  string.scrub { |bad| recode_windows1252_chars(bad) }
end
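The two modes behave very differently on mixed input. The sketch below reproduces the method and its private helper in core Ruby; tidy and recode_cp1252 are hypothetical names, not Rails API. A UTF-8 string with stray CP1252 bytes is repaired in the default mode, whereas force reinterprets every byte, so already-valid UTF-8 such as "é" (bytes C3 A9) turns into mojibake.

```ruby
# Hypothetical mirror of the private helper: read the bytes as Windows-1252
# and transcode them to UTF-8, replacing anything undecodable.
def recode_cp1252(string)
  string.encode(Encoding::UTF_8, Encoding::Windows_1252, invalid: :replace, undef: :replace)
end

# Hypothetical mirror of tidy_bytes.
def tidy(string, force = false)
  return string if string.empty? || string.ascii_only?
  return recode_cp1252(string) if force
  # Only invalid UTF-8 sequences are yielded to the block and recoded.
  string.scrub { |bad| recode_cp1252(bad) }
end

# UTF-8 text containing stray CP1252 bytes: 0xE9 ("é"), 0x93/0x94 (curly quotes)
mixed = String.new("caf\xE9 \x93quoted\x94", encoding: Encoding::UTF_8)
tidied = tidy(mixed)                              # "café “quoted”"

# Forcing on valid UTF-8 double-encodes it: C3 A9 becomes "Ã©"
forced = tidy(String.new("caf\u00E9"), true)      # "cafÃ©"
```

This is why force is only appropriate when the whole string is known to be CP1252 or ISO-8859-1.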

Private Instance Methods

recode_windows1252_chars(string)
# File lib/active_support/multibyte/unicode.rb, line 77
def recode_windows1252_chars(string)
  string.encode(Encoding::UTF_8, Encoding::Windows_1252, invalid: :replace, undef: :replace)
end
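Decoding as Windows-1252 covers both encodings named for tidy_bytes because the two agree on every byte except 0x80 to 0x9F, where Windows-1252 defines mostly printable characters and ISO-8859-1 defines C1 control codes. The following core-Ruby sketch (variable names are illustrative only) compares the two decodings.

```ruby
# 0x80 differs: Windows-1252 maps it to the euro sign, ISO-8859-1 to a control code.
euro_byte = "\x80".b
as_cp1252 = euro_byte.encode(Encoding::UTF_8, Encoding::Windows_1252)  # "€" (U+20AC)
as_latin1 = euro_byte.encode(Encoding::UTF_8, Encoding::ISO_8859_1)    # "\u0080"

# 0xE9 is the same in both: LATIN SMALL LETTER E WITH ACUTE.
e_acute_byte = "\xE9".b
same = e_acute_byte.encode(Encoding::UTF_8, Encoding::Windows_1252) ==
       e_acute_byte.encode(Encoding::UTF_8, Encoding::ISO_8859_1)     # true
```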