Cadu Ribeiro

Cadu Ribeiro

Coffee addicted Software Engineer

Accepting Emojis on a Rails app with MySQL

- 3 mins

In the past weeks at work, we faced a problem in our application. A user tried to express himself with an emoji in the description field. We were not expecting it then you can imagine what happened. 💥 in production 😱.

emojis_everywhere

We received an alert in the team’s chat.

emojis_everywhere

Going through the error details, we can see that MySQL raised an error.

invalid_statement

And looking at the request, we can see the emoji in the body attribute:

body

Why did this happen? Is not Emoji a unicode character supported by the UTF8? Yes, it is. But some of them uses 4-bytes to store their data, and if we look at the UTF8 charset support at MySQL’s oficial doc, we can see that it can only accept 3-bytes.

It’s a different approach from Postgres’ UTF8. In Postgres charset table we see that UTF8 can store up to 4-bytes, meaning that Postgres already accepts Emojis by default.

How to store Emojis on MySQL database and avoid the Incorrect String value error?

Let’s check the MySQL’s doc again. We can see that in charsets support there is one item called utf8mb4. It’s a UTF8 with 4-bytes support. We should use it instead the default 3-bytes only.

I created a simple scaffold application for demonstration. Let’s use the model Comment with 2 properties (body and name).

If we try to save the comment with an emoji on the body, it will raise an exception.

form exception

Let’s generate a migration to convert this table and the columns to utf8mb4

$ bin/rails g migration change_comments_to_utf8mb4

and add the following content to the migration:

class ChangeCommentsToUtf8mb4 < ActiveRecord::Migration
  def up
    execute "ALTER TABLE comments CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin"
    execute "ALTER TABLE comments MODIFY name VARCHAR(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin" execute "ALTER TABLE comments MODIFY body TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin"
  end
  def down
    execute "ALTER TABLE comments CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin"
    execute "ALTER TABLE comments MODIFY name VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin"
    execute "ALTER TABLE comments MODIFY body TEXT CHARACTER SET utf8 COLLATE utf8_bin"
  end
end

See that we set the column name with VARCHAR(191). This is because MySQL’s max key length for index on InnoDB engine is 767 bytes. With utf8 (3-bytes), we can store VARCHAR with a maximum of 255 chars (255 chars * 3 bytes = 765 bytes), but with utf8mb4 we can store the maximum of 191 chars (191 chars * 4 bytes = 764 bytes). If you want to store more bytes on the index, please look at InnoDB large prefix.

$ bin/rake db:migrate

We need to change the database.yml to set the encoding to utf8mb4. So, open the config/database.yml and change the line with

encoding: utf8

to

encoding: utf8mb4

Restart the server and now we can save emoji in our comment 😎.

success

If you are creating a new project, I highly recommend to start with utf8mb4 to avoid these issues in the future and eliminate the necessity of a migration for all tables or simply use Postgres instead ♥️

Cheers 🍻

comments powered by Disqus
rss facebook twitter github gitlab youtube mail spotify lastfm instagram linkedin google google-plus pinterest medium vimeo stackoverflow reddit quora quora