Dwarves
Memo
Type ESC to close search bar

Database Designs For Multilingual Apps

This story comes from the necessity of multi-language support across our applications. Dwarves Foundation handles a lot of international clients and there is always some level of concern for supporting multiple languages for certain apps. This is a concern not just from our clients, but also from us, which motivated our research for multilingual support.

Introduction

In today’s globalized world, many applications need to support multiple languages to reach a broader audience. Managing translations in a database is a critical aspect of building a multilingual application. However, designing a multilingual database can be challenging and involves several factors, such as character encoding, language-specific data storage requirements, and translation tables.

Data modeling multilingual apps

In this article, we’ll explore three common solutions for designing translation tables:

Solution 1: Column-Based Approach

The column-based approach is the simplest solution for managing translations in a database. For each column in a table, there is a corresponding column for translations in other languages. For example, if a column is in English, there will be another column that stores its translations in different languages such as Spanish, French, and so on.

Here’s an example of how a table using this approach might look:

Retrieving Translation

To query data you would need to use a COALESCE function to retrieve the translation in the desired language, with a fallback to the default column if the translation is not available.

For example, to retrieve the French translation from the above table:

// we wanted to get French translation
SELECT
	COALESCE(title_fr, title) AS title,
	COALESCE(description_fr, description) AS title
FROM
	"example_table"

Pros and cons

Pros

Cons

The column-based approach may be useful for smaller projects with a limited number of languages to support. Still, it may not be the best approach for larger projects with more languages and complex data.

Solution 2: Column JSON-based approach

In the column JSON-based approach, a single column is used to store all translations for the other columns of the table by language. This approach reduces the number of columns needed compared to the column-based approach. The value of the column is a JSON object that contains translation data for each language.

For example, if you have a table with columns for “title” and “description,” you can use a single column named translations to store the translations in JSON format. The JSON object will have a key for each language, and each key will contain the translated column values for that language:

Retrieving Translation

To retrieve data, you need to use specific functions to extract data from the **translations** column.

// in this example we assume you use Postgres
// mysql, sqlserver might have slighly syntax diffrent

// return translation in translation column 
// we wanted to return all translations
SELECT
	id,
	title,
	translations
FROM
	"table";

// we only want to return a specific language
SELECT
	id,
	title,
	translations -> "vi" AS translation
FROM
	"table";

// omit the translation field
SELECT
	id,
	translations -> 'vi' -> 'title' AS title,
	translations -> 'vi' -> 'description' AS description
FROM
	"table";

Pros and cons

Pros

Cons

The column JSON-based approach is particularly useful for applications that are expected to support multiple languages and require flexibility in managing the translations.

Solution 3: Translation table approach

The translation table approach involves creating a separate table for storing translations of various text values in different languages. The translations are stored in key-value pairs, with the key representing the original text value, and the value representing the translated text in a specific language.

Here’s an example of how a translation table might look:

Retrieving Translation

To retrieve translations using the translation table approach, you would typically use SQL queries with JOIN statements to combine the relevant data from multiple tables.

SELECT
	title_trans.trans_value AS "title",
	description_trans.trans_value AS "description",
FROM
	"table_a"
	LEFT JOIN "translations" AS title_trans ON "table_a"."title" = description_trans.trans_key
		AND lang = 'vi'
	LEFT JOIN "translations" AS description_trans ON "table_a"."title" = description_trans.trans_key
		AND lang = 'vi'

Pros and Cons

Pros

Cons

The translation table approach is suitable for applications of any size that require support for multiple languages. It is especially useful for large applications where data duplication can become a problem with other approaches. However, it may not be the most efficient approach for small applications as it involves more complex queries and potentially slower performance due to the use of joins.

Conclusion

In conclusion, designing a multilingual database involves several moving parts that need to be taken into account, such as character encoding, language-specific data storage requirements, and translation tables. The selection of an approach depends on the specific requirements of the application and the expected data volume. Small to medium-sized applications may use a column-based or column JSON-based approach, while larger applications may benefit from a translation table-based approach.

In summary, the column-based approach is simple and easy to implement but not scalable, the column JSON-based approach reduces the number of columns needed, but can become difficult to manage, and the translation table approach is scalable and flexible but more complex to implement. By understanding the pros and cons of each approach, developers can make informed decisions about which solution best fits their needs.