The Challenge: Managing Translations for Multiple Languages
We have some applications that requires a different set of messages for each language, stored in TypeScript files as key-value pairs. Each key corresponds to a specific user message or label, and each value is the translated text. A simple example for English messages english.ts
might look like this:
export const en = {
'common.greeting': 'Hello',
'common.greeting-{name}': 'Hello, {name}!',
'common.farewell': 'Goodbye',
'common.ask-name': 'What is your name?'
};
A similar spanish.ts
file will look like this, before we have imported our translations:
export const es = {
'common.greeting': 'Hello', // TODO: Translate
'common.greeting-{name}': 'Hello, {name}!', // TODO: Translate
'common.farewell': 'Goodbye', // TODO: Translate
'common.ask-name': 'What is your name?' // TODO: Translate
};
Our client provides new translations in an Excel file, where each row includes:
- A key that matches a TypeScript entry
- The English text (for reference)
- The translation for the target language
While the structure is straightforward, the volume of translations can be overwhelming. Copying each line from Excel into the TypeScript file is impractical and risks introducing mistakes, especially if keys or translations are missing.
The Solution: Python Script for Automated Translation Updates
To tackle this, we developed a Python script that processes the Excel file and updates the TypeScript files automatically. The script reads the translations from the Excel sheet and replaces corresponding entries in our content file. Here’s a breakdown of how our approach works.
Step 1: Setting Up the Excel File
Consider an Excel file with a sheet named after the wanted language, e.x. ‘Spanish’.
The sheet is structured with three essential columns:
- Key - The identifier used in our TypeScript file (e.g.,
common.greeting
). - English Text - A reference text, which helps in identifying any discrepancies or placeholder mismatches.
- Translation Text - The translated text for the target language. Named based on the language it is translated in. (e.g.,
Spanish text
)
Here’s a sample Excel structure:
This Excel is usually populated by our client, where we send over the translation keys and their English counterpart. The automation for that is a topic for another time, but here we assume we have already done that, and gotten this Excel fully translated and ready for import into our application.
Step 2: Reading and Processing the Excel Data and Typescript file
Using pandas
, we read the Excel file and filter out rows with empty keys or translations. This ensures we only process valid data.
import pandas as pd
import re
def import_translations(excel_file, ts_file, language_sheet):
# Load the specified language sheet from the Excel
# file to a pandas dataframe.
df = pd.read_excel(excel_file, sheet_name=language_sheet)
# Ensure the required columns (A, B, and C) are present
if df.shape[1] < 3:
print("Error: The Excel file does not contain the required columns.")
return
# Extract keys, English text (for reference), and translations
translation_data = df.iloc[:, [0, 1, 2]].dropna(
subset=[df.columns[0]]
) # Drop rows with no key
# Read the TypeScript file
with open(ts_file, "r", encoding="utf-8") as file:
ts_content = file.read()
Step 3: Find and replace the translation data in the TypeScript file
With the parsed Excel data in hand, we can clean up the data in translation_data
for empty translations, or mis-translated variable placeholders, logging appropriate messages to the console. The script then reads the TypeScript ts_content
file line-by-line, searching for each key and replacing it with the corresponding translation. Finally writing the substituted data back to the ts_content
file. (e.g., spanish.ts
) and logging a success message.
def import_translations(excel_file, ts_file, language_sheet):
...
# Process each translation
for _, row in translation_data.iterrows():
key, english_text, translation = row.iloc[0], row.iloc[1], row.iloc[2]
# Skip if the translation is empty
if pd.isna(translation) or translation.strip() == "":
print(f"Skipping '{key}': translation is empty.")
continue
# Check for placeholder mismatch between English and translated text
english_placeholders = find_placeholders(english_text)
translation_placeholders = find_placeholders(translation)
if english_placeholders != translation_placeholders:
print(
f"Error: Placeholder mismatch for key '{key}'. Skipping substitution. "
f"Expected '{english_placeholders}', found '{translation_placeholders}'."
)
continue
# Construct the regex pattern to match the key in the TypeScript file
pattern = rf"('{key}':\s*')[^']*(')(,?\s*(// TODO: Translate)?)"
replacement = rf"\1{translation}\2\3"
# Replace all occurrences of the key in the TypeScript file content
ts_content, count = re.subn(pattern, replacement, ts_content)
# If the translation was successful, remove any "// TODO: Translate" comment
if count > 0:
ts_content = re.sub(
rf"('{key}':\s*'{translation}'(?:,?))\s*// TODO: Translate",
rf"\1",
ts_content,
)
else:
print(f"Error: No match found for key '{key}' in {ts_file}")
# Write the updated content back to the TypeScript file
with open(ts_file, "w", encoding="utf-8") as file:
file.write(ts_content)
print(f"Translations for '{language_sheet}' updated successfully in {ts_file}.")
The above code snippet also includes a cleanup step, removing TODO tags if the translation was successful.
Small regex aside
The specific regex patterns we use here can be explained like this:
# The full pattern we will break down:
pattern = rf"('{key}':\\s*')[^']*(',?\s*)"
The first group matches the translation key, as it will be encased in single quotes and the also the part of the colon and possible spaces afterwards until # we hit the starting single quote for the translations text.
translation_key_group = rf"('{key}:\s*')"
# In other words, from the full row
'common.greeting': 'Hello',
# we would capture
'common.greeting': '
The second group we get only the text portion. This will match all the text until the next single quote, very specifically the translation text itself.
translated_text_matcher = rf"[^']*"
# From the same row as before:
'common.greeting': 'Hello',
# we capture
Hello
The last group simply matches the closing quote, comma and spacing until end of the line.
closing_group = rf"(',?\s*)"
We then construct the replacement text with the new translation with this regular expression.
replacement = rf"\1{translation}\2"
# And now we substitute the line if key is found in TypeScript content
ts_content = re.sub(pattern, replacement, ts_content)
# This would result in the row
'common.greeting': 'Hello',
# be rendered as
'common.greeting': 'Hola',
# in the final spanish.ts file
Step 4: Logging and Handling Errors Gracefully
To ensure a smooth process, the script outputs clear messages for each step. Errors and warnings are color-coded using colorama
for easy readability:
- Errors (e.g., if a key isn't found in the TypeScript file) are shown in red.
- Warnings (e.g., placeholder mismatches) are shown in yellow.
- Skips for empty translations are shown in cyan.
- Success messages are green, letting us know the update completed without issues.
Here's a sample log outputs:
Implementing this is as easy as add Fore.COLOR_NAME
before the text in the different print statements in the code.
e.g.
import pandas as pd
import re
# Add the new import
import colorama
# Initialize colorama
init(autoreset=True)
...
def import_translations(excel_file, ts_file, language_sheet):
...
if df.shape[1] < 3:
print(FORE.RED + "Error: The Excel file does not contain the required columns.")
return
...
if pd.isna(translation) or translation.strip() == "":
print(Fore.CYAN + f"Skipping '{key}': translation is empty.")
continue
...
print(Fore.GREEN + f"Translations for '{language_sheet}' updated successfully in {ts_file}.")
This feedback loop allows us to track each translation and make necessary adjustments before deploying the updates to production.
Benefits and Impact
With this automated workflow, we’ve seen significant improvements in our translation process:
- Time Savings: What used to take hours of manual work can now be completed in seconds.
- Accuracy: By automating placeholder checks and handling missing translations, we reduce errors such as blindly copying some translations to a wrong key, or paste in an erroneous variable that got translated.
- Scalability: As our client’s application adds support for more languages, our workflow scales effortlessly, handling hundreds of translations with ease.
Example Use Case
Let’s say our client provided us with updated Spanish translations in the Excel sheet. By making a callable script import_translations.py
using the above methods we can run it with a single command:
python import_translations.py path/to/translations.xlsx Spanish path/to/translations.ts
The script reads the Excel data, checks each translation, and updates spanish.ts
accordingly. Any placeholders that didn’t translate correctly, missing translations, or unmatched keys are flagged, helping us refine the file quickly. This process has allowed our team at Ur Solutions to focus on higher-value tasks while ensuring that every translation is correctly implemented in the application.
Conclusion
Automating translations isn’t just for saving time, it reduces boring work and let us focus on the more exiting parts of software development. I get a small dopamine rush whenever I see an hour of work be done in one second. This automation has proven invaluable in our ongoing projects, allowing us to support multilingual applications with minimal overhead.
At Ur Solutions, we’re always looking for ways to streamline processes and improve accuracy. If you’re managing translations for your applications without using external tools, consider building a similar pipeline. With the right tools and some clever automation, you can say goodbye to tedious copy-pasting and focus on what truly matters: building great software.