Machine Translation for Identifiers in Python Programs

Ren-yuan Lyu, Che-Ning Liu

Audience level:
Novice
Category:
Education

Description

This is a demonstration of a two-year research project that aims to translate Python programs into Traditional Chinese, in order to help people who are interested in learning computer programming but are not proficient in English. The idea could also be useful in countries where English is not an official language in elementary or secondary schools.

Abstract

Introduction
====

In this project, we attempt to translate Python programs from English into other natural languages for educational purposes. Starting from the task definition and ending with a discussion of translation performance, we will present the process of this project step by step at the conference.

Task Definition
====

The task is illustrated in [fig.01], where the left column is the input, the middle column is the output, and the right column shows the result of running them, which is the same for both. The input is the source code of an original Python program written entirely in English. The desired output is an automatically translated program in which almost all identifiers are Chinese, so that it looks like a Chinese Python program.

[fig.01] Task Definition
![](https://dl.dropboxusercontent.com/u/33089565/_pyconjp2016/01.png)

Block Diagram of the System
====

The whole system can be divided into the following "blocks", which we will describe in more detail at the conference.

[fig.02] Block Diagram of the System
![](https://dl.dropboxusercontent.com/u/33089565/_pyconjp2016/02.png)

Procedure of Automatic Translation
====

[fig.201]
![](https://dl.dropboxusercontent.com/u/33089565/_pyconjp2016/201.png)

A Translation Result
====

Finally, the translation from an English program into a Chinese program is complete. Here is a good example.

[fig.207]
![](https://dl.dropboxusercontent.com/u/33089565/_pyconjp2016/207.png)

Performance analysis
====

We found that in an ordinary program, keywords account for about 14% of the total and identifiers for about 86%. Furthermore, 62% of the identifiers could be translated in this initial study.

[fig.401] The distribution of keywords and identifiers in a typical program
![](http://dl.dropboxusercontent.com/u/33089565/_pyconjp2016/05.png)

[fig.402] The percentage of identifiers successfully translated
![](http://dl.dropboxusercontent.com/u/33089565/_pyconjp2016/06.png)
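
The task and the coverage figures described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: the `TRANSLATIONS` dictionary, the `SAMPLE` program, and the helper names `translate_identifiers` and `run` are all hypothetical, and how the project itself counts keywords and identifiers is not stated in this abstract. The sketch uses the standard `tokenize` and `keyword` modules to separate keywords from identifiers, replaces each identifier found in the dictionary, and checks that the original and translated programs print the same result. Built-in names such as `print` are left untranslated here, because translating them would also require alias definitions.

```python
import contextlib
import io
import keyword
import tokenize

# Hypothetical English -> Traditional Chinese lexicon (illustration only).
TRANSLATIONS = {"add": "相加", "total": "總和"}

# Hypothetical input program, written entirely in English.
SAMPLE = """\
def add(a, b):
    total = a + b
    return total

print(add(2, 3))
"""


def translate_identifiers(source, table):
    """Return (translated_source, stats) for a Python source string."""
    stats = {"keywords": 0, "identifiers": 0, "translated": 0}
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string
        if tok.type == tokenize.NAME:
            if keyword.iskeyword(text):
                # Keywords such as `def` and `return` are counted but kept.
                stats["keywords"] += 1
            else:
                stats["identifiers"] += 1
                if text in table:
                    text = table[text]
                    stats["translated"] += 1
        result.append((tok.type, text))
    # With (type, string) pairs, untokenize regenerates valid source,
    # although the spacing differs from the original.
    return tokenize.untokenize(result), stats


def run(source):
    """Execute a program and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {})
    return buffer.getvalue()


translated, stats = translate_identifiers(SAMPLE, TRANSLATIONS)
names = stats["keywords"] + stats["identifiers"]
print(translated)
print(f"keywords: {stats['keywords'] / names:.0%} of name tokens")
print(f"identifiers translated: {stats['translated'] / stats['identifiers']:.0%}")
print("same output:", run(SAMPLE) == run(translated))
```

A real translator would also have to decide what to do with attribute names, imported modules, and keyword arguments, none of which this sketch handles.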