The Suginoki Treebank is a parsed corpus of JFL/JSL learner Japanese, based on written tasks by L2 learners collected at Akita International University (AIU) during Fall 2018. Highlights include:
The name Suginoki derives from the symbolic tree of Akita Prefecture where Akita International University is located.
The Suginoki Treebank is associated with a powerful user interface that enables search using virtually any aspect of the annotation. Results of specific searches can be downloaded in the form of annotated data.
The Suginoki Treebank follows the parse annotation methods of the Kainoki Treebank (Kainoki, 2022), amounting to a full morpho-syntactic analysis of the language data.
Annotation for the correction of learner errors has been given subjectively for reference purposes without any rigid standard. Nevertheless, the following annotations are used consistently to signal changes from original structures depending on the correction.
The Suginoki Treebank is made up of data from 26 texts written by short-term international exchange students enrolled in a Japanese language course during Fall 2018 at AIU.
Each text file contains two short essays and three definitions of concepts. The written texts were produced by following three prompts:
Read the instruction and write an essay (about 600 characters). Input must be made within the time limit of 60 minutes, but you can save the draft and have time to think of contents or search for relevant information. You can use the Internet but you cannot copy and paste the information itself.
In our daily life, we eat fast food and slow food (or homemade food that you enjoy at home slowly). Comparing them, write your opinion about ‘diet’ with about 600 characters by explaining pros and cons of each food.
Following the instruction, write an essay (about 800 characters) in Japanese. The time limit of this essay is 60 minutes. You cannot exit and save the draft. Once you have started, you must continue writing until the end.
Read the following information and write your opinion with about 800 characters in Japanese.
**************
Today, the Internet became available freely all over the world.
Some people say, “We no longer need newspapers or magazines because we can see the news on the Internet”.
In contrast, there are people who say, “We still need newspapers and magazines even from now on”.
What do you think? Please write your opinion.
Write your original definitions and conditions by following the instructions.
The essay prompts were chosen to assist comparison with the data of two existing corpora of learner Japanese, which collectively offer considerably more data than the current 26 texts of the Suginoki Treebank. Essay 1 follows the prompt used for the International Corpus of Japanese as a Second Language (I-JAS: https://chunagon.ninjal.ac.jp/static/ijas/about.html). Essay 2 follows the prompt used for the Database of Japanese Opinion Essays Written by College Students in Japan, Korea, and Taiwan (http://www.tufs.ac.jp/ts/personal/ijuin/terms.html).
Participants for the essay data collection were recruited through visits to five courses at different levels offered by AIU's Japanese Language Program during Fall 2018. Groups of five students from each of four courses (JPL300, JPL305, JPL307, JPL506) and a group of six students from one course (JPL402) agreed to participate in this project. The following face sheet shows background information for the participants.
Table 1: Face sheet for the essay data
ID | Learning period (How long have you studied Japanese?) | JLPT (Japanese Language Proficiency Test) | Native languages (What language is used at home?) | Countries (Where is your permanent address?) |
---|---|---|---|---|
n300_a | 1 year | none | Chinese | Taiwan |
n300_b | 2 years | none | Finnish | Finland |
n300_c | 2 years | none | German | Germany |
n300_d | 7 years | none | Spanish/English | USA |
n300_e | 2 years | none | Romanian | Romania |
n305_a | 2 years | none | English | UK |
n305_b | 2 years | none | Chinese | Taiwan |
n305_c | 2 years | none | Lithuanian | Lithuania |
n305_d | 4 years | none | English | USA |
n305_e | 2 years | none | German | Germany |
n307_a | 2 years | N2 | Chinese | Taiwan |
n307_b | 3 years | none | Russian | Russia |
n307_c | 3 years/7 years | N3 | Korean | Korea |
n307_d | 3 years/11 years | none | Finnish | Finland |
n307_e | 5 years | none | English | New Zealand |
n402_a | 2 years | none | English | UK |
n402_b | 4 years | N3 | Chinese | Taiwan |
n402_c | 3 years | N3 | German | Germany |
n402_d | 2 years | N3 | English | USA (12 years), China (5 years) |
n402_e | 4 years | N2 | Thai | Thailand |
n402_f | 3 years | none | Russian | Russia |
n506_a | 3 years | none | Slovak | Czech |
n506_b | 12 years | N1 | English | Singapore |
n506_c | 2 years | N2 | Korean | Korea |
n506_d | 23 years | none | Dutch | Netherlands |
n506_e | 3 years | N1 | Chinese | Taiwan |
During recruitment, the need for “native speakers of English” was overstated, but as the Native languages column of the face sheet shows, the participants actually speak a variety of languages. However, all participants were fluent speakers of English who had satisfied a requirement for admission to AIU, where the primary language of instruction is English.
The numbers in the participant IDs in the face sheet are course codes that reflect course levels. The following table shows the level and textbook for each course. More detailed information about Japanese language courses at AIU can be found on the webpage for the Japanese Language Program (https://web.aiu.ac.jp/en/academic/japanese-language-courses/).
Table 2: Japanese Language Courses at AIU
Courses | Levels | Textbooks |
---|---|---|
JPL300 | Intermediate-low | An Integrated Approach to Intermediate Japanese (Revised Edition)『中級の日本語[改訂版]』(L1 - L4) |
JPL305 | Intermediate-mid1 | An Integrated Approach to Intermediate Japanese (Revised Edition)『中級の日本語[改訂版]』(L5 - L8) |
JPL307 | Intermediate-mid2 | An Integrated Approach to Intermediate Japanese (Revised Edition)『中級の日本語[改訂版]』(L9 - L13) |
JPL402 | Higher Intermediate | Authentic Japanese: Progressing from Intermediate to Advanced (New Edition)『新中級から上級への日本語』(L1 - L5) |
JPL506 | Advanced | 『文藝春秋オピニオン2018年の論点100』 |
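When grouping texts by proficiency, it can help to recover the course and level from a participant ID. The following is a minimal sketch, not part of the treebank tooling: the function name `course_for` and the ID pattern are assumptions inferred from the IDs listed in Table 1, and the level labels are transcribed from Table 2.

```python
import re

# Course levels transcribed from Table 2.
COURSE_LEVELS = {
    "JPL300": "Intermediate-low",
    "JPL305": "Intermediate-mid1",
    "JPL307": "Intermediate-mid2",
    "JPL402": "Higher Intermediate",
    "JPL506": "Advanced",
}

# Assumed ID shape, inferred from Table 1:
# "n" + three-digit course number + "_" + one lowercase letter.
ID_PATTERN = re.compile(r"n(\d{3})_([a-z])")


def course_for(participant_id):
    """Return (course code, level) for a face-sheet ID such as 'n402_a'."""
    match = ID_PATTERN.fullmatch(participant_id)
    if match is None:
        raise ValueError(f"unexpected participant ID: {participant_id!r}")
    course = f"JPL{match.group(1)}"
    if course not in COURSE_LEVELS:
        raise ValueError(f"unknown course code: {course}")
    return course, COURSE_LEVELS[course]


if __name__ == "__main__":
    for pid in ("n300_a", "n402_f", "n506_e"):
        print(pid, *course_for(pid))
```

The lookup rejects IDs that do not match the assumed pattern rather than guessing, so a change in the ID scheme would surface as an error instead of a silent misclassification.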
Presentations of research results using the Suginoki Treebank should include a citation taking the general form of the example below (with appropriate modifications depending on the date of access):
Horiuchi, Hitoshi and Alastair Butler (2022) “The Suginoki Treebank – a parsed corpus of JFL/JSL learner Japanese” https://jltrees.github.io (accessed 9 January 2022).
This work is licensed under a Creative Commons Attribution 4.0 International License.