Presented at the Teaching and Language Corpora (TaLC) Conference in Lancaster on July 23, 2014. Based on collaborative work with the FLAX Language Project (Shaoqun Wu and Ian Witten) and the Language Centre at Queen Mary University of London (Martin Barge, William Tweddle, Saima Sherazi).
Bridging Informal MOOCs & Formal English for Academic Purposes Programmes with Language Corpora
1. Bridging informal MOOCs & formal EAP
programmes with language corpora
Alannah Fitzgerald, Shaoqun Wu, Ian Witten, Martin Barge,
William Tweddle, Saima Sherazi
https://www.flickr.com/photos/library_of_congress/8725417555/
2. Today’s TaLC Session...
• Development of Tools and Language Corpora
– Design-Based Research with the FLAX Project
• Openness in Corpus-Based Tools, Resources &
Practices
• New & Old Contexts of Learning, Teaching &
Research with Corpus-Based Approaches
– Bridging Formal & Informal Higher Education with
Open Do-It-Yourself ESAP Language Collections
3. Who are we in this FLAX Research &
Development collaboration?
4. FLAX Language at Waikato University
http://flax.nzdl.org FLAX image by Jane Galloway, reused with permission for non-commercial purposes
5. FLAX Language Project at the
Greenstone Digital Library Lab,
Waikato University NZ
Professor Ian Witten
FLAX Project Lead
Dr Shaoqun Wu
FLAX Project Lead Researcher & Developer
6. Data Mining with Weka MOOC
https://www.youtube.com/user/WekaMOOC/videos?sort=p&flow=grid&view=0
12. Openness in Mainstream MOOCs?
http://www.michaelbransonsmith.net/blog/2012/12/19/day-of-the-mooc-now-animated/
13. The End of the University
As We Know It
“The future looks like this: Access to college-level
education will be free for everyone; the residential
college campus will become largely obsolete; tens of
thousands of professors will lose their jobs; the
bachelor’s degree will become increasingly irrelevant;
and ten years from now Harvard will enroll ten million
students.” (Harden, 2013)
http://www.the-american-interest.com/article.cfm?piece=1352
14. The Education Apocalypse:
#opened13 Keynote
“Where in the stories we’re telling about the future of education are
we seeing salvation? Why would we locate that in technology and
not in humans, for example? Why would we locate that in markets
and not in communities? What happens when we embrace a
narrative about the end-times — about education crisis and
education apocalypse? Who’s poised to take advantage of this crisis
narrative? Why would we believe a gospel according to artificial
intelligence, or according to Harvard Business School [Christensen’s
Disruptive Innovation theory], or according to Techcrunch...?”
(Watters, 2013)
http://hackeducation.com/2013/11/07/the-education-apocalypse/
15. Current MOOC Language Issues
• Mainstream MOOCs (Coursera, edX, Udacity) are predominantly in
the English Language
– MOOC learners are not registered as language learners
• Impact on retention and course completion
• Crowdsourcing and funding for commercial translations of MOOCs
is currently limited
– Translating lectures alone does not help learners meet assessment
requirements in, e.g., English-medium MOOCs
• Receptive versus productive language needs
• Mainstream MOOCs do not (in most cases) license content openly
as Open Educational Resources (OER)
– Open licensing with Creative Commons is vital for developing
derivative resources to support language learning
– Building linguistic support into MOOC learning platforms? e.g. a
combination of translation and corpus-based tools?
• Online learning offers a compelling case for corpus-based approaches
17. Be Free to Do Whatever You Want!
• Open Resources for ESAP
Soup Dragons:
– Building & Sharing Open ESAP
Corpora to Promote DIY
Corpus-Based Approaches
– Developing Automated
Interactivity into ESAP
Corpora
– Developing ESAP Course Book
and Lesson Plan Derivatives
– Researching and Developing
ESAP Corpora & Derivatives
– Researching and Developing
Corpus Tools e.g. Interfaces
http://en.wikipedia.org/wiki/The_Soup_Dragons
19. Google-esque Interface Designs
Designed for the non-expert corpus user, namely:
learners, teachers, subject academics, instructional
designers and language resource developers.
23. FLAX Across Platforms
• FLAX Website flax.nzdl.org for hosting open online
language collections
• Building directly onto the Web with OER
• FLAX multilingual open-source software for
downloading onto your PC
• For offline use
• Building collections out of sight using All Rights Reserved
content
• FLAX for MOODLE plug-in
• FLAX for MOOC Platforms?
• FLAX in conjunction with translation technologies?
24. Training Videos for FLAX on YouTube
http://www.youtube.com/watch?v=fysDzYjbhh0
26. Collaboration with Subject Specialists
“In the emerging academic literacies approach involving cooperation
between subject specialists and writing teachers, the aim is to help the
students develop metacognitive awareness of the roles and functions
of writing in that discipline, to enable them to stand back from it and
observe how it functions, and then to help them gradually participate
in the genres, where genre is understood as a constellation of actions
rather than a list of formal features.” (Breeze, 2012)
27. Earth’s Virology Professor with
Coursera MOOCs
“Natural science might be characterized as a discipline of discovery,
identifying and describing entities that had not been previously
considered. As a result, natural science employs a large set of highly
technical words, like dextrinoid, electrophoresis, and phallotoxins.
Most of these words do not have commonplace synonyms, because
they refer to entities, characteristics, or concepts that are not normally
discussed in everyday conversation.” (Biber, 2006)
28. Virology Language Collection in FLAX
Type of media in the FLAX Virology Collection | Number of items in the FLAX Virology Collection
Podcast audio transcripts (This Week in Virology) | 130
YouTube video transcripts (2013 virology course at Columbia, also in Coursera) | 110
Academic blog posts (Virology Blog) | 540
Open Access research articles (relevant to virology course and divided into paper sections) | 40
32. Domain-specific Collocations
We focus on lexical collocations with noun-based
structures because they are the most salient and
important patterns in topic-specific text:
•verb + noun e.g. detect virus particles
•noun + noun e.g. tobacco mosaic virus
•adjective + noun e.g. negative strand virus
•noun + of + noun e.g. genome of the virus
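The four noun-based patterns above can be illustrated with a minimal Python sketch. It assumes the text has already been part-of-speech tagged; the tagged tokens, the single-letter tag codes and the function names below are hypothetical and chosen only for illustration, and this is not a description of how FLAX itself extracts collocations. A run of nouns is treated as one noun group, which is how examples such as "tobacco mosaic virus" fit the noun + noun pattern.

```python
import re

# Hypothetical hand-tagged tokens (word, coarse part-of-speech tag).
# A real pipeline would obtain these tags from a part-of-speech tagger.
TAGGED = [
    ("detect", "VERB"), ("virus", "NOUN"), ("particles", "NOUN"), (".", "PUNCT"),
    ("tobacco", "NOUN"), ("mosaic", "NOUN"), ("virus", "NOUN"), (".", "PUNCT"),
    ("negative", "ADJ"), ("strand", "NOUN"), ("virus", "NOUN"), (".", "PUNCT"),
    ("genome", "NOUN"), ("of", "ADP"), ("the", "DET"), ("virus", "NOUN"),
    (".", "PUNCT"),
]

# One letter per token, so each pattern can be written as a regular expression.
TAG_CODES = {"VERB": "V", "NOUN": "N", "ADJ": "A", "DET": "D"}

# "N+" treats a run of consecutive nouns as a single noun group.
PATTERNS = {
    "verb + noun": r"VN+",
    "noun + noun": r"NN+",
    "adjective + noun": r"AN+",
    "noun + of + noun": r"N+OD?N+",  # optional determiner after "of"
}


def encode(tagged):
    """Turn tagged tokens into a string with one code letter per token."""
    return "".join(
        "O" if word.lower() == "of" else TAG_CODES.get(tag, "X")
        for word, tag in tagged
    )


def find_collocations(tagged):
    """Return the phrases matching each noun-based pattern."""
    codes = encode(tagged)
    words = [word for word, _ in tagged]
    return {
        name: [" ".join(words[m.start():m.end()])
               for m in re.finditer(pattern, codes)]
        for name, pattern in PATTERNS.items()
    }


if __name__ == "__main__":
    for name, phrases in find_collocations(TAGGED).items():
        print(f"{name}: {phrases}")
```

Encoding the tags as a string keeps each pattern readable as a one-line regular expression. A fuller approach would also rank candidates by frequency or an association measure before presenting them to learners.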
33. Lexical Bundles
“Lexical bundles” are multi-word sequences with
distinctive syntactic patterns and discourse functions that
are commonly used in academic prose (Biber & Barbieri,
2007; Biber et al., 2003, 2004).
Typical patterns in the virology MOOC lectures include:
•noun phrase + of e.g. a DNA copy of
•prepositional phrase + of e.g. at the end of
•it + verb/adjective phrase e.g. it turns out that
•be + noun/adjective phrase e.g. is an example of
•verb phrase + that e.g. you can see that
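One common way to surface candidate lexical bundles is to count recurring word sequences across several texts and keep those that are both frequent and spread over more than one text. The sketch below assumes that approach. The three transcript snippets are invented for illustration, and the function name and cut-off values (`n=4`, `min_freq=2`, `min_texts=2`) are arbitrary assumptions rather than thresholds taken from the cited studies or from FLAX.

```python
from collections import Counter, defaultdict

# Invented snippets standing in for lecture transcripts.
TRANSCRIPTS = [
    "it turns out that the virus makes a dna copy of its genome",
    "at the end of the cycle it turns out that the cell survives",
    "you can see that a dna copy of the genome sits at the end of the strand",
]


def lexical_bundles(texts, n=4, min_freq=2, min_texts=2):
    """Return (bundle, frequency) pairs for recurring n-word sequences.

    A sequence is kept only if it occurs at least `min_freq` times overall
    and appears in at least `min_texts` different texts.
    """
    freq = Counter()
    spread = defaultdict(set)  # n-gram -> ids of the texts it occurs in
    for text_id, text in enumerate(texts):
        tokens = text.lower().split()
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            spread[gram].add(text_id)
    return sorted(
        (" ".join(gram), count)
        for gram, count in freq.items()
        if count >= min_freq and len(spread[gram]) >= min_texts
    )


if __name__ == "__main__":
    for bundle, count in lexical_bundles(TRANSCRIPTS):
        print(count, bundle)
```

On these snippets the candidates include "it turns out that", "at the end of" and "a dna copy of". Grouping the results into the structural types listed above (for example noun phrase + of) would still need part-of-speech information or manual classification.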
34. ESAP Law Collections in FLAX at QMUL
Type of media in the FLAX Law Collections | Number and source of items in the FLAX Law Collections
Podcast audio files & transcripts (OpenSpires) | 10-15 Lectures (Oxford Law Faculty & the Centre for Socio-Legal Studies)
MOOC lecture transcripts & videos (streamed via YouTube & Vimeo) | 4 MOOC Collections: Copyright Law (Harvard/edX), English Common Law (Uni. of London/Coursera), Age of Globalization (Texas at Austin/edX), Environmental Law & Politics (OpenYale)
Student PhD thesis writing & Pre-sessional for Law ESAP essays | 10-20 EThOS Theses at the British Library; 20+ Essays from QMUL Law Pre-sessional
British Law Report Corpus (BLaRC) (Marin, 2012) | 8-million word corpus derived from freely available content on the BAILII website
Open Access research articles (relevant to QMUL Law Pre- and In-Sessional language courses) | 40 Articles (DOAJ - Directory of Open Access Journals)
43. Key Data Sets Will Consist Of:
• Online survey data
– MOOC learners for evaluation of collections
– Language Teaching professionals on perceptions of OER
• Offline data for evaluation of collections and course
book derivatives of the collections for ESAP
– Survey and Think-Aloud Protocols to evaluate the FLAX
Language System
– Student texts from Law students (Queen Mary University
of London).
• Interview and focus-group data (f2f and online via
Skype)
– With stakeholders (language teachers, academics, MOOC
providers) involved in the development of the academic
language collections used in this research.
46. References
• Biber, D., Conrad, S., & Cortes, V. (2003). Lexical bundles in speech and
writing: An initial taxonomy. In A. Wilson, P. Rayson, & T. McEnery (Eds.),
Corpus Linguistics by the Lune: A Festschrift for Geoffrey Leech (pp. 71–92).
Frankfurt/Main: Peter Lang.
• Biber, D., Conrad, S., & Cortes, V. (2004). If you look at . . .: Lexical bundles
in university teaching and textbooks. Applied Linguistics, 25, 371–405.
• Biber, D. (2006). University Language: A Corpus-Based Study of Spoken and
Written Registers. Amsterdam: John Benjamins.
• Biber, D., & Barbieri, F. (2007). Lexical bundles in university spoken and
written registers. English for Specific Purposes, 26, 263–286.
• Breeze, R. (2012). Rethinking Academic Writing Pedagogy for the
European University. Amsterdam: Rodopi.
• Harden, N. (2013). The end of the university as we know it. The American
Interest. Retrieved from http://www.the-american-interest.com/article.cfm?piece=1352
• Milne, D., & Witten, I. H. (2013). An open-source toolkit for mining
Wikipedia. Artificial Intelligence, 194, 222–239.
• Watters, A. (2013). The Education Apocalypse #opened13. Retrieved from
http://www.hackeducation.com/2013/11/07/the-education-apocalypse/
47. Thank You
FLAX Language Project http://flax.nzdl.org/
Shaoqun Wu: shaoqun@waikato.ac.nz / Ian Witten: ihw@cs.waikato.ac.nz
OER Research Hub http://oerresearchhub.org/
Alannah Fitzgerald: a_fitzg@education.concordia.ca; @AlannahFitz;
www.alannahfitzgerald.org TOETOE Blog; Slideshare:
http://www.slideshare.net/AlannahOpenEd/
The Language Centre – Queen Mary University of London http://language-centre.sllf.qmul.ac.uk/
Martin Barge m.i.barge@qmul.ac.uk
William Tweddle w.tweddle@qmul.ac.uk
Saima Sherazi s.n.sherazi@qmul.ac.uk
Editor's Notes
Open resources let you do whatever you want with them: build corpora and corpus-based online activities with open-source software, as in the FLAX project; develop course book derivatives from the open resources; and research the effectiveness of the corpus to inform future iterations of collection building and interface design.
Fewer than half of all Open Access (OA) journals are published under Creative Commons licenses, so in this respect Open Educational Resources and Open Source Software have more in common with each other than either has with OA. There are, however, OA journals we can use, most of which are published under the most flexible Creative Commons licenses (e.g. CC-BY), with only a few under the most restrictive (e.g. CC-ND). The number of OA journals varies by field. There are not many OA journals for Law, but there are many openly licensed government papers in the field. We will look at adding samples of these in future.
Being able to show demo corpora online, like the one we are building in FLAX, enables us to explain to e.g. the British Library how we intend to use thesis writing for non-commercial (NC) educational and research development purposes in ESAP.