Talk Proposal Submission
If you are interested in attending this talk at PyCon JP 2017, please use the social media share buttons below. We will consider the popularity of the proposals when making our selection.
talk
Why you should do text analysis with Python (even if you don't want to)(en)
Speakers
Bhargav Srinivasa Desikan
Audience level:
Novice
Category:
Big Data
Description
The current explosion in Artificial Intelligence and Machine Learning is unprecedented, and text analysis is likely its most accessible and understandable part. With Python, it is crazy easy to do: Python has been used as a parsing language for a very long time, and with its rich set of text analysis tools, it works more than just well.
Objectives
Attendees will be able to use the rich Python NLP ecosystem to parse and understand their textual data. Python is an excellent scripting language, and attendees will learn the practical tricks used to clean and analyse textual data with it.
Abstract
The current explosion in Artificial Intelligence and Machine Learning is unprecedented, and text analysis is likely its most accessible and understandable part. With Python, it is crazy easy to do: Python has been used as a parsing language for a very long time, and with its rich set of Natural Language Processing and Computational Linguistics tools, it's worth doing text analysis even if you don't want to.
The purpose of this talk is to convince the Python community to do text analysis, and to explain both the hows and the whys. Python has traditionally been a very good parsing language, arguably replacing Perl for most text file handling tasks. Reading files, regular expressions, writing to files and crawling the web for textual data have all been standard ways to use Python. Now, with the Machine Learning and AI explosion, we also have a great set of tools in Python to understand all the textual data we can so easily play with.
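As a small illustration of how little code this kind of text handling takes, here is a minimal sketch that uses only the standard library. The function name `word_counts` and the sample sentence are made up for this example and are not taken from the talk.

```python
import re
from collections import Counter


def word_counts(text):
    """Return a Counter of lowercase words found in `text`."""
    # Lowercase first, then treat any run of letters or apostrophes as a word.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample text, used only for illustration.
sample = "Python parses text. Python counts words, and Python writes results."
counts = word_counts(sample)

# The single most frequent word and how often it occurs.
top_word, top_count = counts.most_common(1)[0]
print(top_word, top_count)
```

The same function works unchanged on the contents of a file read with `open(path).read()`.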
I will briefly talk about the merits, demerits and use cases of the most popular text processing libraries, in particular spaCy, NLTK and gensim. I will also talk about how to use general-purpose Machine Learning libraries such as scikit-learn, Keras and TensorFlow for text analysis.
Pre-processing is one of the most important steps of text analysis, and I will talk more about this. After all, garbage in, garbage out!
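To make "pre-processing" concrete, here is a minimal sketch of a typical cleaning step, written with the standard library only. The `preprocess` function, the tiny `STOP_WORDS` set and the example sentence are assumptions for illustration; libraries such as spaCy or NLTK provide far more complete tokenizers and stop-word lists.

```python
import re

# A tiny, hypothetical stop-word list; real projects use a much larger one.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}


def preprocess(text):
    """Lowercase, strip URLs and non-letters, split, and drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep only letters and whitespace
    tokens = text.split()
    return [t for t in tokens if t not in STOP_WORDS]


tokens = preprocess("The QUICK brown fox is in the garden: https://example.com!")
print(tokens)
```

Skipping a step like this leaves casing variants, punctuation and URLs in the data as if they were distinct words, which is exactly the "garbage in" that later analysis cannot undo.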
The final part of the talk will be about where to get your data, and how to create your own textual data as well. You could analyse anything, from your own emails and WhatsApp conversations to freely available British Parliament transcripts!