Friday 2:05 p.m.–2:35 p.m.
Room 201 #pyconjp_201
Why you should do text analysis with Python (even if you don't want to)
Bhargav Srinivasa Desikan
- Audience level:
- Novice
- Category:
- Big Data
- Video:
- https://youtu.be/lVLN1IIYmu8
Description
The current explosion in Artificial Intelligence and Machine Learning is unprecedented - and text analysis is likely the most accessible and understandable part of it. And with Python, it is crazy easy to do - Python has been used as a parsing language forever, and with its rich set of text analysis tools, it works more than just well.
Abstract
The current explosion in Artificial Intelligence and Machine Learning is unprecedented - and text analysis is likely the most accessible and understandable part of it. And with Python, it is crazy easy to do - Python has been used as a parsing language forever, and with its rich set of Natural Language Processing and Computational Linguistics tools, it's worth doing text analysis even if you don't want to.
The purpose of this talk is to convince the Python community to do text analysis - and to explain both the hows and the whys. Python has traditionally been a very good parsing language, arguably replacing Perl for all text file handling tasks. Reading files, regular expressions, writing to files and crawling the web for textual data have all been standard ways to use Python - and now, with the Machine Learning and AI explosion, we have a great set of tools in Python to understand all the textual data we can so easily play with.
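As a rough illustration of the everyday text handling mentioned above (reading a file, applying a regular expression, writing the result back out), here is a minimal sketch using only the standard library. The file names, the sample sentence and the `extract_years` helper are hypothetical and chosen purely for illustration; they are not taken from the talk.

```python
import re
import tempfile
from pathlib import Path

# Matches four-digit years from 1900 to 2099 (an illustrative pattern).
YEAR_PATTERN = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_years(in_path, out_path):
    """Read a text file, find all years in it, and write them one per line."""
    text = Path(in_path).read_text(encoding="utf-8")
    years = YEAR_PATTERN.findall(text)
    Path(out_path).write_text("\n".join(years), encoding="utf-8")
    return years


# Work in a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "notes.txt"   # hypothetical input file
    dst = Path(tmp) / "years.txt"   # hypothetical output file
    src.write_text(
        "Python appeared in 1991; Python 3 followed in 2008.",
        encoding="utf-8",
    )
    found = extract_years(src, dst)
    written = dst.read_text(encoding="utf-8")
```

The same read, match, write pattern carries over to larger inputs such as logs or scraped pages; only the regular expression changes.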
I will briefly talk about the merits, demerits and use cases of the most popular text processing libraries. In particular, these will be spaCy, NLTK and gensim. I will also talk about how to use traditional Machine Learning libraries for text analysis, such as scikit-learn, Keras and TensorFlow.
Pre-processing is one of the most important steps of text analysis, and I will talk more about this - after all, garbage in, garbage out!
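To make "pre-processing" concrete, here is a minimal sketch of what such a step might look like with only the standard library: lowercasing, tokenizing and removing stop words. The tiny stop-word list and the `preprocess` helper are assumptions made for illustration; libraries such as spaCy or NLTK ship far more complete tokenizers and stop-word lists.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "it"}


def preprocess(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


sample = "Garbage in, garbage out: the quality of the input is what matters!"
tokens = preprocess(sample)

# Count how often each remaining token occurs.
counts = Counter(tokens)
```

Without the lowercasing step, "Garbage" and "garbage" would be counted as two different words, which is the kind of noise that pre-processing is meant to remove before any model sees the data.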
The final part of the talk will be about where to get your data - and how to create your own textual data as well. You could analyse anything, from your own emails and WhatsApp conversations to freely available British Parliament transcripts!