Friday 11:30 a.m.–noon
Room 201 #pyconjp_201
Using machine learning to try and predict taxi availability
Hari Allamraju
- Audience level: Intermediate
- Category: Big Data
Description
In this talk we will use taxi availability data from Singapore to learn how to predict taxi availability with machine learning, and discuss how such predictions might be used to help consumers and taxi companies.
Abstract
Taxis nowadays are equipped with devices which can provide their location very accurately. These can be used to get a snapshot of taxi availability at any point in time. The Singapore government provides an open API which returns such a snapshot in the form of the locations of available taxis across Singapore. This is very useful for seeing the current state of availability.
By querying the API at periodic intervals we can build a picture of how availability changes across Singapore. These changes reflect various factors, such as drivers moving around looking for riders, taxis getting hired, and taxis dropping off passengers. If we analyze these snapshots, taken over a few days, and apply machine learning to them, we can try to predict the taxi availability at any location for a given time of day.
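As a rough illustration of the idea, the sketch below builds a very simple baseline from periodic observations: the average taxi count per location and hour of day. It is not the method presented in the talk; the names `hourly_baseline` and `predict`, the string cell label, and the sample observations are all hypothetical and made up for this example.

```python
from collections import defaultdict
from datetime import datetime


def hourly_baseline(observations):
    """Average taxi count per (cell, hour of day).

    `observations` is an iterable of (timestamp, cell, count) tuples,
    where `cell` is any hashable location key (an assumption here,
    e.g. a grid index or a district name).
    """
    totals = defaultdict(lambda: [0, 0])  # (cell, hour) -> [sum, n]
    for ts, cell, count in observations:
        entry = totals[(cell, ts.hour)]
        entry[0] += count
        entry[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


def predict(baseline, cell, when, default=0.0):
    """Look up the historical average for this cell and hour of day."""
    return baseline.get((cell, when.hour), default)


# Made-up observations, as if counted from snapshots over two days.
observations = [
    (datetime(2024, 1, 8, 8, 0), "A", 10),
    (datetime(2024, 1, 8, 8, 30), "A", 14),
    (datetime(2024, 1, 9, 8, 15), "A", 12),
    (datetime(2024, 1, 8, 18, 0), "A", 3),
]
baseline = hourly_baseline(observations)
morning = predict(baseline, "A", datetime(2024, 1, 10, 8, 45))
```

A baseline like this gives a reference point: a learned model is only worth the extra effort if it predicts better than the plain hourly average.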
The information we learn from such an analysis can be combined with other data sets, such as weather, rider demand, and news events, to understand or predict how people will move across the city. This can be very useful for consumers, taxi companies, and even the government.
Such systems probably already exist at the major taxi and ride-sharing companies, so this talk will focus on the following aspects to help the audience learn more about them:
Data collection
Processing data to a format we can use
Identifying the parameters that we can learn/analyze from the data
Provide a few examples of the analysis
Present the results
A brief discussion on how the data can be used in conjunction with other data sets
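The processing step in the list above might look something like the following sketch, which turns one raw snapshot into taxi counts per grid cell. It assumes the snapshot is a GeoJSON-like dictionary whose features carry `Point` or `MultiPoint` geometries with `[longitude, latitude]` coordinates; the real API response should be checked against this assumption. The function names, the 0.01 degree cell size, and the sample coordinates are hypothetical.

```python
import math
from collections import Counter


def extract_points(snapshot):
    """Return a list of (lon, lat) pairs from a GeoJSON-like dict.

    The payload shape is an assumption, not taken from the API docs.
    """
    points = []
    for feature in snapshot.get("features", []):
        geometry = feature.get("geometry") or {}
        kind = geometry.get("type")
        if kind == "MultiPoint":
            points.extend((p[0], p[1]) for p in geometry["coordinates"])
        elif kind == "Point":
            p = geometry["coordinates"]
            points.append((p[0], p[1]))
    return points


def count_per_cell(points, cell_deg=0.01):
    """Count taxis per square grid cell of `cell_deg` degrees."""
    counts = Counter()
    for lon, lat in points:
        cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        counts[cell] += 1
    return counts


# Made-up snapshot with three taxi positions.
snapshot = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {
                "type": "MultiPoint",
                "coordinates": [
                    [103.851, 1.2905],
                    [103.8575, 1.2931],
                    [103.9001, 1.3102],
                ],
            },
        }
    ],
}
points = extract_points(snapshot)
counts = count_per_cell(points)
```

Binning to a grid is only one possible choice; named districts or clusters of points would work as location keys too.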
After the talk, the audience can use the slides and information as a reference if they want to perform such an analysis on their own or use it for learning. We can also learn from the comments of any audience members who have worked on such systems and who may want to add some information during the Q&A session.