Big Data Analytics

These blog posts cover the assignments and our final project for the Big Data Analytics class (CSE545) at Stony Brook University. All of the code is written in PySpark and lives in Jupyter notebooks.

Similar Water Regions 05 Dec 2017

Find the coordinates of areas around the world that show similar surface water trends, using the Surface Water Satellite dataset. Dataset

Satellite Image Analysis 25 Nov 2017

Find similar regions in satellite images by implementing singular value decomposition and locality-sensitive hashing from scratch. Dataset
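
To give a feel for the hashing half of that post, here is a minimal sketch of locality-sensitive hashing with random hyperplanes, in plain Python rather than PySpark. The hash family, the function names, and the toy feature vectors are all assumptions made for illustration; the post itself may use a different scheme and works on real image features.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    bits = []
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vector, plane))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)

def bucket_by_signature(vectors, hyperplanes):
    """Group item names by signature; items sharing a bucket are candidates."""
    buckets = {}
    for name, vec in vectors.items():
        buckets.setdefault(signature(vec, hyperplanes), []).append(name)
    return buckets

# Hypothetical feature vectors standing in for reduced image features.
planes = make_hyperplanes(num_bits=8, dim=4)
features = {
    "region_a": [1.0, 2.0, 3.0, 4.0],
    "region_b": [2.0, 4.0, 6.0, 8.0],      # same direction as region_a
    "region_c": [-1.0, -2.0, -3.0, -4.0],  # opposite direction
}
buckets = bucket_by_signature(features, planes)
```

Vectors that point in the same direction always share a signature, so `region_a` and `region_b` land in one bucket, while `region_c` ends up elsewhere. Only items in the same bucket need a full similarity comparison.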

Blog Corpus Industry Mention 15 Oct 2017

A Spark implementation that counts how many times each industry is mentioned in the Blog Corpus dataset, broken down by month and year. Dataset
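
The counting pattern behind that post can be sketched in plain Python as a map step followed by a reduce step. This is only an illustration under assumptions: the industry list, the `YYYY-MM` date format, and the sample posts are made up, and the notebook itself runs on Spark. In PySpark the same two steps would typically correspond to `flatMap` and `reduceByKey` on an RDD.

```python
import re
from collections import Counter

# Hypothetical industry list; the real one comes from the dataset.
INDUSTRIES = {"technology", "education", "banking"}

def map_post(year_month, text, industries):
    """Emit ((industry, year_month), 1) for every industry word in a post."""
    words = re.findall(r"[a-z]+", text.lower())
    return [((word, year_month), 1) for word in words if word in industries]

def reduce_counts(pairs):
    """Sum the counts for each (industry, year_month) key."""
    totals = Counter()
    for key, n in pairs:
        totals[key] += n
    return dict(totals)

# Made-up posts as (year_month, text) tuples.
posts = [
    ("2004-05", "I work in banking. Banking is dull."),
    ("2004-05", "Education matters."),
    ("2004-06", "Banking again."),
]
pairs = [pair for ym, text in posts for pair in map_post(ym, text, INDUSTRIES)]
counts = reduce_counts(pairs)
```

Here `counts[("banking", "2004-05")]` is 2, because both mentions in the first post are counted for the same month.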