Creating a Single-Node Hadoop/Spark VM for Development and Learning
Before we go deeper into Data Science, let's pause for a moment and deal with some basic infrastructure: building a basic local Hadoop environment for development or learning purposes. First and foremost, you should have Linux installed, up and running, and you should know your way around it. Make sure, at the very least, that you know how to use the terminal, understand the basic file structure, can pack/unpack things in tar/bz2/gz formats, can ssh into a machine, can start a VM (in case you're not using your local machine to install things), and so on. To make a long story short: you need a basic level of Linux knowledge. If you've never used Linux before, I strongly suggest you stop whatever you're doing right now and go learn at least the basics. Dump your Windows, get a Linux distro, and start using it for your daily tasks. Yes, it's necessary. Maybe not so much right now, but it will be. ...
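As a quick self-check, the terminal skills mentioned above (packing/unpacking with tar, copying files to a remote machine over ssh) boil down to a handful of commands. Here is a minimal sketch; the hostname `my-vm` and user `hadoop` are placeholders for your own setup:

```shell
# Pack a directory into a gzipped tarball (c = create, z = gzip, f = filename)
mkdir -p demo && echo "hello" > demo/file.txt
tar -czf demo.tar.gz demo

# Unpack it into another directory (x = extract, -C = target dir)
mkdir -p unpacked
tar -xzf demo.tar.gz -C unpacked
cat unpacked/demo/file.txt

# Copy the tarball to a remote machine and log in there
# (commented out: "my-vm" is a placeholder for your VM's hostname)
# scp demo.tar.gz hadoop@my-vm:/tmp/
# ssh hadoop@my-vm
```

If commands like these feel unfamiliar, that's a sign to spend some time with a Linux tutorial before continuing.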