DataFrame manipulation
A DataFrame can be manipulated in two ways: by writing SQL queries against it (Spark SQL), or by calling manipulation functions on the DataFrame object itself.

Data Manipulation using Functions

This method suits those who are more used to coding than to writing queries. The idea is simple: given a DataFrame, apply transformation functions to it until you get what you need.

As an example, suppose you have structured data that corresponds to a table of divisions of an imaginary IT company:

    # Row is provided by pyspark.sql
    from pyspark.sql import Row

    divisions = [
        Row(id='1', name='Data Science'),
        Row(id='2', name='Big Data'),
        Row(id='3', name='Artificial Inteligence'),
        Row(id='4', name='Devops'),
    ]

Let's also have a list of employees for that company:

    # Employee is a Row template with the fields "name" and "email"
    Employee = Row("name", "email")
    employees = [
        Employee("John Robbins", '...