Final_Project_1 3

Description

All the protein intake data are now in grams/capita/day, which makes it somewhat hard to compare countries with different populations. A better basis for comparison is the percentage of total protein intake that each food category contributes. Although Hive includes built-in math operations that would support this calculation, I decided to use MapReduce for the next data set operation. MapReduce can be run in local mode during the development phase, which shortens the runtime and makes debugging much easier, as opposed to Hive’s slow processing time. MapReduce also lets me store the resulting dataset directly in HDFS.

In the mapper class, I loaded the “area” column of the dataset as the key, and the remaining protein intake values are loaded as LongWritables (the following code snippet shows a quick preview):
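The original snippet is not reproduced in this preview. As a rough stand-in, the parsing step that such a mapper performs might look like the plain-Java sketch below. It assumes a comma-separated row with the area in the first column, no quoted commas, and whole-number values (as the use of LongWritable suggests). The names `AreaRowParser`, `ParsedRow`, and `parse` are hypothetical, and the Hadoop wrapper types are left out so the sketch runs on its own.

```java
import java.util.Arrays;

// Hypothetical helper, not the author's mapper: shows only the row-splitting logic.
class AreaRowParser {

    // Holder for one parsed row: the area name (the key) and its intake values.
    static final class ParsedRow {
        final String area;
        final long[] values;

        ParsedRow(String area, long[] values) {
            this.area = area;
            this.values = values;
        }
    }

    // Splits "area,v1,v2,..." into a key and its numeric values.
    // Assumption: plain commas as separators and whole-number values.
    static ParsedRow parse(String line) {
        String[] fields = line.split(",");
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected an area and at least one value: " + line);
        }
        long[] values = new long[fields.length - 1];
        for (int i = 1; i < fields.length; i++) {
            values[i - 1] = Long.parseLong(fields[i].trim());
        }
        return new ParsedRow(fields[0].trim(), values);
    }

    public static void main(String[] args) {
        ParsedRow row = parse("Exampleland, 12, 30, 8"); // made-up sample row
        System.out.println(row.area + " -> " + Arrays.toString(row.values));
        // prints "Exampleland -> [12, 30, 8]"
    }
}
```

In an actual Hadoop job, the area and the values would be wrapped in Writable types inside `map()` before being handed to `context.write()`.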

The data is then written using the usual “context.write( )” method.

The reducer class contains the percentage calculation. At a high level, for each row, I first sum the protein intake values across all 23 food types, then divide each individual value by that sum and multiply by 100 to get its percentage of daily protein intake:
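The percentage step described above can be sketched in plain Java as follows. This is a minimal illustration and not the author's reducer: the names `ProteinShare` and `toPercentages` are hypothetical, the Hadoop `Reducer` wrapper is omitted, and the sample row has three values instead of 23. The sketch divides in floating point, since integer division on long values would truncate every share to a whole number.

```java
import java.util.Arrays;

// Hypothetical helper: converts one row of intake values into percentage shares.
class ProteinShare {

    // Returns, for each value, its share of the row total in percent.
    static double[] toPercentages(long[] values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        double[] shares = new double[values.length];
        if (sum == 0) {
            return shares; // avoid division by zero; every share stays 0.0
        }
        for (int i = 0; i < values.length; i++) {
            // 100.0 makes the whole expression a double, so nothing is truncated.
            shares[i] = 100.0 * values[i] / sum;
        }
        return shares;
    }

    public static void main(String[] args) {
        long[] row = {10, 30, 60}; // made-up intake values for one area
        System.out.println(Arrays.toString(toPercentages(row)));
        // prints "[10.0, 30.0, 60.0]"
    }
}
```

Because each share is computed against the row's own total, the shares of one area always add up to 100 (up to rounding), which is what makes rows from different countries comparable.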

Again, as usual, the result is written via “context.write( )”.

In the main driver, I initially set the MapReduce mode to “local” during the debugging stages.

Instead of manually sending the data file to HDFS via the command line, I decided to create the input and output directories, load the local data file (from the Hive implementation above), and store the resulting data files in HDFS.
