Spark is an absolute necessity for files of this size. It is built to handle large datasets; with only base Python and pandas, we would not only run into memory issues, but the analysis would also take a very long time. Typically, pandas starts to struggle once observations reach the 1 to 10 million range. For this analysis, we are processing 233 million observations and roughly 30 GB of uncompressed data.
The file size and observation count for each month are visible below.
Given the gamut of changes and intricacies within Spark, and PySpark specifically, a lot of research and troubleshooting was needed to complete this analysis. One of the major changes to Spark in recent years was the unification of its entry points and improved accessibility. To use Spark, we first initiate the 'SparkSession', which we then use to gain access to the other APIs, such as the SQLContext or SparkContext. In older versions, these were disparate parts. These APIs are visible below.
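As a minimal sketch of that unified entry point (the application name here is a placeholder, not the exact value used in this analysis):

```python
from pyspark.sql import SparkSession

# One SparkSession is now the single entry point into Spark.
spark = (
    SparkSession.builder
    .appName("ecommerce-events")  # placeholder name
    .getOrCreate()
)

# The previously separate entry points are reachable from the session:
sc = spark.sparkContext            # the underlying SparkContext
df = spark.sql("SELECT 1 AS x")    # SQL queries, formerly via a separate SQLContext
```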
Before starting the analysis, we need to do some setup programming. This requires importing dependencies like pandas, NumPy, seaborn, and PySpark. The color scheme and the initiation of the Spark session are visible below.
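A sketch of what that setup cell might look like; the palette choice and memory setting are illustrative assumptions rather than the exact values used:

```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from pyspark.sql import SparkSession

# Plot color scheme (palette and style are assumptions for illustration).
sns.set_palette("viridis")
sns.set_style("whitegrid")

# Initiate the Spark session; the driver memory value is a placeholder
# and should be tuned to the machine handling the 30 GB workload.
spark = (
    SparkSession.builder
    .appName("ecommerce-behavior-analysis")
    .config("spark.driver.memory", "8g")
    .getOrCreate()
)
```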
The data I imported for this analysis comes in four separate files. I decided to import each into its own dataframe so that I could review the information on a per-month basis. Each import is visible in this section.
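A sketch of those four imports, assuming one CSV file per month; the file paths and the header/schema-inference options are assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read each month into its own dataframe so it can be reviewed independently.
# File names and paths are assumptions for illustration.
df_oct = spark.read.csv("data/2019-Oct.csv", header=True, inferSchema=True)
df_nov = spark.read.csv("data/2019-Nov.csv", header=True, inferSchema=True)
df_dec = spark.read.csv("data/2019-Dec.csv", header=True, inferSchema=True)
df_jan = spark.read.csv("data/2020-Jan.csv", header=True, inferSchema=True)
```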
When looking at the header information for each data file, it is important to note that each row represents an event by a user. Each event is recorded with a timestamp and an event classification; the possibilities are view, cart, or purchase. For this analysis, I am mainly concerned with the purchase event type.
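Continuing from the imports above, a sketch of inspecting the header and isolating purchases; the column name 'event_type' is assumed from the file header:

```python
# Inspect the schema and the first few rows of one month.
df_oct.printSchema()
df_oct.show(5, truncate=False)

# Keep only purchase events, since those are the focus of the analysis.
purchases_oct = df_oct.filter(df_oct.event_type == "purchase")
```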
In this section I quickly review the total record count for each month. The output is ordered by month (Oct-Nov-Dec-Jan). The counts range from roughly 42 million to 67 million events, with October having the fewest and December the most.
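A sketch of that per-month count, using the dataframes defined above and keeping the same Oct-Nov-Dec-Jan order:

```python
# Total event counts per month, printed in the order discussed.
monthly_dfs = {"Oct": df_oct, "Nov": df_nov, "Dec": df_dec, "Jan": df_jan}
for month, df in monthly_dfs.items():
    print(f"{month}: {df.count():,} total events")
```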