Preliminaries

Assignment 2, Spring Semester 2020

X-Education is a company that offers online courses to professionals. One of its key business goals is getting potential users to enroll in the courses it offers. Imagine that the company offers you a position on this project and that you, as a marketing expert, accept. The information you have about the project consists of the database, the data dictionary, and a short summary sent by the company.

A bit more context…

When a user visits the website, they can browse the courses, watch videos, or fill in a form to be contacted. In the last case, the person provides a phone number or e-mail address and is then considered a “lead” (identified by the variable Lead Number). The next phase is customer acquisition: once a person is a lead, the sales team contacts them by phone or e-mail. This process also uses leads that were flagged in the past (the variable Lead Origin indicates where each lead comes from). As a result of this strategy, some leads become customers and others do not (variable Converted). Currently, about 30% of contacted leads become customers, and the company sees raising this figure as a good opportunity for improvement: by identifying the best leads, it can refine its sales strategy and focus on those likely customers instead of spending time contacting every lead.

This recitation session looks at where the leads come from.

Loading useful packages

rm(list=ls())               # Remove all objects from the R workspace
graphics.off()              # Close all open graphics devices
options(digits = 3)         # Print numbers with 3 significant digits
set.seed(12345)          # Fix the random seed for reproducibility
library(readr)            # Reads the .csv file
library(glmnet)           # Fits penalized generalized linear models
library(ggplot2)          # Builds the plots
library(corrplot)
library(dplyr)
library(fastDummies)
library(naniar)
library(RColorBrewer)
library(ggcorrplot)
library(caret) 
library(MASS)
library(ggpmisc)
library(MLmetrics)
library(Metrics)
library(mlogit)
library(reshape2)

Reading the dataset

BB <- read_csv("C:/Users/enaga/Documents/Marketing 2/Lead Scoring.csv") #load the dataset
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_character(),
##   `Lead Number` = col_double(),
##   Converted = col_double(),
##   TotalVisits = col_double(),
##   `Total Time Spent on Website` = col_double(),
##   `Page Views Per Visit` = col_double(),
##   `Asymmetrique Activity Score` = col_double(),
##   `Asymmetrique Profile Score` = col_double()
## )
## i Use `spec()` for the full column specifications.
head(BB,10) #show the first 10 rows

Cleaning the dataset

Before fitting any model, the dataset is cleaned and an exploratory data analysis (EDA) is carried out. The value “Select” is assumed to be equivalent to NA (it is the default option left in place when a question goes unanswered), so it is replaced accordingly.

#This section gives a general overview of the dataset, to understand what each variable means and which values it can take.

#View(BB)

#Lead Origin has nearly empty categories, so the rows belonging to those categories are removed.
BB<-BB[!(BB$`Lead Origin`=="Quick Add Form"),]
BB<-BB[!(BB$`Lead Origin`=="Lead Import"),]

summary(BB) #summary statistics and type of each column
##  Prospect ID         Lead Number     Lead Origin        Lead Source       
##  Length:9184        Min.   :579533   Length:9184        Length:9184       
##  Class :character   1st Qu.:596415   Class :character   Class :character  
##  Mode  :character   Median :615140   Mode  :character   Mode  :character  
##                     Mean   :616997                                        
##                     3rd Qu.:637086                                        
##                     Max.   :660737                                        
##                                                                           
##  Do Not Email       Do Not Call          Converted      TotalVisits   
##  Length:9184        Length:9184        Min.   :0.000   Min.   :  0.0  
##  Class :character   Class :character   1st Qu.:0.000   1st Qu.:  1.0  
##  Mode  :character   Mode  :character   Median :0.000   Median :  3.0  
##                                        Mean   :0.386   Mean   :  3.5  
##                                        3rd Qu.:1.000   3rd Qu.:  5.0  
##                                        Max.   :1.000   Max.   :251.0  
##                                                        NA's   :112    
##  Total Time Spent on Website Page Views Per Visit Last Activity     
##  Min.   :   0                Min.   : 0.0         Length:9184       
##  1st Qu.:  14                1st Qu.: 1.0         Class :character  
##  Median : 250                Median : 2.0         Mode  :character  
##  Mean   : 489                Mean   : 2.4                           
##  3rd Qu.: 938                3rd Qu.: 3.2                           
##  Max.   :2272                Max.   :55.0                           
##                              NA's   :112                            
##    Country          Specialization     How did you hear about X Education
##  Length:9184        Length:9184        Length:9184                       
##  Class :character   Class :character   Class :character                  
##  Mode  :character   Mode  :character   Mode  :character                  
##                                                                          
##                                                                          
##                                                                          
##                                                                          
##  What is your current occupation What matters most to you in choosing a course
##  Length:9184                     Length:9184                                  
##  Class :character                Class :character                             
##  Mode  :character                Mode  :character                             
##                                                                               
##                                                                               
##                                                                               
##                                                                               
##     Search            Magazine         Newspaper Article  X Education Forums
##  Length:9184        Length:9184        Length:9184        Length:9184       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   Newspaper         Digital Advertisement Through Recommendations
##  Length:9184        Length:9184           Length:9184            
##  Class :character   Class :character      Class :character       
##  Mode  :character   Mode  :character      Mode  :character       
##                                                                  
##                                                                  
##                                                                  
##                                                                  
##  Receive More Updates About Our Courses     Tags           Lead Quality      
##  Length:9184                            Length:9184        Length:9184       
##  Class :character                       Class :character   Class :character  
##  Mode  :character                       Mode  :character   Mode  :character  
##                                                                              
##                                                                              
##                                                                              
##                                                                              
##  Update me on Supply Chain Content Get updates on DM Content Lead Profile      
##  Length:9184                       Length:9184               Length:9184       
##  Class :character                  Class :character          Class :character  
##  Mode  :character                  Mode  :character          Mode  :character  
##                                                                                
##                                                                                
##                                                                                
##                                                                                
##      City           Asymmetrique Activity Index Asymmetrique Profile Index
##  Length:9184        Length:9184                 Length:9184               
##  Class :character   Class :character            Class :character          
##  Mode  :character   Mode  :character            Mode  :character          
##                                                                           
##                                                                           
##                                                                           
##                                                                           
##  Asymmetrique Activity Score Asymmetrique Profile Score
##  Min.   : 7                  Min.   :11                
##  1st Qu.:14                  1st Qu.:15                
##  Median :14                  Median :16                
##  Mean   :14                  Mean   :16                
##  3rd Qu.:15                  3rd Qu.:18                
##  Max.   :18                  Max.   :20                
##  NA's   :4218                NA's   :4218              
##  I agree to pay the amount through cheque
##  Length:9184                             
##  Class :character                        
##  Mode  :character                        
##                                          
##                                          
##                                          
##                                          
##  A free copy of Mastering The Interview Last Notable Activity
##  Length:9184                            Length:9184          
##  Class :character                       Class :character     
##  Mode  :character                       Mode  :character     
##                                                              
##                                                              
##                                                              
## 
BB[BB=='Select']<-NA #Turns 'Select' values into NA (it is assumed that people left those fields blank, so they are really NAs)

BB$Country[BB$Country!='India']<- NA #More than 50% of the records come from India, so the other countries are not considered individually

BB$Country[is.na(BB$Country)]<-"Otro" #Records that are NA at this point (missing or non-India) are labelled "Otro" (other)

BB$Specialization[is.na(BB$Specialization)]<-"Otro" #it is assumed that people who left this field empty did so because their specialization was not among the options, so a new category is created

BB$`What is your current occupation`[is.na(BB$`What is your current occupation`)]<-"Otro" #Same reasoning as in the previous case

#Merge the two spellings of Google
BB$`Lead Source` <- as.character(BB$`Lead Source`)
BB$`Lead Source`[BB$`Lead Source` == 'google'] <- 'Google'

Looking at the dataset, we see that many variables have missing values. To avoid distorting the inference, variables with more than 40% missing values are removed, although some variables above that threshold are kept because they are considered highly relevant for the task (Lead Quality, for example, stays in the dataset). The variables removed are Asymmetrique Profile Index, Asymmetrique Profile Score, How did you hear about X Education, Lead Profile, Asymmetrique Activity Index, and Asymmetrique Activity Score.

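apply(BB, 2, function(x) length(which(is.na(x)))/9240*100) # % of NA per column (the denominator 9240 is presumably the row count of the original file; BB has 9184 rows after the filtering above)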
##                                   Prospect ID 
##                                         0.000 
##                                   Lead Number 
##                                         0.000 
##                                   Lead Origin 
##                                         0.000 
##                                   Lead Source 
##                                         0.368 
##                                  Do Not Email 
##                                         0.000 
##                                   Do Not Call 
##                                         0.000 
##                                     Converted 
##                                         0.000 
##                                   TotalVisits 
##                                         1.212 
##                   Total Time Spent on Website 
##                                         0.000 
##                          Page Views Per Visit 
##                                         1.212 
##                                 Last Activity 
##                                         0.942 
##                                       Country 
##                                         0.000 
##                                Specialization 
##                                         0.000 
##            How did you hear about X Education 
##                                        77.868 
##               What is your current occupation 
##                                         0.000 
## What matters most to you in choosing a course 
##                                        29.221 
##                                        Search 
##                                         0.000 
##                                      Magazine 
##                                         0.000 
##                             Newspaper Article 
##                                         0.000 
##                            X Education Forums 
##                                         0.000 
##                                     Newspaper 
##                                         0.000 
##                         Digital Advertisement 
##                                         0.000 
##                       Through Recommendations 
##                                         0.000 
##        Receive More Updates About Our Courses 
##                                         0.000 
##                                          Tags 
##                                        36.115 
##                                  Lead Quality 
##                                        51.255 
##             Update me on Supply Chain Content 
##                                         0.000 
##                     Get updates on DM Content 
##                                         0.000 
##                                  Lead Profile 
##                                        73.701 
##                                          City 
##                                        39.697 
##                   Asymmetrique Activity Index 
##                                        45.649 
##                    Asymmetrique Profile Index 
##                                        45.649 
##                   Asymmetrique Activity Score 
##                                        45.649 
##                    Asymmetrique Profile Score 
##                                        45.649 
##      I agree to pay the amount through cheque 
##                                         0.000 
##        A free copy of Mastering The Interview 
##                                         0.000 
##                         Last Notable Activity 
##                                         0.000
#variables with many missing values and (intuitively) no predictive power are removed

BDLimpia = subset(BB, select = -c(`Asymmetrique Profile Index`,`Asymmetrique Profile Score`,`How did you hear about X Education`,`Lead Profile`,`Asymmetrique Activity Index`,`Asymmetrique Activity Score` ))
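
The threshold rule described above can also be applied programmatically. The following is only a minimal sketch on made-up data: the data frame `toy`, the 40% `threshold`, and the `keep_anyway` vector are illustrative assumptions, not objects from this assignment.

```r
# Hypothetical toy data: column b has 60% missing values, a and c have 20%
toy <- data.frame(
  a = c(1, NA, 3, 4, 5),
  b = c(NA, NA, NA, "x", "y"),
  c = c("u", "v", "w", "x", NA),
  stringsAsFactors = FALSE
)

# Percentage of NA per column, using the current number of rows as denominator
pct_na <- colMeans(is.na(toy)) * 100

threshold   <- 40            # assumed cutoff, in percent
keep_anyway <- character(0)  # names of columns to keep despite exceeding the cutoff

# Columns above the cutoff that are not explicitly protected
to_drop <- setdiff(names(pct_na)[pct_na > threshold], keep_anyway)

toy_clean <- toy[, !(names(toy) %in% to_drop), drop = FALSE]
print(round(pct_na, 1))
print(names(toy_clean))
```

Listing the protected variables in `keep_anyway` makes explicit which columns are kept by judgment rather than by the rule.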

To save time and avoid overwhelming you with every relationship explored in the dataset, the EDA is limited to the variables that are finally used in parts 1 and 2 of this recitation.

#Question 1 COUNTRY VS LEAD ORIGIN

ggplot(BDLimpia, aes(`Country`)) + 
     geom_bar(aes(fill=`Lead Origin`), position = "dodge") + 
     theme_classic()+labs(title="Country V/S Lead Origin")

#Question 1 SPECIALIZATION VS LEAD ORIGIN

tableespecialization<-table(BDLimpia$`Lead Origin`, BDLimpia$Specialization)
tableespecialization
##                          
##                           Banking, Investment And Insurance
##   API                                                    39
##   Landing Page Submission                               270
##   Lead Add Form                                          29
##                          
##                           Business Administration E-Business E-COMMERCE
##   API                                          38          2          7
##   Landing Page Submission                     339         53         99
##   Lead Add Form                                25          2          3
##                          
##                           Finance Management Healthcare Management
##   API                                     89                    21
##   Landing Page Submission                799                   119
##   Lead Add Form                           84                    18
##                          
##                           Hospitality Management Human Resource Management
##   API                                         14                       103
##   Landing Page Submission                     88                       656
##   Lead Add Form                               11                        86
##                          
##                           International Business IT Projects Management
##   API                                         17                     28
##   Landing Page Submission                    155                    327
##   Lead Add Form                                5                     11
##                          
##                           Marketing Management Media and Advertising
##   API                                      135                    23
##   Landing Page Submission                  615                   176
##   Lead Add Form                             88                     3
##                          
##                           Operations Management Otro Retail Management
##   API                                        74 2915                 8
##   Landing Page Submission                   398  125                90
##   Lead Add Form                              30  301                 2
##                          
##                           Rural and Agribusiness Services Excellence
##   API                                          9                   3
##   Landing Page Submission                     62                  36
##   Lead Add Form                                2                   1
##                          
##                           Supply Chain Management Travel and Tourism
##   API                                          37                 18
##   Landing Page Submission                     296                183
##   Lead Add Form                                15                  2

Choice model

Question 1

Propose a homogeneous model to determine where a lead comes from (variable Lead Origin). The EDA suggests some association between Lead Origin and Country and between Lead Origin and Specialization, so we now look at how these variables behave in the model.

#Solution

#To use the mlogit library, the one covered in class with the professor, the dataset must be in long format. It is currently in wide format, so it has to be reshaped around the variables of interest.

#Create the dataset used in this question

PreguntaUno<-dplyr::select(BDLimpia, Country, Specialization, `Lead Origin`, `Lead Number`) #dplyr:: is explicit because MASS, loaded after dplyr, masks select()

UnoModificable<-PreguntaUno #Working copy that will be modified

head(PreguntaUno,5)
#Transform the variables so they can be used in the long dataset: one logical indicator per alternative

API <- UnoModificable$`Lead Origin`=="API"

Landing_Page_Submission <- UnoModificable$`Lead Origin`=="Landing Page Submission"

Lead_Add_Form <- UnoModificable$`Lead Origin`=="Lead Add Form"

UnoModificable <- cbind(UnoModificable, API) #append the indicator columns
UnoModificable <- cbind(UnoModificable, Landing_Page_Submission)
UnoModificable <- cbind(UnoModificable, Lead_Add_Form)
UnoModificable$`Lead Origin` = NULL #drop Lead Origin

head(UnoModificable,5)
#Reshape the dataset to long format

BaseLong<-melt(UnoModificable,id.vars = c("Country", "Specialization","Lead Number"),
           variable.name = "Lead Origin",value.name = "Choice")

head(BaseLong,5) #first 5 rows
tail(BaseLong,5) #last 5 rows
#Note how the number of observations increases in BaseLong: there is one row per lead and alternative
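
To make the wide-to-long step concrete, here is a small sketch on made-up data. It uses base R's `reshape()` instead of `melt()` only to stay free of package dependencies; the data frame `wide` and its `lead` column are hypothetical and stand in for `UnoModificable` and `Lead Number`.

```r
# Hypothetical wide data: one row per lead, one logical column per alternative
wide <- data.frame(
  lead = c(101, 102),
  Country = c("India", "Otro"),
  API = c(TRUE, FALSE),
  Landing_Page_Submission = c(FALSE, TRUE),
  Lead_Add_Form = c(FALSE, FALSE),
  stringsAsFactors = FALSE
)

alternatives <- c("API", "Landing_Page_Submission", "Lead_Add_Form")

# Long format: one row per (lead, alternative) pair, with a single Choice column
long <- reshape(
  wide,
  direction = "long",
  varying   = alternatives,   # columns that are stacked
  v.names   = "Choice",       # name of the stacked value column
  timevar   = "Lead_Origin",  # column that records which alternative a row refers to
  times     = alternatives,
  idvar     = "lead"
)

print(long)
print(c(rows_wide = nrow(wide), rows_long = nrow(long)))
```

Each lead now appears once per alternative, and exactly one of its rows has `Choice` equal to `TRUE`, which is the structure a long-format choice dataset needs.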

With the dataset in long format, we can estimate a model with the mlogit function.

logit_data <- mlogit.data(data = BaseLong, shape = "long", choice = "Choice",id.var = "Lead Number")
#fit the model


modelo1<-mlogit(Choice ~ 0| Country + Specialization | 0, data = logit_data)

The alternatives are: 1 = API, 2 = Landing Page Submission, 3 = Lead Add Form. API is the reference alternative, so every coefficient is reported relative to it.

summary(modelo1)
## 
## Call:
## mlogit(formula = Choice ~ 0 | Country + Specialization | 0, data = logit_data, 
##     method = "nr")
## 
## Frequencies of alternatives:choice
##      1      2      3 
## 0.3898 0.5320 0.0782 
## 
## nr method
## 8 iterations, 0h:0m:3s 
## g'(-H)^-1g = 9.96E-05 
## successive function values within tolerance limits 
## 
## Coefficients :
##                                           Estimate Std. Error z-value Pr(>|z|)
## (Intercept):2                              2.11419    0.17593   12.02  < 2e-16
## (Intercept):3                             -2.83426    0.35149   -8.06  6.7e-16
## CountryOtro:2                             -1.82379    0.10216  -17.85  < 2e-16
## CountryOtro:3                              4.10313    0.21494   19.09  < 2e-16
## SpecializationBusiness Administration:2    0.33666    0.24867    1.35   0.1758
## SpecializationBusiness Administration:3   -0.44013    0.41824   -1.05   0.2926
## SpecializationE-Business:2                 1.32433    0.74785    1.77   0.0766
## SpecializationE-Business:3                 0.38504    1.16359    0.33   0.7407
## SpecializationE-COMMERCE:2                 0.71685    0.43462    1.65   0.0991
## SpecializationE-COMMERCE:3                -0.56268    0.82345   -0.68   0.4944
## SpecializationFinance Management:2         0.25235    0.20914    1.21   0.2276
## SpecializationFinance Management:3         0.27444    0.35528    0.77   0.4398
## SpecializationHealthcare Management:2     -0.15330    0.30069   -0.51   0.6102
## SpecializationHealthcare Management:3     -0.05270    0.49352   -0.11   0.9150
## SpecializationHospitality Management:2    -0.10615    0.34269   -0.31   0.7568
## SpecializationHospitality Management:3     0.10122    0.59076    0.17   0.8640
## SpecializationHuman Resource Management:2 -0.06404    0.20641   -0.31   0.7564
## SpecializationHuman Resource Management:3  0.02968    0.35128    0.08   0.9327
## SpecializationInternational Business:2     0.42242    0.31745    1.33   0.1833
## SpecializationInternational Business:3    -1.42298    0.61488   -2.31   0.0207
## SpecializationIT Projects Management:2     0.63200    0.26792    2.36   0.0183
## SpecializationIT Projects Management:3    -1.03323    0.48488   -2.13   0.0331
## SpecializationMarketing Management:2      -0.34904    0.20126   -1.73   0.0829
## SpecializationMarketing Management:3      -0.40527    0.34336   -1.18   0.2379
## SpecializationMedia and Advertising:2      0.08516    0.28669    0.30   0.7664
## SpecializationMedia and Advertising:3     -1.66728    0.72634   -2.30   0.0217
## SpecializationOperations Management:2     -0.11797    0.21981   -0.54   0.5915
## SpecializationOperations Management:3     -1.07039    0.38466   -2.78   0.0054
## SpecializationOtro:2                      -4.66347    0.19827  -23.52  < 2e-16
## SpecializationOtro:3                      -2.93368    0.31294   -9.37  < 2e-16
## SpecializationRetail Management:2          0.52466    0.41616    1.26   0.2074
## SpecializationRetail Management:3         -1.25557    0.89901   -1.40   0.1625
## SpecializationRural and Agribusiness:2    -0.00407    0.40572   -0.01   0.9920
## SpecializationRural and Agribusiness:3    -1.21193    0.91991   -1.32   0.1877
## SpecializationServices Excellence:2        0.55992    0.63722    0.88   0.3796
## SpecializationServices Excellence:3       -0.84738    1.30533   -0.65   0.5162
## SpecializationSupply Chain Management:2    0.36357    0.25314    1.44   0.1509
## SpecializationSupply Chain Management:3   -1.26004    0.44181   -2.85   0.0043
## SpecializationTravel and Tourism:2         0.41183    0.30775    1.34   0.1808
## SpecializationTravel and Tourism:3        -2.02099    0.83344   -2.42   0.0153
##                                              
## (Intercept):2                             ***
## (Intercept):3                             ***
## CountryOtro:2                             ***
## CountryOtro:3                             ***
## SpecializationBusiness Administration:2      
## SpecializationBusiness Administration:3      
## SpecializationE-Business:2                .  
## SpecializationE-Business:3                   
## SpecializationE-COMMERCE:2                .  
## SpecializationE-COMMERCE:3                   
## SpecializationFinance Management:2           
## SpecializationFinance Management:3           
## SpecializationHealthcare Management:2        
## SpecializationHealthcare Management:3        
## SpecializationHospitality Management:2       
## SpecializationHospitality Management:3       
## SpecializationHuman Resource Management:2    
## SpecializationHuman Resource Management:3    
## SpecializationInternational Business:2       
## SpecializationInternational Business:3    *  
## SpecializationIT Projects Management:2    *  
## SpecializationIT Projects Management:3    *  
## SpecializationMarketing Management:2      .  
## SpecializationMarketing Management:3         
## SpecializationMedia and Advertising:2        
## SpecializationMedia and Advertising:3     *  
## SpecializationOperations Management:2        
## SpecializationOperations Management:3     ** 
## SpecializationOtro:2                      ***
## SpecializationOtro:3                      ***
## SpecializationRetail Management:2            
## SpecializationRetail Management:3            
## SpecializationRural and Agribusiness:2       
## SpecializationRural and Agribusiness:3       
## SpecializationServices Excellence:2          
## SpecializationServices Excellence:3          
## SpecializationSupply Chain Management:2      
## SpecializationSupply Chain Management:3   ** 
## SpecializationTravel and Tourism:2           
## SpecializationTravel and Tourism:3        *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -3870
## McFadden R^2:  0.533 
## Likelihood ratio test : chisq = 8840 (p.value = <2e-16)

What do the estimates mean?
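
One way to approach this question is to turn the coefficients into odds ratios and predicted probabilities. The sketch below copies four estimates from the summary above by hand. It assumes the usual multinomial logit setup in which alternative 1 (API) has utility 0 and the omitted factor levels (Country = India, Specialization = Banking, Investment And Insurance) form the reference profile. The helper `softmax_probs` and the short labels `Landing` and `LeadAdd` are illustrative names only.

```r
# Estimates copied from the summary of modelo1 (alternative 1 = API is the base)
coefs <- c(
  "(Intercept):2" =  2.11419,
  "(Intercept):3" = -2.83426,
  "CountryOtro:2" = -1.82379,
  "CountryOtro:3" =  4.10313
)

# exp(coefficient) = multiplicative change in the odds of that alternative
# versus API when the dummy switches from 0 to 1
odds_ratios <- exp(coefs)

# Multinomial logit probabilities from a vector of utilities
softmax_probs <- function(v) exp(v) / sum(exp(v))

# Reference profile: India, Banking, Investment And Insurance
v_ref <- c(API = 0,
           Landing = coefs[["(Intercept):2"]],
           LeadAdd = coefs[["(Intercept):3"]])
p_ref <- softmax_probs(v_ref)

# Same specialization, but Country = Otro
v_otro <- v_ref + c(0, coefs[["CountryOtro:2"]], coefs[["CountryOtro:3"]])
p_otro <- softmax_probs(v_otro)

print(round(odds_ratios, 3))
print(round(rbind(India = p_ref, Otro = p_otro), 3))
```

Under these assumptions, a positive coefficient such as `CountryOtro:3` means that leads with Country = Otro have higher odds of coming from Lead Add Form rather than API, while the negative `CountryOtro:2` lowers the odds of Landing Page Submission relative to API.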

Question 2

Discuss the predictive ability of the previous model.
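
A common way to summarize predictive ability is a confusion matrix and its hit rate. The sketch below is not the procedure of this assignment: the probability matrix `probs` and the vector `observed` are made-up values standing in for predicted probabilities and true origins on a test set.

```r
# Hypothetical predicted probabilities for 4 leads (each row sums to 1)
probs <- matrix(
  c(0.7, 0.2, 0.1,
    0.1, 0.8, 0.1,
    0.3, 0.6, 0.1,
    0.2, 0.2, 0.6),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("API", "Landing Page Submission", "Lead Add Form"))
)

# Hypothetical observed origins for the same 4 leads
observed <- factor(
  c("API", "Landing Page Submission", "API", "Lead Add Form"),
  levels = colnames(probs)
)

# Predicted class = alternative with the highest probability in each row
predicted <- factor(
  colnames(probs)[max.col(probs, ties.method = "first")],
  levels = colnames(probs)
)

confusion <- table(observed = observed, predicted = predicted)
hit_rate  <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(hit_rate)
```

A hit rate is only informative against a benchmark. The model summary reports a share of about 0.53 for alternative 2, so always predicting Landing Page Submission would already be right roughly 53% of the time.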

#split into train (80%) and test (20%)

index <- sample(1:nrow(UnoModificable), size=round(0.8*nrow(UnoModificable),0))

train <- UnoModificable[index,]
test <- UnoModificable[-index,]

We repeat the reshaping process to put both datasets in the right format.

#Datasets in long format
# train
long.train <- melt(train, id.vars = c("Lead Number", "Specialization", "Country" ), variable.name = "Lead_Origin", value.name = "choice")

# test

long.test <- melt(test, id.vars = c("Lead Number", "Specialization", "Country"), variable.name = "Lead_Origin", value.name = "choice")


# mlogit data
# train
long.train.data <- mlogit.data(data=long.train, shape="long", choice="choice", id.var = "Lead Number")

# test

long.test.data <- mlogit.data(data=long.test, shape="long", choice="choice", id.var = "Lead Number")

To test the predictive ability, the model is first fitted on the training data.

modelotrain<-mlogit(choice ~ 0| Country + Specialization |0, data = long.train.data, model = TRUE)

summary(modelotrain)
## 
## Call:
## mlogit(formula = choice ~ 0 | Country + Specialization | 0, data = long.train.data, 
##     model = TRUE, method = "nr")
## 
## Frequencies of alternatives:choice
##      1      2      3 
## 0.3870 0.5349 0.0781 
## 
## nr method
## 18 iterations, 0h:0m:3s 
## g'(-H)^-1g = 3.69E-07 
## gradient close to zero 
## 
## Coefficients :
##                                            Estimate Std. Error z-value Pr(>|z|)
## (Intercept):2                                2.2492     0.2062   10.91  < 2e-16
## (Intercept):3                               -2.9441     0.4059   -7.25  4.1e-13
## CountryOtro:2                               -1.7921     0.1137  -15.77  < 2e-16
## CountryOtro:3                                4.1084     0.2401   17.11  < 2e-16
## SpecializationBusiness Administration:2      0.3009     0.2921    1.03    0.303
## SpecializationBusiness Administration:3     -0.3397     0.4865   -0.70    0.485
## SpecializationE-Business:2                   0.9289     0.7622    1.22    0.223
## SpecializationE-Business:3                 -16.1244  2714.1327   -0.01    0.995
## SpecializationE-COMMERCE:2                   0.4510     0.4760    0.95    0.343
## SpecializationE-COMMERCE:3                  -0.4654     0.9881   -0.47    0.638
## SpecializationFinance Management:2           0.2631     0.2460    1.07    0.285
## SpecializationFinance Management:3           0.5238     0.4096    1.28    0.201
## SpecializationHealthcare Management:2       -0.2393     0.3399   -0.70    0.481
## SpecializationHealthcare Management:3        0.0334     0.5500    0.06    0.952
## SpecializationHospitality Management:2      -0.3240     0.3780   -0.86    0.391
## SpecializationHospitality Management:3       0.4622     0.6650    0.70    0.487
## SpecializationHuman Resource Management:2   -0.2250     0.2382   -0.94    0.345
## SpecializationHuman Resource Management:3    0.1631     0.4024    0.41    0.685
## SpecializationInternational Business:2       0.3785     0.3563    1.06    0.288
## SpecializationInternational Business:3      -1.2279     0.6477   -1.90    0.058
## SpecializationIT Projects Management:2       0.4632     0.3068    1.51    0.131
## SpecializationIT Projects Management:3      -0.8156     0.5508   -1.48    0.139
## SpecializationMarketing Management:2        -0.4597     0.2333   -1.97    0.049
## SpecializationMarketing Management:3        -0.1818     0.3941   -0.46    0.645
## SpecializationMedia and Advertising:2       -0.1438     0.3158   -0.46    0.649
## SpecializationMedia and Advertising:3       -1.5860     0.7496   -2.12    0.034
## SpecializationOperations Management:2       -0.2392     0.2533   -0.94    0.345
## SpecializationOperations Management:3       -0.9900     0.4387   -2.26    0.024
## SpecializationOtro:2                        -4.7779     0.2295  -20.82  < 2e-16
## SpecializationOtro:3                        -2.8587     0.3618   -7.90  2.9e-15
## SpecializationRetail Management:2            0.1068     0.4375    0.24    0.807
## SpecializationRetail Management:3           -1.3046     0.9139   -1.43    0.153
## SpecializationRural and Agribusiness:2       0.0810     0.4859    0.17    0.868
## SpecializationRural and Agribusiness:3      -1.3552     1.2184   -1.11    0.266
## SpecializationServices Excellence:2          0.2890     0.6512    0.44    0.657
## SpecializationServices Excellence:3         -0.8069     1.3162   -0.61    0.540
## SpecializationSupply Chain Management:2      0.1985     0.2882    0.69    0.491
## SpecializationSupply Chain Management:3     -0.9855     0.4902   -2.01    0.044
## SpecializationTravel and Tourism:2           0.2823     0.3451    0.82    0.413
## SpecializationTravel and Tourism:3          -1.7994     0.8602   -2.09    0.036
##                                              
## (Intercept):2                             ***
## (Intercept):3                             ***
## CountryOtro:2                             ***
## CountryOtro:3                             ***
## SpecializationBusiness Administration:2      
## SpecializationBusiness Administration:3      
## SpecializationE-Business:2                   
## SpecializationE-Business:3                   
## SpecializationE-COMMERCE:2                   
## SpecializationE-COMMERCE:3                   
## SpecializationFinance Management:2           
## SpecializationFinance Management:3           
## SpecializationHealthcare Management:2        
## SpecializationHealthcare Management:3        
## SpecializationHospitality Management:2       
## SpecializationHospitality Management:3       
## SpecializationHuman Resource Management:2    
## SpecializationHuman Resource Management:3    
## SpecializationInternational Business:2       
## SpecializationInternational Business:3    .  
## SpecializationIT Projects Management:2       
## SpecializationIT Projects Management:3       
## SpecializationMarketing Management:2      *  
## SpecializationMarketing Management:3         
## SpecializationMedia and Advertising:2        
## SpecializationMedia and Advertising:3     *  
## SpecializationOperations Management:2        
## SpecializationOperations Management:3     *  
## SpecializationOtro:2                      ***
## SpecializationOtro:3                      ***
## SpecializationRetail Management:2            
## SpecializationRetail Management:3            
## SpecializationRural and Agribusiness:2       
## SpecializationRural and Agribusiness:3       
## SpecializationServices Excellence:2          
## SpecializationServices Excellence:3          
## SpecializationSupply Chain Management:2      
## SpecializationSupply Chain Management:3   *  
## SpecializationTravel and Tourism:2           
## SpecializationTravel and Tourism:3        *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -3080
## McFadden R^2:  0.536 
## Likelihood ratio test : chisq = 7090 (p.value = <2e-16)
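
As a sanity check on the fit statistics above, the following minimal sketch recovers McFadden's R^2 from the reported log-likelihood and likelihood-ratio statistic. It assumes that both statistics are computed against the same baseline (null) model, and it uses the rounded values printed in the summary, so the result only matches the reported value approximately. The variable names are illustrative and do not come from the original script.

```r
# Values copied from the summary above (already rounded by options(digits = 3)).
loglik_model <- -3080
lr_chisq     <- 7090

# The LR statistic is 2 * (logLik(model) - logLik(null)),
# so the null log-likelihood can be solved for directly.
loglik_null <- loglik_model - lr_chisq / 2

# McFadden's pseudo R^2 = 1 - logLik(model) / logLik(null).
mcfadden_r2 <- 1 - loglik_model / loglik_null
print(mcfadden_r2)
```

The result is close to 0.535, which agrees with the reported 0.536 up to the rounding of the inputs.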

To use the confusion matrix, the test data must have the same shape and values as the output of predict. This means no person may be repeated: each observation must be a single row, not three rows as in the long format. In addition, instead of the category name of Lead_Origin, the reference vector must hold the level numbers that the model returns, that is, API = 1, Landing_Page_Submission = 2, Lead Add Form = 3.

# Predicted class for each observation in the test set: predict returns one
# probability column per alternative and max.col picks the most likely one
logit.test.m <- predict(modelotrain, newdata = long.test.data)
logit.test.m <- max.col(logit.test.m)
logit.test.m <- factor(logit.test.m, levels = 1:3)

# Observed ("true") class: keep only the chosen alternative, one row per person
test.filt <- filter(long.test, choice == TRUE)
test.filt.org <- ifelse(test.filt$Lead_Origin == "API", 1,
                        ifelse(test.filt$Lead_Origin == "Landing_Page_Submission", 2, 3))

# Both factors share the levels 1:3 so that confusionMatrix can compare them
confusionMatrix(logit.test.m, factor(test.filt.org, levels = 1:3))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   1   2   3
##          1 269 355  61
##          2 429 543  74
##          3  39  58   9
## 
## Overall Statistics
##                                        
##                Accuracy : 0.447        
##                  95% CI : (0.424, 0.47)
##     No Information Rate : 0.52         
##     P-Value [Acc > NIR] : 1.00000      
##                                        
##                   Kappa : -0.006       
##                                        
##  Mcnemar's Test P-Value : 0.00324      
## 
## Statistics by Class:
## 
##                      Class: 1 Class: 2 Class: 3
## Sensitivity             0.365    0.568   0.0625
## Specificity             0.622    0.429   0.9427
## Pos Pred Value          0.393    0.519   0.0849
## Neg Pred Value          0.594    0.478   0.9220
## Prevalence              0.401    0.520   0.0784
## Detection Rate          0.146    0.296   0.0049
## Detection Prevalence    0.373    0.569   0.0577
## Balanced Accuracy       0.493    0.499   0.5026
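
To see where these statistics come from, the following minimal sketch rebuilds the table by hand and recomputes a few of them with base R. The counts are copied from the output above. The object names are illustrative only.

```r
# Confusion table from the output above: rows are predictions, columns are the reference.
# matrix() fills column by column, so each group of three is one reference class.
cm <- matrix(c(269, 429, 39,
               355, 543, 58,
                61,  74,  9),
             nrow = 3,
             dimnames = list(Prediction = 1:3, Reference = 1:3))

# Accuracy: share of observations on the diagonal (predicted class equals reference).
accuracy <- sum(diag(cm)) / sum(cm)

# Sensitivity per class: correct predictions over all true members of that class.
sensitivity <- diag(cm) / colSums(cm)

# Positive predictive value per class: correct predictions over all predictions of that class.
pos_pred_value <- diag(cm) / rowSums(cm)

# Prevalence per class; the largest one is the No Information Rate.
prevalence <- colSums(cm) / sum(cm)
no_information_rate <- max(prevalence)

print(accuracy)
print(sensitivity)
print(no_information_rate)
```

Note that the accuracy (about 0.447) is below the No Information Rate (about 0.52), so on this test set the model does worse than always predicting class 2.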

What do the estimated values mean?