Battle of the Machine (Learning)

Is one language better than the other?

Vittorio Scacchetti
Sep 14, 2020

Language is tough. Learning a new language gets harder the older we get. Yet with communication being a part of, well… everything, breaking down communication barriers matters.

Languages for Data Scientists:

  • Python (obviously)
  • R
  • Scala (Java)
  • Julia
  • SAS
  • SQL

SQL

It could be argued that this is not a programming language (despite “language” being in its acronym). Still, every other language on this list can run SQL queries to work with databases.

SQL is a fourth-generation language: it uses human-like syntax to retrieve and manipulate data. So while it is intuitive and has a gentle learning curve, SQL lacks many of the features of the other languages on this list. Important, but not an all-in-one.
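To make the "human-like syntax" point concrete, here is a minimal sketch of a SQL query issued from Python through the standard-library sqlite3 module. The table name, columns, and rows are hypothetical and exist only for this illustration.

```python
import sqlite3

# Hypothetical in-memory table, created purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE languages (name TEXT, open_source INTEGER)")
conn.executemany(
    "INSERT INTO languages VALUES (?, ?)",
    [("Python", 1), ("R", 1), ("Julia", 1), ("SAS", 0)],
)

# The query reads almost like a sentence:
# select the names where open_source is true, sorted by name.
rows = conn.execute(
    "SELECT name FROM languages WHERE open_source = 1 ORDER BY name"
).fetchall()
open_source_names = [name for (name,) in rows]
print(open_source_names)
conn.close()
```

The host language handles everything around the query (connecting, looping, reshaping results), which is the sense in which SQL is important but not an all-in-one.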

SAS (Statistical Analysis System)

It already sounds good!

  • Intuitive fourth generation language
  • Interactive — log window instructs the user with notes and errors (not supported on all platforms)
  • Built-in data analysis/data reporting libraries (no importing)
  • Can manipulate data within its database and incorporates encryption and security algorithms
  • SAS Studio — accessible via any web browser with no installation necessary
  • Report Output Format — display results in easy-to-read reports and graphics (including PowerPoint and .pdf formats)

But,

  • free version is not an all-inclusive package
  • not open source, and therefore packages and libraries don’t update as often as those of the open-source languages we know
  • advanced tools are expensive and difficult to incorporate

** however, it is highly reliable with a strong support system for users **

Scala (JVM)

  • runs on the Java Virtual Machine (JVM) and interoperates with Java (fast)
  • Java is one of the older languages and therefore incorporated/established in many industries (and still among the most frequently sought-after language skills on job posts)
  • many big data tools run on the JVM (Hadoop is written in Java, Spark largely in Scala, Flink in Java and Scala)
  • identical code runs on multiple platforms via the Java Virtual Machine

Julia

  • easy learning curve (very similar to Python)
  • super fast (just-in-time compiled, with speed approaching C)
  • useful for complex mathematical operations
  • used for risk analytics, and for machine learning with Flux (an ML library)

R

  • one of the most used languages among Data Scientists
  • open-source and completely free
  • command-line driven (but user interfaces exist like RGUI & RStudio)
  • works well with complex data objects of varying sizes
  • community support, collaborative package development
  • graphical libraries are extensive, which helps translate analysis for non-technical audiences
  • pro/con: R is not a general-purpose language, so it is rarely used for other tasks
  • designed and written by statisticians
  • can’t handle auto-formatted characters such as curly quotes (copy-paste problems)

R vs. Python

  • R was developed by statisticians for complex analysis, Python by developers and programmers for readability and versatility
  • Syntax learning curve
    R: assignment can go in either direction, written with arrows or an equals sign
    (<-, ->, =)
    Python: the assignment operator is consistent throughout (=)
# Assignment using equal operator.
var.1 = c(0,1,2,3)
# Assignment using leftward operator.
var.2 <- c("learn","R")
# Assignment using rightward operator.
c(TRUE,1) -> var.3
  • For most purposes, the speed difference is negligible. If speed matters, you would probably want to investigate Julia or Scala.
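For comparison with the R assignment forms in the list above, here is a rough Python counterpart. The variable names are hypothetical stand-ins for the R ones (Python names cannot contain dots, so var.1 becomes var_1).

```python
# Plain assignment in Python always uses "=" with the name on the left.
var_1 = [0, 1, 2, 3]
var_2 = ["learn", "Python"]

# There is no rightward form like R's "->"; this is the only direction.
var_3 = [True, 1]

# Several names can be bound in one statement, still with "=".
first, second = var_2
print(var_1, first, second, var_3)
```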

Speed

For loops with fewer than 1,000 iterations, Python is faster than R (up to 8x faster).

Beyond 1,000 iterations, R’s lapply function is faster than Python’s for loop.
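Claims like these are easy to check on your own machine. The sketch below times only the Python side, using the standard-library timeit module. It compares an explicit for loop against map, which is used here as a rough analogue of R's lapply (an assumption for illustration, not an exact equivalent). The function names and loop sizes are hypothetical, and the sketch does not reproduce the cross-language figures quoted above.

```python
import timeit


def square_with_loop(n):
    """Build a list of squares with an explicit for loop."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def square_with_map(n):
    """Build the same list with map, loosely analogous to R's lapply."""
    return list(map(lambda i: i * i, range(n)))


def time_both(n, repeats=200):
    """Return (loop_seconds, map_seconds) for `repeats` runs at size n."""
    loop_t = timeit.timeit(lambda: square_with_loop(n), number=repeats)
    map_t = timeit.timeit(lambda: square_with_map(n), number=repeats)
    return loop_t, map_t


# Sizes chosen to sit below, at, and above the 1,000-step mark.
for n in (100, 1_000, 10_000):
    loop_t, map_t = time_both(n)
    print(f"n={n:>6}: for loop {loop_t:.4f}s, map {map_t:.4f}s")
```

Timings vary by machine and interpreter version, so treat any single run as indicative only.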

  • Visualization
    R: many packages (ggplot2, plotly, Leaflet, Lattice, highcharter, Sunburst, RGL, etc.)
    Python: fewer packages (matplotlib, seaborn, ggplot, plotly)
[Figure: scatter plots, R (top) vs. Python (bottom)]
  • Deep Learning (AI): Keras (written in Python) is usable from both, in R through the kerasR interface
  • Popularity: TIOBE index (TIOBE stands for “The Importance Of Being Earnest”)

In summary:

Julia and Scala: promising young languages, both with significant upside;

SAS: widely used but not free;

R: industry specific but very powerful;

Python: powerful, with the user/programmer in mind.
