Tableau supports JSON data analysis

A while back, I wrote a blog post on how you can read JSON data in Tableau by using Apache Drill. The old blog post can be found here.

Tableau now supports JSON data natively, out of the box, without the need for data preparation or Apache Drill. This feature is available in Tableau version 10.1.

Further information can be found in this Tableau blog post: JSON native connector.

How to publish one data source to be utilised across many projects in Tableau Server

This blog post was inspired by a customer’s recent enquiry about publishing one data source to be utilised by all projects within one Tableau Server site.

The background to this enquiry is the fact that in Tableau Server, when you publish content such as a workbook, a data source or a data extract, you are required to publish it at a Project level (i.e. not Site level). If you are not familiar with Tableau Server permissions, please refer to this blog post by Information Lab, which explains them in detail.

Let’s use the simple scenario below to illustrate how we can address this.

ACMI segregates content by functional business units (e.g. Marketing, Engineering), each contained within a separate Tableau Server Project. This content, including data-sources, can only be seen or accessed by users and groups assigned to, for example, the Marketing or Engineering Project.

Site and Projects

 

However, there might be common content across both Marketing and Engineering. For example, the annual leave balance applies to both functional business units. You may therefore want to create a Tableau Data Source (TDS) file, connecting live to the annual leave data-source, and make it available to both the Marketing and Engineering Projects. Please refer to this blog post by Information Lab on the different Tableau file types.

Instead of publishing the “common data source” to both Marketing and Engineering Projects, create a separate Project – let’s call it “Common Content”, and publish the common content (i.e. Common Data Source) in this Project.

You then set the Common Project permission to allow those Engineering and Marketing users / groups to have access; see below.

Site and Projects2.JPG

Permissions on the Common Project for users / groups to Data Sources are set to Connector.

Now when a user who belongs to the Marketing Project, or an associated group, logs in to Tableau Server, they will have access to the data sources from both the Marketing Project and the Common Project.

marketing_ds.JPG

Likewise, when a user who belongs to the Engineering Project, or an associated group, logs in to Tableau Server, they will have access to the data sources from both the Engineering Project and the Common Project.

engineering_ds

This method allows you to simultaneously segregate specific content but also share common content in a scalable manner, avoiding data duplication and the associated effort.
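To make the permission layout concrete, here is a minimal sketch in Python. It is only a model of the idea, not Tableau Server's API: the group names ("Marketing Users", "Engineering Users") and the data source names are made up for illustration.

```python
# Hypothetical model of the project layout described above; the group and
# data source names are illustrative, not taken from a real Tableau Server.
PROJECT_DATA_SOURCES = {
    "Marketing": ["Marketing Campaigns"],
    "Engineering": ["Engineering Defects"],
    "Common Content": ["Annual Leave Balance"],
}

# Which groups are allowed to connect to the data sources in each project.
PROJECT_PERMISSIONS = {
    "Marketing": {"Marketing Users"},
    "Engineering": {"Engineering Users"},
    "Common Content": {"Marketing Users", "Engineering Users"},
}


def visible_data_sources(group):
    """Return the data sources a group can connect to, sorted by name."""
    visible = []
    for project, groups in PROJECT_PERMISSIONS.items():
        if group in groups:
            visible.extend(PROJECT_DATA_SOURCES[project])
    return sorted(visible)


print(visible_data_sources("Marketing Users"))
print(visible_data_sources("Engineering Users"))
```

The common data source is stored once, in the "Common Content" project, and each group sees it alongside its own project's data sources.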

How to read JSON data into Tableau – using a free and open source framework (Apache Drill on Windows)

Tableau does not read JSON data natively. By using Apache Drill, you can use Tableau to point at JSON data and start analysing it.

Apache Drill is an open source software framework that supports data intensive distributed applications for interactive analysis of large-scale datasets – Wikipedia.

The Apache Drill version I am using does not require a lot of infrastructure, such as a Hadoop cluster platform, behind it. The version I am using is Apache Drill for Windows. Installing it and connecting Tableau to JSON data is quite simple, and I did it all on my laptop.

Apache Drill is a very powerful open source framework, and in this blog post I am using Apache Drill on Windows simply to connect Tableau to JSON data. Yes, these steps only work on the Windows operating system. Sorry non-Windows OS users, you can go here to install Drill on Linux and Mac OS X.

Below are simple, easy steps to get you connecting Tableau to JSON data and start analysing it.

JAVA

If your computer does not have JAVA installed, please follow this section. If it is already installed, skip to the next section, Installing Apache Drill on Windows.

1. Download the latest JDK from Java SE Downloads

2. Install Java

3. Once installed, find the folder in which your Java is installed. Mine is located in C:\Program Files\Java\jdk1.8.0_45

4. Copy the location of your Java installation directory to the clipboard – mine is C:\Program Files\Java\jdk1.8.0_45

5. In the Windows search, type in System Variables; this should open the System Properties window

  • Click on the Advanced tab
  • Click on the Environment Variables button
  • Under System variables, click the New button
  • Enter Variable Name = JAVA_HOME
  • Variable Value = the location of your Java, mine is C:\Program Files\Java\jdk1.8.0_45
  • Click OK to close
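If you want to confirm the variable was picked up, a quick check like the one below can help. This is only a sketch: the function name check_java_home is my own, and it simply looks for a java executable under the bin folder of whatever JAVA_HOME points to. Note that you need to open a new Command Prompt after setting the variable, because already-open windows keep the old environment.

```python
import os
from pathlib import Path


def check_java_home(env=None):
    """Return (ok, message) describing whether JAVA_HOME looks usable."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if not java_home:
        return False, "JAVA_HOME is not set"
    # A JDK install normally contains bin\java.exe on Windows (bin/java elsewhere).
    bin_dir = Path(java_home) / "bin"
    if not any((bin_dir / name).is_file() for name in ("java.exe", "java")):
        return False, f"No java executable found under {bin_dir}"
    return True, f"JAVA_HOME points to {java_home}"


ok, message = check_java_home()
print(message)
```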

INSTALLING APACHE DRILL ON WINDOWS

Source: Installing Apache Drill on Windows

1. Download the latest version of Apache Drill from Latest version of Apache Drill

2. Move the apache-drill-1.1.0.tar.gz file to a directory where you want to install Drill. I put mine in C:\Program Files\

3. Unzip the TAR.GZ file using a third party tool. The extraction process creates the installation directory named apache-drill-1.1.0 containing the Drill software. For example:

Apache Drill install folder

STARTING DRILL ON WINDOWS

1. Open Command Prompt.

2. Change to the apache-drill-1.1.0 folder. For example, my Apache Drill is located in C:\Program Files\apache-drill-1.1.0

Therefore, in my command prompt, I typed in cd C:\Program Files\apache-drill-1.1.0

Command Prompt Drill

3. Go to the bin directory. For example, I typed cd bin

cmd_prompt_drill2

4. Type the following command on the command line: sqlline.bat -u "jdbc:drill:zk=local"
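Once Drill is running, one way to check that it responds is to send it a query over HTTP. The sketch below assumes Drill's REST endpoint is reachable at http://localhost:8047/query.json, which is the default web port for an embedded Drill as far as I know; please verify this against the Drill documentation for your version. The helper names are my own.

```python
import json
import urllib.request

# Assumption: Drill's web console and REST API listen on localhost:8047.
DRILL_QUERY_URL = "http://localhost:8047/query.json"


def build_drill_request(sql, url=DRILL_QUERY_URL):
    """Build (but do not send) a POST request that submits SQL to Drill."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql):
    """Send the query to Drill and return the decoded JSON response."""
    with urllib.request.urlopen(build_drill_request(sql), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With Drill started as in step 4, calling run_drill_query("SELECT version FROM sys.version") should return the Drill version if everything is up. If the call fails with a connection error, Drill is probably not running or is listening on a different port.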

INSTALLING MapR Drill ODBC DRIVER

Source: Installing MapR Drill ODBC Driver

1. Download the MapR Drill ODBC Driver from here

I installed the 64-bit driver as my computer is Windows 64-bit.

2. Double-click the installer from the location within which you downloaded the file.

  • Click Next
  • Select the check box to accept the terms of the License Agreement and click Next
  • Verify or change the install location. Then, click Next – I didn’t change the install location; I left it at the default.
  • Click Install
  • When the installation completes, click Finish

3. Let’s verify the installation. In the Start menu, type in ODBC and select the 64-bit ODBC Administrator to open ODBC Administrator.

search_odbc

4. Once open, click on Drivers tab and verify that the MapR Drill ODBC Driver appears in the list of drivers that are installed on your computer.

MapR-ODBC

5. Let’s test the drivers. Whilst still having your ODBC Administrator open, select the System DSN tab.

  • Click the Add button
  • Select the MapR Drill ODBC Driver
  • In the Data Source Name, enter MapR ODBC Driver for Drill DSN
  • Click on the Test button at the bottom and you should see a SUCCESS! message. Click OK several times to close the ODBC Administrator

STARTING DRILL EXPLORER ON WINDOWS

1. If your ODBC Administrator is not yet open, go to the Start menu, type in ODBC Administrator and press Enter to open it. Select the System DSN tab and double-click on MapR ODBC Driver for Drill DSN.

system_dsn2

2. Click on the Drill Explorer button to open Drill Explorer.

drill_expl1

3. Expand the plus sign next to dfs_default until you get into a folder for your sample JSON data. My own sample JSON data is located in C:\Data\JSON so I browse from dfs_default\Data\JSON

4. Click on the sample JSON file – mine is my_json.json and I can see the preview of the JSON data.

json_drill

5. Select the SQL tab (next to Browse tab) and click Preview button. You should see the preview of JSON data. Copy the SQL query into the clipboard.

drill_sql_preview

6. Open Tableau and choose to connect via the ODBC driver. Once the ODBC dialog is open, connect using DSN and select the MapR ODBC Driver for Drill DSN that you created earlier. Click Connect. Click OK.

ODBC-JSON-Tableau

7. Click on the Schema drop-down menu to select a schema, click on the magnifying glass to show all schemas, and select the dfs.default schema.

tableau-odbc-drill

8. Double-click on the New Custom SQL option to open a custom SQL window.

tableau_drill_customSQL

9. Paste the SQL Query that you copied from the Drill Explorer in Step 5 above. Click on the Preview Results button.

tableau_customSQL

10. This is where I spent a bit of time noting down which individual column fields I would like to include in my SQL query. Unfortunately, you have to explicitly define the individual fields you would like to bring into Tableau. I noted in my sample JSON data that I would like to bring 8 fields – address, balance, company, eyeColor, favoriteFruit, gender, name, age.

So I closed the preview data window, modified my Tableau Custom SQL query to the one below, and clicked the Preview Results button.

SELECT address, balance, company, eyeColor, favoriteFruit, gender, name, age FROM `dfs`.`default`.`./Data/JSON/my_json.json`

modified-customSQL-query2

11. Close the Preview Results window and click OK to close the Edit Custom SQL window. You can now see the selected fields in the Tableau data explorer.

tableau_data_explorer

12. Select Go to Worksheet and start using the fields from my JSON sample data.

13. Create a Tableau Data Extract to get a portable offline copy in Tableau’s columnar format. This avoids the limitations of the ODBC driver. In my Tableau Data window, I right-click on my Custom SQL_Query data source, then select Extract Data.

tableau_extract

14. Continue with creating the extract and save it in a desired location on your computer.

15. Now you can start analysing your JSON data in Tableau. Voila!!

tableau_json
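Step 10 above (working out which fields to list in the query) can also be scripted. The sketch below is one possible way to do it, not part of Drill or Tableau: it collects the top-level field names from a few JSON records and builds the explicit SELECT statement. The two sample records are made up and stand in for the contents of my_json.json, and the helper names are my own.

```python
import json


def top_level_fields(records):
    """Collect the field names found in a list of JSON objects, in first-seen order."""
    fields = []
    for record in records:
        for key in record:
            if key not in fields:
                fields.append(key)
    return fields


def build_select(fields, table):
    """Build a SELECT statement that lists each field explicitly."""
    column_list = ", ".join(fields)
    return f"SELECT {column_list} FROM {table}"


# Two made-up records standing in for the contents of my_json.json.
sample = json.loads("""
[
  {"name": "A", "age": 30, "gender": "female", "tags": ["x"]},
  {"name": "B", "age": 41, "company": "ACME"}
]
""")

# Leave out the fields you do not want to bring into Tableau.
wanted = [f for f in top_level_fields(sample) if f != "tags"]
print(build_select(wanted, "`dfs`.`default`.`./Data/JSON/my_json.json`"))
```

The printed statement can then be pasted into the Tableau Custom SQL window in the same way as the query in step 10.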

FURTHER RESOURCES

JSON documentation – https://drill.apache.org/docs/

Using Apache Drill with Tableau Desktop 9 – https://drill.apache.org/docs/using-apache-drill-with-tableau-9-desktop/

Cloud BI / Cloud Analytics – The new kid on the block

cloud_analytics

Cloud BI (or Cloud Analytics) seems to be popping up a lot lately.

What is Cloud BI?

Cloud BI is an application that allows data analysis and the ability to access this anywhere via a web browser. The application is installed in and maintained by the Cloud BI provider. Think of it like your Google Mail; an email application in the cloud. To have this email capability, all that is required is a subscription to the service. The email application is simply there. You don’t need to install, perform maintenance and worry about whether the system is up and running. Google, the cloud email provider, is looking after the whole application.

Why should you be considering Cloud BI?

Let’s look at the typical non-Cloud BI application. First of all, you need to install the data analysis application. You also need to provide your own hardware and server on which the data analysis application is installed. Once it’s installed, you need to perform ongoing maintenance such as installing software patches and updates. You need to back up the application as well. The hardware also requires a refresh / new hardware after several years. You need to engage the right team and resources to assist you. Sometimes it gets too hard, with too many internal processes, and the whole process of getting the data analysis application up and running can take a long time. With Cloud BI, you bypass all of the above steps. Simply access the analytics application via a web browser and the Cloud BI provider takes care of the headaches.

In summary, Cloud BI is a great option if:

  • you don’t have the time and resources to install and maintain an in-house BI application, especially if your organisation has a small (or non-existent) IT team
  • it’s not your organisation’s main business. We understand the importance of data-driven decision making to an organisation, and data analysis is key to enabling this. However, if your organisation’s main business is otherwise, the last thing you want is to spend lots of time/resources supporting the application itself. You want your staff to focus their time and energy on data analysis to support good decision making, as opposed to maintaining an application
  • moving to the cloud is a key IT policy.

How Scalable is Cloud BI?

I strongly believe Cloud BI is more scalable and reliable than your on-premise BI tool supported internally within your organisation. Cloud BI application providers offer a scalable and responsive BI application to many customers in a multi-tenanted environment. Cloud BI providers do this day-in day-out. They have to ensure the application is up-to-date and any patches are applied as soon as possible. The Cloud BI application providers need to have the best processes to support many customers around the world. In essence, these providers are well geared to taking the hassle away from your organisation.

Is Cloud BI really that simple?

In a nutshell – yes, it’s that simple. However, there are several considerations when it comes to Cloud Analytics:

  • data sovereignty
  • location of your data sources (e.g. On-Premise vs Cloud) and keeping your data up-to-date
  • maturity of your organisation in cloud adoption.

Let’s dig deeper into these points.

Data sovereignty

Some Cloud BI providers host their application in the United States or in data centers elsewhere outside Australia. Australian companies and organisations may have a preference or requirement not to have their data located outside Australia.

Location of your data sources and keeping your Cloud BI data up-to-date

It’s great to have your data analysis tool in the cloud. However, if your data sources (e.g. databases, data warehouse, etc) are not located in the cloud (i.e. they sit in an on-premise data center behind your company’s firewall), the integration between your Cloud BI and your on-premise data sources can be cumbersome.

To keep the Cloud BI data from your on-premise data source up-to-date, you are more likely to push the data from your on-premise data source into the cloud. Your organisation is less likely to allow a firewall to be opened for the Cloud BI to access your on-premise data source. How much data are you pushing?

Cloud BI with On-Premise Data

Cloud BI applications work best with cloud data sources (e.g. Amazon Redshift, Google Big Query, Cloudera, Salesforce.com) as both applications are located in the cloud. You can connect live to those cloud data sources or enable automated data refresh scheduling.

Cloud BI with Cloud Data

Maturity of your organisation in cloud adoption

It is important to understand where your organisation is currently at in terms of its cloud adoption. Are you using Cloud BI to analyse data located in the cloud whilst still using a legacy BI tool to analyse your on-premise data sources (e.g. databases / data warehouse behind corporate firewall)? Is there a plan to move those on-premise data sources into the cloud eventually?

Cloud BI ain’t the new kid on the block in the Tableau world

In Tableau, Cloud BI is not new.

Tableau Online is Tableau’s Cloud BI, launched in mid 2013. It is one of the most mature Cloud BI tools on the market. You get the best of both worlds – the best data analysis tool with a high level of cloud maturity.

No matter what your cloud requirements, where your data is located or where you are on the cloud journey, Tableau has several deployment options to suit your organisation’s needs:

journey-cloud

Not sure which one is for you? Review the following for more detail on each:

Tableau Public

  • Cloud BI for the public – also known as the ‘YouTube for Tableau visualisations’
  • Free
  • Connect to flat files only and data is also exposed to the public
  • Hosted in North America

Tableau Online

  • Cloud BI – no need to install or maintain software or refresh hardware
  • All you need is an account with a subscription
  • Subscription model with an annual renewal option
  • Works well with cloud data sources and cloud applications (e.g. Salesforce, Google Big Query, Amazon Redshift, etc) – you can connect live
  • For on-premise data sources – it’s a push mechanism from your on-premise data source to Tableau Online or you can utilise Tableau Online Sync client
  • Secure multi-tenancy
  • Hosted in North America or Europe.

Tableau Server in AWS Marketplace

  • Cloud BI with a BYO license
  • It’s a turnkey solution in which the cloud infrastructure (provided by AWS) and the Tableau Server software (provided by Tableau) are pre-installed with a pre-configured license. All you need is your own Tableau Server license (purchased directly from Tableau or Tableau partners).
  • Ideal if your organisation has data sovereignty requirements. The application can be hosted in the local region supported by AWS AMI (other than North America)
  • Tableau will update the Amazon Machine Image (AMI) for all minor and major releases as well as maintenance releases. However, you also have the option to upgrade the AMI yourself, or to do a backup, terminate your current AMI and fire up the new one that Tableau submits to the marketplace. You will get an email when a new version is available
  • Works well with Cloud data sources and cloud applications (e.g. Salesforce, Google Big Query, Amazon Redshift, etc) – you can connect live
  • For On-Premise data sources – it’s a push mechanism

 

Tableau Server in Microsoft Azure Marketplace

Similar to Tableau Server in AWS Marketplace but this is hosted in the cloud infrastructure provided by Microsoft Azure

 

On-Premise Tableau Server

  • Not a cloud solution
  • You install your own Tableau Server software in your own infrastructure behind your corporate firewalls
  • Suitable if your organisation has an enterprise BI requirement in which the application must be hosted in your organisation’s data center
  • You are responsible for the installation and maintenance of Tableau server software, operating systems and the hardware requirements
  • You can connect live to the on-premise data sources or schedule an automatic refresh

Off-Premise Tableau Server

  • Similar to the on-premise Tableau Server but you install Tableau Server at the cloud provider (e.g. Rackspace, Telstra Cloud, Amazon AWS, Microsoft Azure, etc)
  • You are responsible for the installation and maintenance of Tableau Server
  • The cloud infrastructure provider is responsible for installation and maintenance of operating systems and the hardware
  • Suitable if your organisation would like to leverage the infrastructure provided by the cloud infrastructure provider.
  • If your on-premise data source is hosted in the cloud infrastructure (e.g. as a private cloud), you can connect live to those data sources or schedule an automatic refresh
  • Works well with Cloud data sources and cloud applications (e.g. Salesforce, Google Big Query, Amazon Redshift, etc) – you can connect live or schedule an automatic data refresh.

So if you’re ready to jump into Cloud BI to enable your organisation’s data driven decision making, Tableau is the perfect tool. If you’re not quite there but still on the journey or even have no Cloud ambitions, rest assured that Tableau has a range of options to suit your organisation’s requirements.

How to improve your Tableau skills

Baby learning

Every Tableau Jedi started as a novice. They did not become proficient overnight. They gathered plenty of learning resources, and continuously practised and challenged themselves to improve their Tableau skills.

When I first encountered Tableau, I was impressed with the amount of learning resources available, such as books and online articles.

Below is a summary of the learning resources that I use on the journey to becoming a Tableau Jedi.

Training

Tableau offers both paid and free training – schedules and types of training can be found here. You need to create a once off free account (even if you are not ‘yet’ a Tableau customer) to access all the training tutorials and materials :).

  • Classroom training – These paid training sessions are fantastic if you like to learn in a structured way. Tableau offers physical and virtual classroom training.
  • On Demand training – As a time poor individual, I prefer the flexibility to learn at my own time and pace. Tableau On Demand training provides online video tutorials that enable me to do just that. Moreover, On Demand training is FREE. The videos range from 5 to 20 minutes. Most of the video tutorials come with Tableau workbooks that you can play along with whilst watching the video. This is fantastic if you like hands-on training. You can also do this training offline by downloading the videos and workbooks. Based on my personal experience, I do not recommend you spend a week or two watching all the videos in the On Demand training. The best way is to consume what you need, then go back and apply what you’ve learnt. This can be an iterative process. For example, I start with the video detailing how to connect to my data, then go back to my Tableau workbook and implement what I’ve just seen. I may then encounter another challenge, such as how to connect to another data source. I’ll then go back to my On Demand training video, watch it and then use these learnings in my own workbook.
  • LIVE Online training – I thank Tableau for providing another FREE training resource on top of the extensive On Demand training videos and tutorials. You can check out the schedules for the LIVE Online training here. The only downside of this LIVE Online training is that it’s scheduled at a specific time and date, mostly in North American time zones. If you live in another part of the world, the time difference can be a challenge.

Webinars

Tableau webinars are one of my favourite learning resources. They provide much more than just the technical aspects of learning Tableau. Here you’ll have the opportunity to listen to industry leaders on the latest trends as well as customers sharing their experiences. You can also watch previous webinars.

Tableau Community

Tableau has one of the most active user groups and a high number of followers. Tableau Community provides you with access to Forums – a fantastic way to have any Tableau question answered quickly, or to share your Tableau knowledge by responding to posts by others. If you’re having a problem, there’s a good chance others have encountered the same.

You can also connect to Groups as well as talk about topics of interest in Tableau Community.

To access Tableau Community, create a once off free account. The same account can be used to access Tableau training and tutorials.

Tableau Public

Tableau Public is often called the ‘YouTube’ for Tableau visualisation. Anyone can use Tableau Public to build their own visualisation and share it with the world.

You can browse the Tableau Public Gallery or subscribe to Viz of the day by clicking on one of the vizzes and then on the Subscribe link.

Viz-of-the-day

It’s truly a source of inspiration.

Often, if I like a particular Viz, I download the workbook from Tableau Public, open it with my Tableau Desktop, then reverse engineer the Viz. I get to learn so much from Tableau Public users, who tend to be creative people, many of whom use Tableau in a way that I would never have thought possible.

Viz-of-the-day-2

Events and Conference

What better way to meet other Tableau enthusiasts than in person? Tableau regularly holds and sponsors events, user groups and conferences. You can find the events and schedules here. By attending these events, not only do you get the chance to put names to faces, but you also make new friends who have the same interests and face similar challenges. This is a great way to connect and expand your Tableau skills.

My favorite event is the Tableau Annual Conference (TC). The upcoming TC15 will be held in Las Vegas. It’s the biggest and hottest event of the year. I’d like to compare it with a Rock Concert because it shares the same characteristics –

  • It has rock bands (not just one but many of them)
  • Meet and Greet with the Rockstars – You get to meet and learn from Tableau Jedis and Zen masters as well as fellow Tableau users / customers. You also get to meet Tableau employees ranging from the maker of the software, the founders, the support team, the consultants and they are all freakishly friendly 🙂
  • The keynote speakers are inspirational. Past speakers have included Hans Rosling (Data Visionary from Gapminder Foundation), Michael Lewis (author of Liar’s Poker and Moneyball), Neil deGrasse Tyson (Astrophysicist) and Dr John Medina to name a few
  • There is heaps of hands-on training (you get to learn a lot more than you can possibly learn on your own in one year)
  • Bring your Tableau problems to Tableau Doctors – you can make an appointment with them and get those tricky questions resolved by Tableau employees. You even get the prescription emailed to you after your appointment, containing solutions to your problem or follow-ups
  • Get certified – you can get Tableau certification on the spot and earn those well deserved medals.

Blogs

I follow several blogs about Tableau. Here is my personal list (feel free to share yours as well):

Tableau Whitepapers

The Tableau website also has a lot of interesting whitepapers ranging from best practice visual analysis through to designing an efficient workbook. To access Tableau whitepapers, you can create a once-off free account.

Books

There are lots of books written about Tableau and Data Visualisation.

If you are after a book to learn Tableau, this book is an excellent one – Learning Tableau by Joshua Milligan. Joshua is one of the Tableau Zen Masters and I am a big fan of his blog VizPainter.

Learning-Tableau

Here is a list of interesting books to supplement the above resources and further enhance your Tableau and data analysis skills:

http://www.tableau.com/about/blog/2013/7/list-books-about-data-visualisation-24182

Google

Last but not least, you can always ‘Google It’.

Google Tableau

Tableau Server 9 (64 bit version) Deployment – Core considerations in physical and virtual environment

There are minimum System and Hardware requirements when you deploy Tableau Server 9 (64 bit version).

The full list of minimum requirements can be found here 
