@@ -0,0 +1,43 @@
# Data Science Workflow #
				
			
			
		
	
		
			
				
This repository explores, through examples, how to use the command line efficiently and productively for data science tasks. You will learn to obtain, scrub, explore, and model your data.
				
			
			
		
	
		
			
				
# Introduction #
In these examples you will learn how to: (*i*) run Docker containers, (*ii*) use the command line, and (*iii*) run a basic application.
				
			
			
		
	
		
			
				
## Docker ##

Let us introduce Docker, the first platform we will use for data science. Docker is a tool that allows developers, sysadmins, or data scientists to easily deploy their applications in a sandbox (**called a container**) that runs on a host *operating system, i.e. Linux*. The key benefit of Docker is that it allows users to package an application with all of its dependencies into a standardized unit for software development. Unlike virtual machines, containers do not have high overhead and hence enable more efficient usage of the underlying system and resources.[^1]
				
			
			
		
	
		
			
				
### Installing and using the Docker image ###

Docker pull

We recommend that you create a new directory, navigate to this new directory, and then run the following when you’re on macOS or Linux:
				
			
			
		
	
		
			
				
``` shell
$ docker run --rm -it -v "$(pwd)":/data datascienceworkshops/data-science-at-the-command-line
```
				
			
			
		
	
		
			
				
Or the following when you’re on Windows and using the Command Prompt:
				
			
			
		
	
		
			
				
``` shell
$ docker run --rm -it -v "%cd%":/data datascienceworkshops/data-science-at-the-command-line
```
				
			
			
		
	
		
			
				
Or the following when you’re using Windows PowerShell:
				
			
			
		
	
		
			
				
``` shell
$ docker run --rm -it -v ${PWD}:/data datascienceworkshops/data-science-at-the-command-line
```
				
			
			
		
	
		
			
				
In the above commands, the option `-v` instructs Docker to map the current directory to the `/data` directory inside the container, so this is the place to get data in and out of the Docker container.
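
On macOS or Linux, one way to avoid retyping the command is to wrap the mapping described above in a small shell function. This is only a sketch: the name `dsatcl` and its optional directory argument are hypothetical and are not part of the image or of Docker. The function resolves the directory to an absolute path first, since Docker treats a non-absolute `-v` source as a named volume rather than a host directory.

``` shell
# Hypothetical helper (not provided by Docker or the image): start the
# container with a host directory mounted at /data.
# Usage: dsatcl [directory]   (defaults to the current directory)
dsatcl() {
  # Resolve the requested directory to an absolute path; stop with a
  # non-zero status if the directory does not exist.
  host_dir="$(cd "${1:-.}" && pwd)" || return 1

  # Same options as the command above: remove the container on exit (--rm),
  # keep an interactive terminal (-it), and map host_dir to /data (-v).
  docker run --rm -it -v "${host_dir}":/data \
    datascienceworkshops/data-science-at-the-command-line
}
```

After defining it in your shell, `dsatcl` mounts the current directory and `dsatcl path/to/project` mounts that directory instead. Either way, files written to `/data` inside the container appear in the chosen host directory.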
				
			
			
		
	
		
			
				
# Notes #
				
			
			
		
	
		
			
				
- [ ] Make a container with Ubuntu 18.04
- [ ] Packages to install: csvkit,
				
			
			
		
	
		
			
				
[^1]: Docker for beginners, https://docker-curriculum.com/.