Conquering the Command Line

2018-01-02 00:00:00 +0000

Output on Mac OS terminal after typing: telnet towel.blinkenlights.nl

When I was first introduced to the command line I really had to adjust to navigating my computer in a black box with just text. So I avoided the command line as much as possible. I was accustomed to the visual cues and feedback that a computer usually provides. In many ways it felt like I was re-learning how to use a computer via the command line.

Yet since first learning to navigate my computer with UNIX commands, I’ve learned that the command line doesn’t have to be scary, even when it gives little visual feedback. For example, when you type a password on the command line, nothing shows up to indicate that any characters have been entered. This is a security measure.

What is the command line?

The command line is a text-based interface for typing commands that tell a computer what to do, such as navigating or changing its file system.

What is UNIX?

Why Use the Command Line?

In order to get started on the command line you should navigate to your applications and open the Terminal application.

terminal-1.png Above is the Terminal Icon on Mac.


Create a Basic Website Folder on the Command Line

terminal-2.png

Folder structure of sample project


A folder with the above structure can be created on the command line by typing the following commands inside of an empty directory:

empty-directory.png

We start inside of an empty directory!


  • Make a directory (also known as a folder) called personal-website
    mkdir personal-website

personal-website.png

We’ve created a folder named personal-website


  • Navigate to inside of the directory called personal-website
    cd personal-website
  • create a directory, inside of the personal-website folder called assets
    mkdir assets

assets-folder.png

We’ve created a folder inside of personal-website to contain all of our assets


  • Navigate inside of the assets folder which is inside of the personal-website folder
    cd assets
  • create a directory, inside of the assets folder named images
    mkdir images
  • create a directory, inside of the assets folder named js
    mkdir js
  • create a directory, inside of the assets folder named css
    mkdir css

all-asset-folders.png

We’ve created folders inside of personal-website/assets to store our project’s assets


terminal-2.png

Whoops! We forgot to create an index.html file :(

We are in the assets folder and want an index.html file in our main personal-website folder. Typing cd .. will move us out of the assets folder and into the directory above, which is personal-website. Now that we are in the personal-website folder, typing touch index.html will create a blank index.html file.

complete-directory.png
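
The same folder structure can also be scripted. The sketch below is a Python equivalent of the mkdir, cd and touch steps above, assuming Python 3 is installed. The function name create_site_skeleton is made up for this example, and the demo builds everything inside a temporary directory so nothing else on your machine is touched.

```python
import tempfile
from pathlib import Path


def create_site_skeleton(parent: Path) -> Path:
    """Create personal-website/ with its assets folders and an empty index.html."""
    root = parent / "personal-website"
    for sub in ("images", "js", "css"):
        # parents=True also creates personal-website/ and assets/ when missing
        (root / "assets" / sub).mkdir(parents=True, exist_ok=True)
    (root / "index.html").touch()  # same effect as: touch index.html
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = create_site_skeleton(Path(tmp))
        for path in sorted(root.rglob("*")):
            print(path.relative_to(root))
```

Running it prints the relative path of every folder and file that was created, which should match the structure in the screenshot.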

Some frequently used terminal commands are:

commands to navigate/manipulate the filesystem

ls - list the contents of a directory

pwd - print working directory: displays the full path of the directory you are currently working in

touch - create an empty file, or update the timestamp of an existing file without changing its contents
very handy when wanting to create empty files without leaving the command line

sudo - this allows you to run commands as a super user

mv - move a file or directory; this can be used to move or rename a file by updating the file path

cd - change the current directory you are working in so that you can access files on a different part of the system
cd (with no arguments) moves you to your home directory (usually the current user’s top-level folder)
cd . stays in the current directory
cd .. navigates to the directory one level up

mkdir - make a new directory (or a folder)
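
If it helps to see these commands from another angle, here is a rough Python sketch of what pwd, ls and the cd shortcuts resolve to for a given folder. It is only an illustration: the function name describe_location is invented for this example and the folders are created in a temporary directory.

```python
import tempfile
from pathlib import Path


def describe_location(directory: Path) -> dict:
    """Collect what pwd, ls and the cd shortcuts would show for a directory."""
    directory = directory.resolve()
    return {
        "pwd": directory,                                   # pwd
        "ls": sorted(p.name for p in directory.iterdir()),  # ls
        "cd .": directory,                                  # cd .  (stay put)
        "cd ..": directory.parent,                          # cd .. (one level up)
        "cd": Path.home(),                                  # cd    (home directory)
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        assets = Path(tmp) / "personal-website" / "assets"
        (assets / "css").mkdir(parents=True)
        (assets / "js").mkdir()
        for command, result in describe_location(assets).items():
            print(f"{command:6} -> {result}")
```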

Commands to Install Software

You can install some software from the command line using the following commands:

  • in Python pip install <package name>.
    Pip is a software package manager for Python.
  • in JavaScript npm install <package name>
    NPM is a package manager for JavaScript packages.

Commands to Run Software

In order to run a script on the command line you need to provide a command and a file name. Some examples are:

  • in Java, javac filename.java compiles a source file and java filename then runs the compiled class.
  • in Python, python filename runs a Python script.

If you find you are repeating a lot of commands you can scroll through your recent commands using the up/down arrows and edit them and re-run by navigating to them and then pressing enter.

Additional Resources to Get Started with Command Line Prompts

Decorating the Command Line

You can completely customize the colors and outputs on the command line to better suit your visual and aesthetic needs.

I made my command line appear prettier by installing the theme Tomorrow Night. Check out this site for instructions on installing the theme Tomorrow Night.

A version of this article was originally published by Monica Powell on FreeCodeCamp on December 5, 2017.

How to Add Author Bio to Posts in Jekyll

2017-10-02 00:00:00 +0000

The above image is a preview of how the author bio will appear at the end of this tutorial.

Datalogues is powered by Jekyll, a static-site generator. The theme I selected for the site did not support authors out of the box; however, it is easy to implement author functionality in Jekyll.

1) Edit/create appropriate folders and files in Jekyll project

  • front matter of individual blog posts where author should be included
  • _layouts/post.html

I edited the files listed above and created the following folders/files:

  • _data/authors.yml
  • _includes/author_bio.html

2) Store Author Data

I have stored my author data in a folder called _data that contains a file authors.yml. The author information associated with monica_powell is pulled into my post from the authors.yml data file.

monica_powell:
    name: Monica Powell
    email: monica@aboutmonica.com
    twitter: http://twitter.com/waterproofheart
    bio: Monica Powell is a web technologist that cares about increasing the visibility of underestimated individuals in technology. In 2015, she received the &#35;GIRLBOSS award from Sophia Amoruso’s Girl Boss Foundation. She’s currently focusing on making tech more enjoyable & accessible and is always up to chat data visualizations, web development or &#35;BlackGirlMagic.
    image: http://www.datalogues.com/assets/images/monica-powell-headshot.jpg

3) Reference relevant authors in the front matter of individual blog posts

In the front matter of each blog post in Jekyll you should reference authors in YAML (YAML Ain’t Markup Language) using the following format: author: NAME OF AUTHOR. The name of the author must exactly match one of the keys in your authors.yml.

The front matter in Jekyll sets the metadata for a post and is key to properly building posts. YAML is a human friendly data serialization standard for all programming languages.

Here is an example of the front matter for this particular post.

---
layout: post
title: How to Add Author Bio in Jekyll
description: A guide to adding author bios in Jekyll
image: assets/images/author-bio.png
permalink: adding-author-bios-in-jekyll
author: monica_powell
comments: true
---

4) Define HTML for author bio

In the _includes folder, create a file called author_bio.html to define the HTML for how author bios should be displayed.

5) Add author bios to the post layout

Add a line in post.html where author bio should appear and pull in the HTML as defined above in author_bio.html. The logic is set so that it will only call that HTML template if there is author information associated with this particular post.

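  {% comment %} look up the post's author (assumes the data lives in _data/authors.yml) {% endcomment %}
  {% assign author = site.data.authors[page.author] %}

  {% comment %} if there is an author bio {% endcomment %}
  {% if author.bio %}
      {% include author_bio.html %}
  {% endif %}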
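
To make the lookup explicit, here is a small Python model of the logic; it is only an illustration, not code that Jekyll runs. The dictionary stands in for _data/authors.yml, and both the function name author_bio_for and the shortened bio text are made up for this sketch.

```python
# Stand-in for _data/authors.yml (bio shortened for the example).
AUTHORS = {
    "monica_powell": {
        "name": "Monica Powell",
        "twitter": "http://twitter.com/waterproofheart",
        "bio": "Monica Powell is a web technologist.",
    }
}


def author_bio_for(front_matter: dict, authors: dict = AUTHORS) -> str:
    """Return the bio for the post's author, or an empty string if there is none."""
    author = authors.get(front_matter.get("author"), {})
    return author.get("bio", "")


if __name__ == "__main__":
    post = {"title": "How to Add Author Bio in Jekyll", "author": "monica_powell"}
    print(author_bio_for(post))
    print(repr(author_bio_for({"title": "A post without an author"})))
```

A post whose front matter has no author, or an author key that is missing from the data file, simply gets no bio.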
  

All done! Feel free to comment below or tweet me if you have any questions!

How to Use the TMDB API to Find Films with the Highest Revenue

2017-05-28 00:00:00 +0000

Get Out has been one of the most talked about films in 2017 and as of April 2017 the highest grossing debut film based on an original screenplay in history. We want to programmatically find out how Get Out ranked amongst other 2017 American films and which films have earned the most revenue in 2017. This tutorial assumes most readers have basic working knowledge of Python.

Prerequisites

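  • Make sure the following Python modules are available, ideally inside a virtualenv.
    • config (a local config.py file that holds your API key, not a package to install)
    • requests
    • locale (part of the Python standard library)
    • pandas
    • matplotlib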
  • In addition to installing the above dependencies we will need to request an API key from The Movie DB (TMDB). TMDB has a free API to programmatically access information about movies.

    • In order to request an API key from TMDB:
      1. Create a free account
      2. Check your e-mail to verify your account.
      3. Visit the API Settings page in your Account Settings and request an API key
      4. You should now have an API key and be ready to go!
import config # to hide TMDB API keys
import requests # to make TMDB API calls
import locale # to format currency as USD
locale.setlocale( locale.LC_ALL, '' )

import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter # to format currency on charts axis

api_key = config.tmdb_api_key # get TMDB API key from config.py file

If you plan on committing your project to GitHub or another public repository and need help setting up config you should read this article about using config to hide API keys.

Part 1: Determine the highest earning American films of 2017

In this section we will request 2017 data from TMDB, store the JSON data we receive in a dataframe and then use matplotlib to visualize our data.

Make API Call to TMDB to return the data of interest

In order to get the highest earning films from TMDB an API request needs to be constructed to return films with a primary_release_year of 2017 sorted in descending order by revenue.

response = requests.get('https://api.themoviedb.org/3/discover/movie?api_key=' +  api_key + '&primary_release_year=2017&sort_by=revenue.desc')
highest_revenue = response.json() # store parsed json response

# uncomment the next line to get a peek at the highest_revenue json structure
# highest_revenue

highest_revenue_films = highest_revenue['results']
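
As an alternative to concatenating strings, the query can be assembled from a dictionary. This is a minimal sketch using only the standard library; the helper name build_discover_url is an assumption made for this example, and "YOUR_API_KEY" is a placeholder rather than a working key.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.themoviedb.org/3/discover/movie"


def build_discover_url(api_key, year=None, sort_by="revenue.desc"):
    """Build a /discover/movie URL; leave `year` as None to search all years."""
    params = {"api_key": api_key, "sort_by": sort_by}
    if year is not None:
        params["primary_release_year"] = year
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder, not a working key.
    print(build_discover_url("YOUR_API_KEY", year=2017))
```

The requests library can do the same encoding for you if you pass the dictionary as requests.get(BASE_URL, params=params).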

Create dataframe from JSON returned from TMDB API call

Let’s store the JSON data returned from our API call in a dataframe to store each film and its associated revenue.

# define column names for our new dataframe
columns = ['film', 'revenue']

# create dataframe with film and revenue columns
df = pd.DataFrame(columns=columns)  # pandas was imported as pd above

Now to add the data to our dataframe we will need to loop through the data.

# for each of the highest revenue films make an api call for that specific movie to return the budget and revenue
for film in highest_revenue_films:
    # print(film['title'])
    film_revenue = requests.get('https://api.themoviedb.org/3/movie/'+ str(film['id']) +'?api_key='+ api_key+'&language=en-US')
    film_revenue = film_revenue.json()
    #print(locale.currency(film_revenue['revenue'], grouping=True ))
    df.loc[len(df)]=[film['title'],film_revenue['revenue']] # store title and revenue in our dataframe    

Below is what the dataframe head (top 5 lines) looks like after iterating through the films our API call returned.

df.head()
                             film     revenue
0            Beauty and the Beast  1221782049
1         The Fate of the Furious  1212583865
2  Guardians of the Galaxy Vol. 2   744784722
3                           Logan   608674100
4              Kong: Skull Island   565151307

Let’s actually see the data with matplotlib

We will create a horizontal bar chart using matplotlib to display the revenue earned for each film.

matplotlib.style.use('ggplot')

# helper for FuncFormatter: receives the tick value and its position, returns the label
# (a minimal stand-in; swap in locale.currency if you prefer locale-aware output)
def currency(x, pos):
    return '${:,.0f}'.format(x)

fig, ax = plt.subplots()
# pass the column name for x
df.plot(kind="barh", y='revenue', color = ['#624ea7', '#599ad3', '#f9a65a', '#9e66ab', 'purple'], x='film', ax=ax)

# format x-axis in terms of currency
formatter = FuncFormatter(currency)
ax.xaxis.set_major_formatter(formatter)
ax.legend().set_visible(False)

avg = df['revenue'].mean()

# Add a line for the average
ax.axvline(x=avg, color='b', label='Average', linestyle='--', linewidth=1)

ax.set(title='American Films with Highest Revenue (2017)', xlabel='Revenue', ylabel='Film')

Figure: horizontal bar chart of the American films with the highest revenue (2017)

Part 2: Determine the highest earning American films of all-time

In this section we will request all-time data from TMDB, store the JSON data we receive in a dataframe and then use matplotlib to visualize our data. Our API call will be similar to the one we used in the previous section but sans &primary_release_year=2017.

Requesting, formatting and storing API data

response = requests.get('https://api.themoviedb.org/3/discover/movie?api_key=' +  api_key + '&sort_by=revenue.desc')
highest_revenue_ever = response.json()
highest_revenue_films_ever = highest_revenue_ever['results']

columns = ['film', 'revenue', 'budget', 'release_date']
highest_revenue_ever_df = pd.DataFrame(columns=columns)  # pandas was imported as pd

for film in highest_revenue_films_ever:
    # print(film['title'])

    film_revenue = requests.get('https://api.themoviedb.org/3/movie/'+ str(film['id']) +'?api_key='+ api_key+'&language=en-US')
    film_revenue = film_revenue.json()
    # print(film_revenue)

    # print(locale.currency(film_revenue['revenue'], grouping=True ))

    # A Lord of the Rings duplicate with bad data was being returned: https://www.themoviedb.org/movie/454499-the-lord-of-the-rings
    # Its budget was $281, which is way too low for a top-earning film. Therefore, in order to be added to the dataframe,
    # a film's budget must be greater than $281.

    if film_revenue['budget'] > 281:
        # print(film_revenue['budget'])
        # add film title, revenue, budget and release date to the dataframe
        highest_revenue_ever_df.loc[len(highest_revenue_ever_df)]=[film['title'],film_revenue['revenue'], (film_revenue['budget'] * -1), film_revenue['release_date']]

highest_revenue_ever_df.head()    

                           film     revenue      budget  release_date
0                        Avatar  2781505847  -237000000    2009-12-10
1  Star Wars: The Force Awakens  2068223624  -245000000    2015-12-15
2                       Titanic  1845034188  -200000000    1997-11-18
3                  The Avengers  1519557910  -220000000    2012-04-25
4                Jurassic World  1513528810  -150000000    2015-06-09

Calculate the gross profit

We can calculate the gross profit by subtracting the amount spent (the budget) from the total revenue. Earlier we made the budget values negative, so adding the revenue to the (negative) budget is effectively that subtraction and gives us the gross profit.

highest_revenue_ever_df['gross'] = highest_revenue_ever_df['revenue'] + highest_revenue_ever_df['budget']

What does the dataframe look like now?

highest_revenue_ever_df.head()
                           film     revenue      budget  release_date       gross
0                        Avatar  2781505847  -237000000    2009-12-10  2544505847
1  Star Wars: The Force Awakens  2068223624  -245000000    2015-12-15  1823223624
2                       Titanic  1845034188  -200000000    1997-11-18  1645034188
3                  The Avengers  1519557910  -220000000    2012-04-25  1299557910
4                Jurassic World  1513528810  -150000000    2015-06-09  1363528810
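
As a quick sanity check on the gross column, the same arithmetic can be done by hand for a few rows. This is a small sketch that uses the revenue and budget figures shown in the dataframe, with the budgets written as positive numbers; the function name gross_profit is just for illustration.

```python
def gross_profit(revenue, budget):
    """Gross profit is revenue minus the (positive) budget."""
    return revenue - budget


# Revenue and budget figures copied from the dataframe (budgets made positive).
films = [
    ("Avatar", 2781505847, 237000000),
    ("Star Wars: The Force Awakens", 2068223624, 245000000),
    ("Titanic", 1845034188, 200000000),
]

for title, revenue, budget in films:
    print(f"{title}: ${gross_profit(revenue, budget):,}")
```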

Plotting data in matplotlib with horizontal bar charts and a scatter plot

fig, ax = plt.subplots()
# pass the column name for x
highest_revenue_ever_df.plot(kind="barh", y='revenue', color = ['#624ea7', '#599ad3', '#f9a65a', '#9e66ab', 'purple'], x='film', ax=ax)
formatter = FuncFormatter(currency)
ax.xaxis.set_major_formatter(formatter)
ax.legend().set_visible(False)
ax.set(title='American Films with Highest Revenue (All Time)', xlabel='Revenue', ylabel='Film')

Figure: horizontal bar chart of the American films with the highest revenue (all time)

fig, ax = plt.subplots()
# pass the column name for x
highest_revenue_ever_df.plot(kind="barh", y='gross', color = ['#624ea7', '#599ad3', '#f9a65a', '#9e66ab', 'purple'], x='film', ax=ax)
formatter = FuncFormatter(currency)
ax.xaxis.set_major_formatter(formatter)
ax.legend().set_visible(False)
ax.set(title='Gross Profit of the American Films with Highest Revenue (All Time)', xlabel='Gross Profit', ylabel='Film')

Figure: horizontal bar chart of gross profit for the American films with the highest revenue (all time)

fig, ax = plt.subplots()
highest_revenue_ever_df.plot(kind='scatter', y='gross', x='budget', ax=ax)
formatter = FuncFormatter(currency)
ax.xaxis.set_major_formatter(formatter)
ax.yaxis.set_major_formatter(formatter)
ax.set(title='Profit vs Budget of the American Films with Highest Revenue (All Time)', xlabel='Budget', ylabel='Gross Profit')

Figure: scatter plot of gross profit versus budget for the American films with the highest revenue (all time)

# Adding release year to dataframe
# highest_revenue_ever_df['year'] = pd.DatetimeIndex(highest_revenue_ever_df['release_date']).year
# print(highest_revenue_ever_df)

Limitations

The above data and graphs do not account for inflation (the TMDB API returns revenue unadjusted for inflation), so the earnings of more recent films are overstated relative to their earlier counterparts. When looking at all-time data, revenue should be adjusted for inflation; over a shorter time period the adjustment might not be necessary. Older films would appear in the lists above if inflation were taken into account. As it stands, the oldest film on this list is Titanic, from 1997.
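
One common way to adjust for inflation is to scale a nominal amount by the ratio of a price index in the target year to the index in the release year. The sketch below shows only that arithmetic. The index values in it are hypothetical round numbers chosen to keep the example readable, and the function name adjust_for_inflation is made up; real figures would have to come from a published price index such as the CPI.

```python
def adjust_for_inflation(nominal, index_then, index_now):
    """Scale a nominal amount by the ratio of two price-index values."""
    if index_then <= 0:
        raise ValueError("index_then must be positive")
    return nominal * index_now / index_then


if __name__ == "__main__":
    # HYPOTHETICAL index values, chosen only to make the arithmetic easy to follow.
    # Replace them with real figures from a price index such as the CPI.
    index_1997 = 100.0
    index_2017 = 150.0
    titanic_nominal = 1845034188  # Titanic's revenue as returned by TMDB
    print(round(adjust_for_inflation(titanic_nominal, index_1997, index_2017)))
```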

Cover photo is Chris Washington, played by Daniel Kaluuya, from Get Out. Universal Pictures

How to Hide Your API Keys in Python

2017-05-27 00:00:00 +0000

Protect your application’s API Keys while committing to Git.

If you plan on programming any applications and storing your code in a public GitHub repository then it is important that you protect your API keys 🔑 by ensuring that they are not searchable or otherwise publicly accessible.

What’s an API?

An application programming interface (API) is a structured set of rules that lets applications talk to a service. If you want to leverage data from services such as Twitter, The New York Times, Slack, Spotify etc. then you should read their API documentation to figure out how to structure your queries to receive data from their service or to post to their service.

What are API keys?

API keys allow developers to access APIs and are unique keys associated with that particular developer and/or application. Just like you shouldn’t share your passwords, you should never share your API keys. It is important to protect your API keys so that other people cannot take actions as you, which could result in your API key being revoked because somebody else exceeded rate limits or violated an API’s terms of service. A rate limit caps the number of API calls that a specific application or user can make during a specified period of time.
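
To make the idea of a rate limit concrete, here is a toy sliding-window limiter in Python. It is a sketch of the general concept, not how any particular service implements its limits; the class name RateLimiter and the limit of 3 calls per 60 seconds are assumptions made for the example.

```python
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` calls within any `period` seconds."""

    def __init__(self, max_calls, period):
        self.max_calls = max_calls
        self.period = period
        self.calls = deque()  # timestamps of accepted calls

    def allow(self, now):
        """Return True and record the call if it fits in the current window."""
        # Forget calls that are older than the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False


if __name__ == "__main__":
    limiter = RateLimiter(max_calls=3, period=60)  # hypothetical limit
    for second in (0, 1, 2, 3, 61):
        print(second, limiter.allow(now=second))
```

The fourth call arrives inside the same 60-second window and is refused; by second 61 the earliest calls have aged out, so calls are accepted again. If somebody else is using your key, their calls count against the same window.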

How do I protect my API keys on GitHub?

Here’s how to hide API keys in Python from GitHub using config.py to store your sensitive API keys and tokens in a separate file from your main script. I used similar code when accessing the Twitter Search API for my blackgirlmagic twitter bot.

Create 3 Files in Your Application

config.py

This file will store your API keys. You just need to replace the placeholder strings with your own API keys. Depending on the service, you may or may not need all four types of API keys; these in particular are required to create a Twitter application.
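
A config.py along these lines might look like the sketch below. The four variable names are placeholders chosen for this example (they follow the consumer key, consumer secret, access token and access token secret that a Twitter application uses), and the strings are dummy values to be replaced with your own.

```python
# config.py: keep this file out of version control.
# The variable names are placeholders chosen for this sketch.
consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"
```

In the main script, import config followed by config.consumer_key (and so on) reads each value without the key ever appearing in the script itself.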

main_script.py

This file will store your main script that needs to access the API keys. This file can be named whatever you like.

.gitignore

A .gitignore file tells Git to ignore the listed files, directories or file extensions so that they are never committed and therefore never pushed to GitHub. For this project the file needs a line containing config.py. This step is crucial to ensure that your config.py file does not end up viewable on GitHub! Here’s a collection of useful .gitignore templates.


Originally published at Black Tech Diva.

How to Change Repo Language in GitHub

2017-05-20 00:00:00 +0000

I recently started working on a Weather app in Flask to auto-detect a user’s location based off of their IP address. After committing some updates to GitHub my app switched from being labeled as predominately Python to 98.9% CSS even though it was a Flask application in which most of the code I had written was in Python and HTML. Now and again, I do not agree with how GitHub classifies the languages in my repositories so I set out to figure out how to fix this issue.

github_before_linguist_update.png

Before: My Flask App Appeared in GitHub as 98.9% CSS.

Pro-tip: Help GitHub properly detect your repository’s main language(s).

GitHub has a linguist library that auto-detects the language within every repository. Upon researching how to resolve GitHub misclassifying the language of my projects, I found out the solution is as simple as telling GitHub which files to ignore. Because you still want to commit these files to GitHub, you can’t use a .gitignore; instead you can tell GitHub’s linguist which files to ignore in a .gitattributes file. (Side note: Check out my piece on “Hiding API Keys from GitHub” if you are interested in learning about .gitignore).

Upon examining the documentation for the linguist library I learned that adding just one line to a .gitattributes file would resolve my language issues for this particular repo.

My .gitattributes:

This one-line file told GitHub to ignore all of my files in my static/ folder which is where CSS and other assets are stored for a Flask app. Vendor files can sometimes take up a lot of relative space so I am telling the linguist to just ignore them (since they were accounting for 98.9% of my project)!
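
To see why ignoring one folder can flip the percentages so dramatically, here is a rough Python model of the idea: language shares are computed from the size of the files that are counted, so dropping a large vendored stylesheet changes every share. The byte counts are hypothetical (they are not real numbers from any repository) and the function name language_shares is made up for this sketch.

```python
def language_shares(files, ignored_prefixes=()):
    """Return each language's share of bytes (in percent), skipping ignored paths."""
    totals = {}
    for path, language, size in files:
        if path.startswith(tuple(ignored_prefixes)):
            continue  # treated like a vendored file: not counted at all
        totals[language] = totals.get(language, 0) + size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {lang: round(100 * size / grand_total, 1) for lang, size in totals.items()}


# HYPOTHETICAL file sizes in bytes, used only to illustrate the effect.
files = [
    ("app.py", "Python", 3000),
    ("templates/index.html", "HTML", 2000),
    ("static/vendor/theme.css", "CSS", 95000),
]

print(language_shares(files))
print(language_shares(files, ignored_prefixes=["static/"]))
```

With the static/ folder counted, CSS dominates; once it is ignored, only the Python and HTML files contribute to the totals.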

github_after_linguist_update.png

After: My Flask App Appears in GitHub now as 56.2% Python and 43.8% HTML. Here’s a repository with sample .gitattributes files for you to try the next time you disagree with the linguist ;). Note: If the linguist truly is wrong GitHub encourages you to report it as an issue.

I hope this article was helpful! I would love to hear some of your tricks for GitHub and am happy to answer any questions you may have.

Also published on Medium.

How to Import CSV and XLS Data into Pandas

2017-05-20 00:00:00 +0000

Pandas is a Python Data Analysis Library. It allows you to play around with data and perform powerful data analysis.

In this example I will show you how to read data from CSV and Excel files in Pandas and store the result in a Pandas dataframe. The sample data used in the below exercise was generated by https://mockaroo.com/.

import pandas as pd
csv_data_df = pd.read_csv('data/MOCK_DATA.csv')

Preview the first 5 lines of the data with .head() to ensure that it loaded.

csv_data_df.head()
   id  first_name  last_name                    email  gender       ip_address
0   1        Ross     Ricart    rricart0@berkeley.edu    Male  217.151.154.186
1   2        Jenn      Pizer       jpizer1@usnews.com  Female   104.123.13.234
2   3    Delainey     Sulley        dsulley2@xing.com    Male      6.101.0.150
3   4      Nessie      Feirn      nfeirn3@samsung.com  Female    97.93.173.170
4   5       Noami    Flanner  nflanner4@woothemes.com  Female  174.228.138.242
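
If you don’t have a CSV file handy, you can try read_csv on text held in memory. This is a small sketch that assumes pandas is installed; the two rows are copied from the preview above, and the header line assumes the CSV uses the same column names that the preview shows.

```python
import io

import pandas as pd

# Two rows copied from the preview above, written as CSV text.
csv_text = """id,first_name,last_name,email,gender,ip_address
1,Ross,Ricart,rricart0@berkeley.edu,Male,217.151.154.186
2,Jenn,Pizer,jpizer1@usnews.com,Female,104.123.13.234
"""

# read_csv accepts a file-like object as well as a path
sample_df = pd.read_csv(io.StringIO(csv_text))
print(sample_df.shape)
print(sample_df["first_name"].tolist())
```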

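You will need to pip install xlrd, if you haven’t already, in order to import data from Excel. (Recent versions of pandas read .xlsx files with openpyxl instead, so install that package if pandas asks for it.)

# pandas loads the Excel engine itself, so no explicit import is needed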
excel_data_df = pd.read_excel('data/MOCK_DATA.xlsx')
excel_data_df.head()
   id  first_name  last_name                         email  gender     ip_address
0   1     Chloris    Antliff      cantliff0@shareasale.com  Female   131.17.2.171
1   2       Brion     Gierok        bgierok1@posterous.com    Male   245.41.126.3
2   3       Fleur     Skells  fskells2@creativecommons.org  Female    75.0.34.132
3   4        Dora    Privost        dprivost3@newsvine.com  Female    51.202.4.39
4   5   Annabella     Hucker          ahucker4@typepad.com  Female  124.80.181.41

Image Courtesy of jballeis (Own work) CC BY-SA 3.0, via Wikimedia Commons
