About the Blog

It's about anything and everything. I, Steven Hancock, started this blog for a variety of reasons. I want to start documenting my life and sharing it with others, whether that's family, friends, strangers, or my future self. I also want to share my experiences in the hope that others can learn from them. Perhaps I can help someone set up an Ubuntu server, write a Django web application, or set up a PhoneGap mobile app.

That's it. I'm here to share. Nothing more, nothing less. I will be covering a wide variety of topics, so feel free to browse for the blog entries that interest you most.

Google Doc to HTML

January 31, 2015

I write all of my blog posts in Google Docs before I move them onto my blog. It can often be a pain to get the formatting right. Even though I usually only have a few paragraphs, a couple of headings, some links, and maybe a list, the formatting takes almost as long as the actual post (well, maybe not quite that long, but you get the idea). My old workflow was writing out the doc, copying and pasting it into a text editor, and adding all of the HTML tags by hand. Then I would copy and paste the result into my simple Django blogging app.

 

NO MORE. I pieced together a Google Apps Script that takes my Google Doc and converts it to HTML. I'm now 100% more productive (if you assume it actually takes me just as long to write the blog post as it does to format it).

Challenges

Originally I tried to use Apps Script's HTML Service, but I couldn't get it to do what I wanted. I know it is meant for building better UIs for Apps Script-backed web apps or custom UIs for Google Docs, so I was hoping it would have a simple function that takes a Google Doc and makes HTML out of it… unfortunately, no luck.

Links were difficult to handle. I was hoping that they were just a child element of a paragraph. Nope, they are attached to specific characters. So you have to iterate over all of the characters and check whether each one is part of a link.
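To make the per-character idea concrete, here is a minimal sketch (not the script's actual code) that groups consecutive characters sharing the same link URL into runs. It assumes only that the element exposes getText() and getLinkUrl(offset), as an Apps Script Text element does; the name split_into_link_runs is made up for this illustration.

```javascript
/**
 * Groups consecutive characters that share the same link URL into runs.
 * Assumes `text` exposes getText() and getLinkUrl(offset).
 * @param text <Text>
 * @return runs <array> - objects of the form { url: <string|null>, text: <string> }
 */
function split_into_link_runs(text) {
  var chars = text.getText();
  var runs = [];
  var current = null;

  for (var i = 0; i < chars.length; i++) {
    // A null url means this character is not part of a link.
    var url = text.getLinkUrl(i) || null;
    if (current && current.url === url) {
      // Same link (or same lack of link) as the previous character.
      current.text += chars[i];
    } else {
      // The link changed, so start a new run.
      current = { url: url, text: chars[i] };
      runs.push(current);
    }
  }
  return runs;
}
```

Each run can then be emitted as plain text when its url is null, or wrapped in an anchor tag otherwise.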

The Code

If you want the most up-to-date code, check out the GitHub repository.

function onOpen() {
  var ui = DocumentApp.getUi();

  ui.createMenu('Custom Options')
      .addItem('Convert To HTML', 'convert_to_html')
      .addToUi();
}

/**
 * Convert the current document to html
 */
function convert_to_html() {
  // Get the document to which this script is bound.
  var doc = DocumentApp.getActiveDocument();
  
  var doc_converter = new Doc_To_HTML();
  doc_converter.convert_and_save_to_file(doc);
  
}


function Doc_To_HTML(){
  var self = this;

  /**
   * Convert a document to html
   * @param document <Document>
   * @return html <string>
   */
  self.convert = function(document){
    var body = document.getBody();
    
    // Declare html locally so it does not leak into the global scope.
    var html = self.build_tags(body);
    return html;
  }
  
  /**
   * Convert a document to html and save to a new file on drive
   * @param document <Document>
   * @return html <string>
   */
  self.convert_and_save_to_file = function(document){
    var html = self.convert(document);
    var filename = document.getName() + ' [Converted to HTML]';
    
    DriveApp.createFile(filename, html, MimeType.HTML);
    DocumentApp.getUi().alert("A new HTML file '" + filename + "' was created.");  
  }
  
  /**
   * Build the html tags from the document
   * @param body <Body>
   * @return html <string>
   */
  self.build_tags = function (body){
    if (typeof body.getNumChildren == "undefined"){
      //There are no children
      return self.find_links_in_text(body);
    }
    var html_text = ""
    var number_of_children = body.getNumChildren();
    
    for(var child_index = 0; child_index < number_of_children; child_index++){
      var child = body.getChild(child_index);
      var type = child.getType();
      
      var tags = ["",""];
      
      switch(type){
        case DocumentApp.ElementType.PARAGRAPH:
          var heading = child.getHeading()
          tags = self.get_heading_tags(heading);
          break;
        case DocumentApp.ElementType.LIST_ITEM:
          tags = self.get_listing_tags(body, child_index, number_of_children)
          break;
        case DocumentApp.ElementType.TEXT:
          tags = ["",""]
          break;
      }
      html_text += tags[0] + self.build_tags(child) + tags[1];
    }  
    return html_text;
  }
  
  /**
   * Takes a heading and figures out which heading tag to apply to it
   * @param heading <ParagraphHeading>
   * @return opening_and_closing_tags <array>
   */
  self.get_heading_tags = function (heading){
    switch(heading){
      case DocumentApp.ParagraphHeading.HEADING1:
        return ["<h1>","</h1>"];
      case DocumentApp.ParagraphHeading.HEADING2:
        return ["<h2>","</h2>"];
      case DocumentApp.ParagraphHeading.HEADING3:
        return ["<h3>","</h3>"];
      default:
        return ["<p>","</p>"]
    }
  }
  
  /**
   * Takes a list item and generates its tags. The first item of a list gets <ul> prepended; the last item gets </ul> appended.
   * @param body <Body or Paragraph>
   * @param child_index <int>
   * @param number_of_children <int>
   * @return tags <array> - opening and closing
   */
  self.get_listing_tags = function (body, child_index, number_of_children){
    var tags = ["<li>","</li>"];
    if(child_index == 0 || body.getChild(child_index - 1).getType() != DocumentApp.ElementType.LIST_ITEM){
      tags[0] = "<ul>" + tags[0];
    }
    if(child_index + 1 == number_of_children || body.getChild(child_index + 1).getType() != DocumentApp.ElementType.LIST_ITEM){
      tags[1] += "</ul>";
    }
    return tags
  }
  
  /**
   * Finds links within text and adds anchor tags within the text
   * @param text <Element>
   * @return html <string>
   */
  self.find_links_in_text = function (text){
    var text_length = text.getText().length;
    var chars = text.getText();
    var html = "";
    var link_text = undefined;
    
    for(var i = 0; i < text_length; i++){
      var url = text.getLinkUrl(i);
      var char = self.convert_special_characters(chars[i]);
      if(url) {
        if(link_text && link_text.url != url){
          //back to back urls, but this one is different
          html += "<a href='" + link_text.url + "'>" + link_text.anchor_text + "</a>";
          link_text = { url: url, anchor_text: char };
        }
        else if(!link_text) {
          link_text = { url: url, anchor_text: char };
        }
        else {
          link_text.anchor_text += char;
        }
      }
      else{
        if(link_text){
          html += "<a href='" + link_text.url + "'>" + link_text.anchor_text + "</a>";
        }
        link_text = undefined;
        html += char
      }
    }   
    if(link_text){
      html += "<a href='" + link_text.url + "'>" + link_text.anchor_text + "</a>";
    }
    
    return html;
  }
  
  /**
   * Finds special characters and converts them to HTML entities
   * @param char <string>
   * @return <string>
   */
  self.convert_special_characters = function(char){
    var output = HtmlService.createHtmlOutput(char);
    return output.getContent();
  }
}

Future Changes

At some point I'll want to add code for additional use cases. For instance, the code above was not handled by the converter. I just copied and pasted it with <pre> tags. In the future, I should add support for font styles like bold or italics, maybe even font color and size. I might also detect images and turn them into img tags.
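As a rough sketch of how bold and italic support might look, the function below walks the characters the same way the link code does and wraps each run of identically styled characters in <strong> and/or <em> tags. It assumes the element exposes getText(), isBold(offset), and isItalic(offset), as an Apps Script Text element does. The name wrap_font_styles is hypothetical, and the sketch ignores links and entity encoding.

```javascript
/**
 * Wraps bold and italic runs of a text element in <strong> and <em> tags.
 * Assumes `text` exposes getText(), isBold(offset) and isItalic(offset).
 * @param text <Text>
 * @return html <string>
 */
function wrap_font_styles(text) {
  var chars = text.getText();
  var html = "";
  var run = "";
  var run_bold = false;
  var run_italic = false;

  // Emit the characters collected so far, wrapped in the tags for their style.
  function flush() {
    if (run === "") { return; }
    var piece = run;
    if (run_italic) { piece = "<em>" + piece + "</em>"; }
    if (run_bold) { piece = "<strong>" + piece + "</strong>"; }
    html += piece;
    run = "";
  }

  for (var i = 0; i < chars.length; i++) {
    // Coerce to booleans in case a style is reported as null.
    var bold = !!text.isBold(i);
    var italic = !!text.isItalic(i);
    if (bold !== run_bold || italic !== run_italic) {
      // The style changed, so close the previous run and start a new one.
      flush();
      run_bold = bold;
      run_italic = italic;
    }
    run += chars[i];
  }
  flush();
  return html;
}
```

Closing every run before opening the next keeps the tags properly nested, at the cost of some repeated tags where a bold span and an italic span overlap.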

I'm not 100% sure how well my use of Apps Script's HtmlService utility will handle HTML entities and ensure each one is encoded. I already know that when it comes across '<' and '>', only the less-than sign is encoded. So I'm sure there are other things like this that will pop up.
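One possible way around this is to skip HtmlService and encode the handful of characters that matter in HTML by hand. The sketch below is an alternative, not what the script above does, and escape_html is a name invented for this example.

```javascript
/**
 * Encodes the characters that have special meaning in HTML.
 * The ampersand is replaced first so that the entities produced by the
 * later replacements are not encoded a second time.
 * @param text <string>
 * @return <string>
 */
function escape_html(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

Something like this could stand in for the body of convert_special_characters, since it works on a single character as well as on a longer string.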

I may publish this as an add-on for Google Docs, but since it is still a work in progress and I don't want to have to abide by Google's Add-On Guidelines, I will not be doing so any time soon.