Using Copydoc

Basic usage

Copydoc cleans and parses HTML exported from Google Docs. Download a Google document as HTML and pass its contents as a string to the CopyDoc constructor:

from copydoc import CopyDoc

# Read the HTML file exported from Google Docs
with open('path/to/html') as f:
    html = f.read()
doc = CopyDoc(html)

Now you can print the parsed document:

print(str(doc))

Access the parsed BeautifulSoup object:

soup = doc.soup

Using named tokens

You can define simple key/value pairs in your docs, for example:

HEADLINE: Independent candidates gain in polls

FEATURED_GIF: https://media.giphy.com/media/l3nWl5bhBoim7glNu/giphy.gif

These key/value pairs can be parsed out by passing a sequence of (token, attribute name) pairs as the second argument to the CopyDoc constructor:

tokens = (
  ('HEADLINE', 'headline'),
  ('FEATURED_GIF', 'featured_gif'),
)
doc = CopyDoc(html, tokens)

Now you can access the values as attributes on the CopyDoc object, using the attribute name given as the second item of each pair.

print(doc.headline)

This will print “Independent candidates gain in polls”.
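
To make the token format concrete, here is a rough standalone sketch of how `TOKEN: value` lines might be matched and mapped to attribute names. It is not Copydoc's own code: `extract_tokens` is a hypothetical helper, and it works on plain text lines rather than on the parsed HTML.

```python
import re


# Hypothetical helper for illustration only; this is not Copydoc's own code.
def extract_tokens(lines, tokens):
    """Return {attribute_name: value} for lines shaped like 'TOKEN: value'."""
    found = {}
    for line in lines:
        for token, attr in tokens:
            # Match the token name at the start of the line, then a colon.
            pattern = r'\s*{}\s*:\s*(.+)'.format(re.escape(token))
            match = re.match(pattern, line)
            if match:
                found[attr] = match.group(1).strip()
    return found


tokens = (
    ('HEADLINE', 'headline'),
    ('FEATURED_GIF', 'featured_gif'),
)
lines = [
    'HEADLINE: Independent candidates gain in polls',
    'FEATURED_GIF: https://media.giphy.com/media/l3nWl5bhBoim7glNu/giphy.gif',
    'Body text stays in the document.',
]

values = extract_tokens(lines, tokens)
print(values['headline'])
```

Lines that do not start with a listed token, such as the body text in the sample, are ignored by this sketch.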

Using with Jinja

Copydoc is designed to work well with Jinja: token values are available as attributes, and rendering the object itself outputs the parsed document.

Here’s a sample template snippet based on the doc from above:

<h1>{{ doc.headline }}</h1>

<img src="{{ doc.featured_gif }}" alt="Featured GIF" />

{{ doc }}
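
A minimal sketch of rendering this snippet from Python, assuming the `jinja2` package is installed. So that it runs without a downloaded Google Doc, it uses a hypothetical `StandInDoc` class in place of a real CopyDoc instance. The `headline` and `featured_gif` attribute names follow the token mapping above, and the body HTML is made-up sample content.

```python
from jinja2 import Template


class StandInDoc:
    """Hypothetical stand-in for a CopyDoc instance (illustration only)."""

    def __init__(self, headline, featured_gif, body_html):
        self.headline = headline
        self.featured_gif = featured_gif
        self._body_html = body_html

    def __str__(self):
        # Jinja converts objects placed in {{ ... }} to strings.
        return self._body_html


template = Template(
    '<h1>{{ doc.headline }}</h1>\n'
    '<img src="{{ doc.featured_gif }}" alt="Featured GIF" />\n'
    '{{ doc }}'
)

doc = StandInDoc(
    headline='Independent candidates gain in polls',
    featured_gif='https://media.giphy.com/media/l3nWl5bhBoim7glNu/giphy.gif',
    body_html='<p>Sample body paragraph.</p>',  # made-up sample content
)

rendered = template.render(doc=doc)
print(rendered)
```

Autoescaping is off by default for a bare `Template`, so the body HTML is emitted as-is. In an environment with autoescaping enabled, the document output would need to be marked safe.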