Using Copydoc

Basic usage

Copydoc cleans and parses HTML exported from Google Docs. Download the HTML version of a Google document and pass its contents as a string to the CopyDoc constructor:

from copydoc import CopyDoc

# Read the exported HTML file into a string.
with open('path/to/html') as f:
    html = f.read()
doc = CopyDoc(html)

Now you can print the parsed document:

print(str(doc))

You can also access the parsed BeautifulSoup object directly:

soup = doc.soup

You can define simple key/value pairs in your docs, for example:

HEADLINE: Independent candidates gain in polls

FEATURED_GIF: https://media.giphy.com/media/l3nWl5bhBoim7glNu/giphy.gif

These key/value pairs can be parsed out by passing a sequence of (token, attribute name) pairs as the second argument to the CopyDoc constructor:

tokens = (
  ('HEADLINE', 'headline'),
  ('FEATURED_GIF', 'featured_gif'),
)
doc = CopyDoc(html, tokens)

Now you can access each value as an attribute on the CopyDoc object, using the attribute name given as the second item of its pair:

print(doc.headline)

This will print “Independent candidates gain in polls”.
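
To make the mapping from document keys to attribute names concrete, here is a minimal standard-library sketch of the idea. It is not Copydoc's implementation: the TokenBag class, its body attribute, and the plain-text input lines are all hypothetical, chosen only to illustrate how (token, attribute name) pairs could turn "KEY: value" lines into attributes.

```python
# Illustrative sketch only: this is NOT copydoc's implementation.
# It shows how (token, attribute name) pairs can map "KEY: value"
# lines onto attributes of an object.


class TokenBag:
    """Hypothetical container for values parsed from "KEY: value" lines."""

    def __init__(self, lines, tokens=()):
        self.body = []  # lines that are not key/value pairs
        for line in lines:
            if not self._consume(line.strip(), tokens):
                self.body.append(line)

    def _consume(self, text, tokens):
        """Store the value and return True if text starts with a known token."""
        for token, attr in tokens:
            prefix = token + ':'
            if text.startswith(prefix):
                # Strip only the leading "KEY:" prefix, so URLs keep their "://".
                setattr(self, attr, text[len(prefix):].strip())
                return True
        return False


tokens = (
    ('HEADLINE', 'headline'),
    ('FEATURED_GIF', 'featured_gif'),
)
lines = [
    'HEADLINE: Independent candidates gain in polls',
    'FEATURED_GIF: https://media.giphy.com/media/l3nWl5bhBoim7glNu/giphy.gif',
    'First paragraph of the story.',
]
bag = TokenBag(lines, tokens)
print(bag.headline)
print(bag.featured_gif)
print(bag.body)
```

In this sketch, lines that match a token are removed from the body, so the remaining text can be rendered separately from the extracted values.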

Copydoc is designed to work nicely with Jinja templates.

Here’s a sample template snippet based on the doc from above:

<h1>{{ doc.headline }}</h1>

<img src="{{ doc.featured_gif }}" alt="Featured GIF" />

{{ doc }}
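
The template above can be rendered from Python roughly as follows. This is a sketch under stated assumptions: it requires the jinja2 package, and FakeDoc is a hypothetical stand-in for a parsed CopyDoc object so that the example runs without a real Google Docs export. It relies only on attribute access and on Jinja converting the object to a string for {{ doc }}.

```python
from jinja2 import Template


class FakeDoc:
    """Hypothetical stand-in for a parsed CopyDoc object."""

    headline = 'Independent candidates gain in polls'
    featured_gif = 'https://media.giphy.com/media/l3nWl5bhBoim7glNu/giphy.gif'

    def __str__(self):
        # Stands in for the cleaned document body.
        return '<p>Body of the story.</p>'


template = Template(
    '<h1>{{ doc.headline }}</h1>\n'
    '<img src="{{ doc.featured_gif }}" alt="Featured GIF" />\n'
    '{{ doc }}'
)

# Jinja resolves doc.headline by attribute lookup and renders
# {{ doc }} by converting the object to a string.
rendered = template.render(doc=FakeDoc())
print(rendered)
```

A Template created this way does not autoescape, so the HTML returned by the object's string conversion is inserted as-is. If you render through an Environment with autoescaping enabled, the body would need to be marked safe in the template.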