How to Upload an HTML File to a Drupal Website

Introduction

This module divides a single large HTML document into a structured Drupal book in which the heading level hierarchy is respected. It works with HTML exported from Word, HTML converted from PDF, and HTML exported from Adobe InDesign. The purpose of this module is to provide an alternative way for legacy documents to meet WCAG accessibility requirements. Converting the documents into HTML also makes full-text search easier.

What's new in 2.x?

New features of the 2.x version include:
- Tested under PHP 7
- Thoroughly tested under GovCMS 7
- An example sub-module that automatically creates book parent (publication) and book (publication section) pages so you don't have to set them up yourself.
- Entity reference support
- Workbench support
- Content administrators can specify the language of imported pages if locale module is enabled
- Anchor reference links are re-established using aliased URLs when they are available
- Other minor bug fixes and improvements

Workbench Integration

We are pleased to announce that we have commenced work to integrate HTML Import and Workbench to provide full content workflow support. Functions such as scheduled publishing/unpublishing and content review will soon be supported.

govCMS

This module is compatible with the Australian Government's govCMS distribution and can be used to easily import reports for agencies.

GovCMS users are recommended to use the 7.x-2.x-dev branch of the HTML Import module for best compatibility. This module has been successfully tested against GovCMS 7.x-2.0-beta3.

We have used this module to produce HTML versions of reports for Australian government agencies. Some examples of our past projects are:

  • Department of Prime Minister and Cabinet
  • Australian Crime Commission
  • Department of Prime Minister and Cabinet and Department of Health
  • Torres Strait Regional Authority
  • Department of Infrastructure
  • National Blood Authority
  • Australian Trade and Investment Commission (Austrade)
  • Inspector-General of Intelligence and Security (IGIS)
  • Australian Commission for Law Enforcement Integrity (ACLEI)

Sponsorship

This project is sponsored by XiNG Digital

Main features

The main features of this module are:

  • Allows the user to specify the heading levels at which the HTML document is divided and imported. For example, if H3 is selected under "Heading level depth", each H1, H2 and H3 will become a separate page in the imported book.
  • Imports images. The module scans and fixes the paths of the images referenced by imported pages.
  • Respects the heading level hierarchy of the source document by reconstructing the same book hierarchy.
  • Scans and re-creates reference links. Reference links in the source HTML may be divided into different book pages after import. The module scans and re-links the reference links to maintain the integrity of the document.
  • Scans and moves footnotes/endnotes. If footnotes/endnotes are well-formatted, the module scans and moves the footnotes/endnotes to the sections where they are referenced for easy reading.
  • Meets WCAG accessibility requirements. The module preserves accessibility properties of the source HTML such as ALT text.
  • Cleans up undesirable Word characters such as smart quotes in titles for clean URL creation.
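
The "Heading level depth" behaviour described in the list above can be illustrated with a short sketch. This is not the module's code (the module is written in PHP); it is a minimal Python illustration, assuming well-formed heading tags. The names `Section` and `split_by_heading` are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Section:
    """One future book page: a heading, the HTML under it, and child pages."""
    level: int
    title: str
    body: str = ""
    children: list = field(default_factory=list)


def split_by_heading(html: str, depth: int) -> list:
    """Split html at headings h1..h<depth> and rebuild their hierarchy.

    Headings deeper than `depth` are left inside the body of the page that
    contains them. Content before the first heading is ignored in this sketch.
    """
    if not 1 <= depth <= 6:
        raise ValueError("depth must be between 1 and 6")
    pattern = re.compile(
        rf"<h([1-{depth}])\b[^>]*>(.*?)</h\1>", re.IGNORECASE | re.DOTALL
    )
    matches = list(pattern.finditer(html))
    roots, stack = [], []
    for i, match in enumerate(matches):
        # The body of a section runs up to the next heading that also splits.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(html)
        title = re.sub(r"<[^>]+>", "", match.group(2)).strip()
        section = Section(int(match.group(1)), title, html[match.end():end].strip())
        # Pop back to the nearest ancestor with a shallower heading level.
        while stack and stack[-1].level >= section.level:
            stack.pop()
        (stack[-1].children if stack else roots).append(section)
        stack.append(section)
    return roots
```

With a depth of 3, an H4 stays inside the page created for its enclosing H3, while each H1, H2 and H3 becomes its own node in the tree.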

XiNG Digital has also developed techniques to convert almost any PDF document to WCAG 2.0 compliant HTML that can be used by our document importer. Please contact the maintainers if you would like to learn more.

Installation and configuration

  1. Download and enable this module as well as its dependencies.
  2. Create or alter the content type to which the imported book pages will be attached. Be sure this content type has a file field with the machine name "field_images". This field should also be made single value and only accept "zip" files. This field does not need to be mandatory.
  3. Create or modify the content type for imported pages. Be sure this content type has the following fields:
    1. Footnotes. A "Long text" field where the footnotes/endnotes that a section references will be stored.
    2. Imported images. A file field with the machine name "field_html_import_images" that allows an unlimited number of values and accepts only the image formats you would like it to accept. Please note that because a large number of image files may be imported by this module, it is desirable to keep those images in directories that are relevant to their corresponding imported pages. The File (Field) Paths module allows us to assign a path such as "documents/[node:nid]/images" to keep the file system neat and tidy.
    3. Publication parent. A single value "Node reference" field that only allows references to the type of content specified in Step 2 above. This field will allow us to assign hierarchical URL aliases to imported pages.
  4. Go to Structure > Feeds importers > Add importer and follow these steps:
    1. Basic settings > Make the content type in Step 2 above the "Attach to content type". Be sure "Periodic import" is "Off".
    2. Fetcher > Change > Choose "HTML Import Fetcher"
    3. Fetcher > Settings > Make sure "Allowed file extensions" only allows HTML extensions such as "html"
    4. Parser > Change > Choose "HTML Import Parser"
    5. Processor > Change > Choose "HTML Import Processor"
    6. Processor > Settings > Be sure the content type in Step 3 above is selected in the bundle field. Be sure "Update existing nodes" is selected. Be sure "Full HTML" or equivalent is selected under "Text format".
    7. Processor > Mapping > Title maps to Title, Body maps to Body, Footnotes maps to Footnotes and Book ID maps to Publication parent (Node reference by node ID).
  5. Content > Books > Settings > Make sure "Content types allowed in book outlines" and "Content type for child pages" reflect the correct content types in Steps 2 and 3 above.
  6. Create new content using the type in Step 2. Upload a zipped directory of images to the file field created in Step 2. (Note that all image files need to be stored in a single directory named, for example, images, and the source HTML needs to reference images in this directory.) Upload the source HTML to the File field under the Feed field group. Choose your desired heading level depth. Follow the on-screen instructions for the remaining fields if you wish.
  7. Save and import.
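
Before uploading in Step 6, it can help to build the image archive and confirm that every image the HTML references is actually in it. The following is a minimal Python sketch, not part of the module. It assumes the archive should contain the directory itself (entries such as `images/chart.png`) and that the HTML uses double-quoted `src` attributes with relative paths. The function names `zip_images` and `missing_images` are hypothetical.

```python
import re
import zipfile
from pathlib import Path


def zip_images(images_dir: Path, zip_path: Path) -> list:
    """Zip images_dir so entries keep the directory name, e.g. images/a.png.

    Returns the list of entry names written to the archive.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(images_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent so "images/" is preserved.
                arcname = path.relative_to(images_dir.parent).as_posix()
                archive.write(path, arcname)
                names.append(arcname)
    return names


def missing_images(html: str, packed: list) -> list:
    """Return the img src values in html that are not among the packed names."""
    sources = re.findall(r'<img\b[^>]*\bsrc="([^"]+)"', html, re.IGNORECASE)
    return [src for src in sources if src not in packed]
```

An empty result from `missing_images` suggests the source HTML and the archive agree. Any paths it returns are images the HTML references but the archive lacks.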

Working with Microsoft Word

A well-prepared Word document can be saved using Word's "Save HTML (Filtered)" function, and the resulting HTML file can be readily imported using this module. You may still want to clean up the source HTML code so it doesn't carry the junk Word code to your website. Adobe Dreamweaver has a handy "Clean up Word HTML" function that does this job nicely.

Example

An example report you can use to test this module is provided. This example, Australian Haemovigilance Report, was recently processed by us from a Word document for the National Blood Authority. This example is licensed under the Creative Commons Attribution 4.0 licence. The final report is also available on the National Blood Authority's website.

Additional notes

Note 1: Some changes to the display of the imported pages, such as the table of contents menu, book navigation and footnotes, may need to be configured. The best place to start is to create your own instance of the template file book-navigation.tpl.php under your theme/module.

Note 2: The HTML Import module is memory bound. If you experience problems while importing HTML files, please first check your PHP memory limit. We have been testing this module under 256MB without seeing problems when importing HTML files converted from PDFs of about 200 to 300 pages. It is however advisable to increase your PHP memory limit to 512MB in a production environment to ensure there are sufficient resources for other critical production web functions.
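
The current limit can be read with PHP itself, for example `php -r 'echo ini_get("memory_limit");'`. Note that the command-line and web server configurations may differ. As a rough illustration of comparing that value with the 512MB suggested above, here is a minimal Python sketch. It assumes PHP's shorthand notation (K, M and G suffixes, with -1 meaning unlimited); the helper names are hypothetical.

```python
def php_size_to_bytes(value: str) -> int:
    """Convert PHP shorthand such as "256M" to bytes; "-1" means unlimited."""
    value = value.strip()
    if value == "-1":
        return -1
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    suffix = value[-1].upper()
    if suffix in units:
        return int(value[:-1]) * units[suffix]
    # No suffix: the value is already a byte count.
    return int(value)


def meets_recommendation(current: str, recommended: str = "512M") -> bool:
    """Return True if the current limit is unlimited or at least the recommended one."""
    limit = php_size_to_bytes(current)
    return limit == -1 or limit >= php_size_to_bytes(recommended)
```

For instance, `meets_recommendation("256M")` is false, so a site running at the tested 256MB would still fall short of the production suggestion.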

Note 3: During the development phase of this module we have imported more than 100,000 pages converted from PDF, Word and InDesign across a number of websites. The largest single document it has successfully imported has more than 1,000 pages.

Note 4: The HTML Import module is packaged with a small utility that consolidates imported files and saves the HTML content to a "full-text" text field of the book parent page. This field allows full-text search to match both the parent page and any of its children if they contain any of the search words. The Apache Solr Views module could be used to provide a very useful full-text search function.

Source: https://www.drupal.org/project/html_import