RTF, DOC, and HTML - Part 3 - HTML to DOC

In the previous parts of this series, we introduced the three main document formats (HTML, RTF, and DOC) and explained why they matter. In this and the following articles, we will learn how to convert between these formats. Let's start with the first conversion: HTML to DOC.

How to create a new HTML document traditionally?

  • First, let's look back at how to create a basic HTML document using built-in tools.

HTML code editing tool

  • Notepad is one of the tools used to edit HTML code.
  • To open Notepad, open the Start menu and type the keyword Notepad.

How to create an HTML website

  • Step 1: Open Notepad, then type the following code: <p>Hello Office Component</p>
  • Step 2: Save the file:
  • Click the File menu and select Save As.
  • Select the folder where you want to save the file.
  • Name the file (it is best to avoid spaces and special characters in the filename). The filename must have the extension .html or .htm.
  • Set Encoding to UTF-8. This is important because UTF-8 lets characters outside the basic ASCII set display correctly.
  • Click Save.
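The same file can also be produced from code instead of Notepad. The following is a minimal sketch, assuming a hypothetical class name HtmlFileDemo and an arbitrary file name hello.html; it writes the one-line page from Step 1 with explicit UTF-8 encoding using only the .NET base library.

```csharp
using System;
using System.IO;
using System.Text;

public static class HtmlFileDemo
{
    // Writes the sample page from Step 1 into the given folder as UTF-8
    // (without a byte order mark) and returns the full path of the file.
    // The file name "hello.html" is an arbitrary choice for illustration.
    public static string WriteSamplePage(string folder)
    {
        string path = Path.Combine(folder, "hello.html");
        string html = "<p>Hello Office Component</p>";
        File.WriteAllText(path, html, new UTF8Encoding(false));
        return path;
    }
}
```

Calling HtmlFileDemo.WriteSamplePage(Path.GetTempPath()) would create the file in the temporary folder, ready to be opened in a browser.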

How to run an HTML website

  • Step 1: Open the browser (Chrome, Firefox, Brave, etc.)
  • Step 2: Press the key combination Ctrl + O to show the file selection dialog box.
  • Step 3: Select the HTML file and press the Open button.

How to create a new Word document traditionally?

Word files are called documents. Whenever you start a new project in Word, you will need to create a new document, either a blank document or one based on a template. You also need to know how to open an existing document in Microsoft Word on a desktop or laptop computer.

To create a new document:

  • When starting a new project in Word, the first step is to open a new, blank document:
  • Step 1: Select the File tab. The backstage view will appear.
  • Step 2: Select New, then click Blank document. A blank document will appear.

Open an existing document

In addition to creating a new document, you'll often need to open an existing document.

  • Step 1: Navigate to Backstage view, then click Open.
  • Step 2: Select Computer, then click Browse. Or you can select OneDrive (formerly known as SkyDrive) to open files stored on OneDrive.
  • Step 3: The Open dialog box appears. Navigate to and select your document, then click Open.
  • If you recently opened the desired document, you can browse Recent Documents instead of searching for the file.

Pin a document

If you frequently work with the same document, you can pin it to Backstage view for faster access.

  • Step 1: Navigate to Backstage view, then click Open.
  • Step 2: A list of recently edited documents will appear. Hover over the document you want to pin, then click the pushpin icon next to it.
  • Step 3: The document will stay pinned to the list of recent documents until you unpin it. To unpin a document, click the pushpin icon again.
  • You can also pin folders to Backstage view for quick access. From the Backstage view, click Open and navigate to the folder you want to pin, then click the pushpin icon.

Compatibility Mode

  • Sometimes you may need to work with documents created in earlier versions of Microsoft Word, such as Word 2007. When you open these documents, Word displays them in Compatibility Mode. Compatibility Mode disables certain features, so you can only access the commands found in the program that was used to create the document. For example, if you open a document that was created in Word 2007, you can only use the tabs and commands found in Word 2007.
  • To exit Compatibility Mode, you will have to convert the document to the current version. However, if you're collaborating with people who only have access to earlier versions of Microsoft Word, it is best to leave the document in Compatibility Mode so that the format won't change.

HTML to DOC Converter

In this section, we will show you how to add HTML-to-Word conversion to your application using the Word Office Component.

Import Library

All of the methods and classes used in this example are provided by our OfficeComponent.Word library. Make sure you have installed and imported this library before you begin.

  • C# Version: using OfficeComponent.Word;
  • VB Version: Imports OfficeComponent.Word

Declare variable

This example requires you to declare two string variables: htmlContent and error.

  • htmlContent stores the HTML content to be converted.
  • error acts as a flag variable that holds the message for any error that arises during execution.
  • C# Version:
          string htmlContent;
          string error;
  • VB Version:
    Dim htmlContent As String
    Dim [error] As String

Classification of inputs

This example will specify three types of input:

  • Type == 0: HTML File
  • Type == 1: HTML Page
  • else: HTML Text.
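The three input types above can be sketched as a single loader. This is a minimal illustration rather than the code from the sample package: the class name HtmlInput and the method Load are hypothetical, and the method inlines equivalents of the GetFileContent and GetWebContent utility functions listed at the end of this article. It uses only the .NET base library.

```csharp
using System;
using System.IO;
using System.Net;

public static class HtmlInput
{
    // Loads HTML according to the input type:
    //   0              = input is the path of an HTML file
    //   1              = input is the URL of an HTML page
    //   anything else  = input is already the HTML text
    // Returns null on success or an error message on failure,
    // mirroring the "error" flag variable declared earlier.
    public static string Load(int type, string input, out string htmlContent)
    {
        try
        {
            if (type == 0)
            {
                htmlContent = File.ReadAllText(input);
            }
            else if (type == 1)
            {
                using (WebClient client = new WebClient())
                {
                    htmlContent = client.DownloadString(input);
                }
            }
            else
            {
                htmlContent = input;
            }

            return null;
        }
        catch (Exception ex)
        {
            htmlContent = null;
            return ex.Message;
        }
    }
}
```

A caller would check the returned message before converting, for example: string error = HtmlInput.Load(2, "<p>Hello</p>", out htmlContent);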

Create a new Word document

We will need to create a variable of WordDocument class to store the conversion result:

  • C# Version: WordDocument document = new WordDocument();
  • VB Version: Dim document As New WordDocument()

We will need to add a section to our newly created Word document. Here, we take advantage of the Section class.

  • C# Version: Section section = document.AddSection();
  • VB Version: Dim section As Section = document.AddSection()

Validation

The body of the section provides a validation method for XHTML and HTML content. Any problem found is reported through the error variable. You can skip this validation step to improve performance.

  • C# Version: section.Body.IsValidXHTML(htmlContent, XHTMLValidationType.Transitional, out error)
  • VB Version: section.Body.IsValidXHTML(htmlContent, XHTMLValidationType.Transitional, [error])

Conversion

Converting between these two formats is simple. Call the InsertXHTML method on the body of the section to insert the HTML content, converted into Word format, into the newly created Word document.

  • C# Version: section.Body.InsertXHTML(htmlContent);
  • VB Version: section.Body.InsertXHTML(htmlContent)

You can then build the path to the location where you want to save the file. Do not forget to call our Save and Close methods :).

  • C# Version:
        string fileName = Path.Combine(OutputDir, this.GetType().Name + "_" + Guid.NewGuid().ToString() + GetExtension(SaveAsFormat));
        // Save the document.
        document.Save(fileName, SaveAsFormat);

        // Close the document.
        document.Close();
  • VB Version:
        Dim fileName As String = Path.Combine(OutputDir, Me.GetType().Name & "_" & Guid.NewGuid().ToString() & GetExtension(SaveAsFormat))
        ' Save the document.
        document.Save(fileName, SaveAsFormat)

        ' Close the document.
        document.Close()
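The file name built above combines an output folder, a name prefix, and a GUID, so repeated conversions do not overwrite each other. The following minimal sketch shows that naming scheme in isolation. The class OutputNaming, the method BuildOutputPath, and the plain string extension parameter are hypothetical stand-ins for OutputDir, GetExtension, and SaveAsFormat, whose definitions are not shown in this article.

```csharp
using System;
using System.IO;

public static class OutputNaming
{
    // Builds a unique output path of the form:
    //   <outputDir>/<prefix>_<guid><extension>
    // The extension is passed in directly (for example ".doc") because the
    // sample's GetExtension helper is not shown in the article.
    public static string BuildOutputPath(string outputDir, string prefix, string extension)
    {
        string name = prefix + "_" + Guid.NewGuid().ToString() + extension;
        return Path.Combine(outputDir, name);
    }
}
```

Because each call generates a fresh GUID, two calls with the same arguments return different paths.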

Set up some utility functions

  • C# Version:
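        // Requires: using System; using System.IO; using System.Net;

        // Downloads the page at the given URL into content.
        // Returns null on success, or the error message on failure.
        static string GetWebContent(string url, out string content)
        {
            try
            {
                // Dispose the client once the download completes.
                using (WebClient client = new WebClient())
                {
                    content = client.DownloadString(url);
                }

                return null;
            }
            catch (Exception ex)
            {
                content = null;
                return ex.Message;
            }
        }

        // Reads the whole file into content.
        // Returns null on success, or the error message on failure.
        static string GetFileContent(string fileName, out string content)
        {
            try
            {
                // Dispose the reader so the file handle is released.
                using (StreamReader reader = new StreamReader(fileName))
                {
                    content = reader.ReadToEnd();
                }

                return null;
            }
            catch (Exception ex)
            {
                content = null;
                return ex.Message;
            }
        }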
  • VB Version:
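        ' Requires: Imports System, Imports System.IO, Imports System.Net

        ' Downloads the page at the given URL into content.
        ' Returns Nothing on success, or the error message on failure.
        Private Shared Function GetWebContent(ByVal url As String, ByRef content As String) As String
            Try
                ' Dispose the client once the download completes.
                Using client As New WebClient()
                    content = client.DownloadString(url)
                End Using

                Return Nothing
            Catch ex As Exception
                content = Nothing
                Return ex.Message
            End Try
        End Function

        ' Reads the whole file into content.
        ' Returns Nothing on success, or the error message on failure.
        Private Shared Function GetFileContent(ByVal fileName As String, ByRef content As String) As String
            Try
                ' Dispose the reader so the file handle is released.
                Using reader As New StreamReader(fileName)
                    content = reader.ReadToEnd()
                End Using

                Return Nothing
            Catch ex As Exception
                content = Nothing
                Return ex.Message
            End Try
        End Function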

The full source code of this example is available in our Word package. A live demo for HTML to DOC is also available on our site. If you also need Word functionality, check out our Word online demos.
