  Dec 21 2014 7:53AM     Micheal
In this short tutorial, Micheal shows by example how to read the text of a PDF and convert it to a Stream using C#/VB.
For that, we use a DLL called iTextSharp. Download the iTextSharp DLL file before you begin.
Using the iTextSharp DLL, we can read the PDF text efficiently.
Adding the DLL to the Project
Steps:
  1. Right-click the project.
  2. Select Add Reference from the options.
  3. In the Add Reference pop-up window, select the Browse tab and select the iTextSharp DLL file.
  4. Click OK. The iTextSharp DLL is now referenced by the project.
Namespaces
You will need to import the following namespaces.
C#

using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using System.IO;
using System.Text; // StringBuilder and Encoding

VB

Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser
Imports System.IO
Imports System.Text ' StringBuilder and Encoding

In the Page_Load event, the ReadPdfFile() method is called with the file path as its parameter.
C#

protected void Page_Load(object sender, EventArgs e)
 {

            string pdfText = ReadPdfFile(@"C:\Test.pdf");

            //CONVERT STRING TO BYTE ARRAY
            byte[] pdfBytes = ConvertStringToByte(pdfText);

            //CONVERT BYTE ARRAY TO STREAM
            Stream stream = new MemoryStream(pdfBytes);

            //VICE VERSA ** Stream To Text **
            if (stream != null)
            {
                // STREAM TO BYTE ARRAY
                Stream inputStream = stream;
                byte[] result;
                // Copy the stream into a temporary in-memory buffer
                using (var buffer = new MemoryStream())
                {
                    inputStream.CopyTo(buffer);
                    result = buffer.ToArray();
                }
                //BYTE ARRAY TO STRING
                string strPdfText = ConvertByteArrayToString(result);
            }
 }

VB

Protected Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs)

            Dim pdfText As String = ReadPdfFile("C:\Test.pdf")

            'CONVERT STRING TO BYTE ARRAY
            Dim pdfBytes() As Byte = ConvertStringToByte(pdfText)

            'CONVERT BYTE ARRAY TO STREAM
            Dim pdfStream As Stream = New MemoryStream(pdfBytes)

            'VICE VERSA ** Stream To Text **
            If pdfStream IsNot Nothing Then
                ' STREAM TO BYTE ARRAY
                Dim inputStream As Stream = pdfStream
                Dim result() As Byte
                ' Copy the stream into a temporary in-memory buffer
                Using buffer As New MemoryStream()
                    inputStream.CopyTo(buffer)
                    result = buffer.ToArray()
                End Using
                'BYTE ARRAY TO STRING
                Dim strPdfText As String = ConvertByteArrayToString(result)
            End If
End Sub

Code for Reading PDF file
Below is the sample code for reading a PDF file in ASP.NET with C#/VB.
The ReadPdfFile() method takes the file name as its parameter, reads the content page by page, and returns the text of the PDF document as a String.
C#

public string ReadPdfFile(string fileName)
{
            StringBuilder PDFText = new StringBuilder();
            PdfReader pdfReader = new PdfReader(fileName);
            try
            {
                // Pages are numbered from 1 in iTextSharp
                for (int page = 1; page <= pdfReader.NumberOfPages; page++)
                {
                    // Use a fresh strategy per page so text does not carry over between pages
                    ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                    string currentText = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy);

                    // The extracted text is already a .NET string, so no re-encoding is needed here
                    PDFText.Append(currentText);
                }
            }
            finally
            {
                // Release the file even if extraction throws
                pdfReader.Close();
            }
            return PDFText.ToString();
}

VB

Public Function ReadPdfFile(ByVal fileName As String) As String

            Dim pdfText As StringBuilder = New StringBuilder()
            Dim reader As PdfReader = New PdfReader(fileName)

            Try
                ' Pages are numbered from 1 in iTextSharp
                For pageNumber As Integer = 1 To reader.NumberOfPages
                    ' Use a fresh strategy per page so text does not carry over between pages
                    Dim strategy As ITextExtractionStrategy = New SimpleTextExtractionStrategy()
                    Dim currentText As String = PdfTextExtractor.GetTextFromPage(reader, pageNumber, strategy)

                    ' The extracted text is already a .NET string, so no re-encoding is needed here
                    pdfText.Append(currentText)
                Next
            Finally
                ' Release the file even if extraction throws
                reader.Close()
            End Try
            Return pdfText.ToString()

End Function

The ReadPdfFile() method reads every page of a multi-page PDF document.
Method for Converting Byte[] to string and Vice Versa
C#

//Method to Convert Byte[] to string
private static string ConvertByteArrayToString(Byte[] ByteOutput)
{
            string StringOutput = System.Text.Encoding.UTF8.GetString(ByteOutput);
            return StringOutput;
}


//Method to Convert String to Byte[]
public static byte[] ConvertStringToByte(string Input)
{
            return System.Text.Encoding.UTF8.GetBytes(Input);
}

VB


'Method to Convert Byte() to String
Private Shared Function ConvertByteArrayToString(ByVal ByteOutput() As Byte) As String
            Dim StringOutput As String = System.Text.Encoding.UTF8.GetString(ByteOutput)
            Return StringOutput
End Function


'Method to Convert String to Byte()
Public Shared Function ConvertStringToByte(ByVal Input As String) As Byte()
            Return System.Text.Encoding.UTF8.GetBytes(Input)
End Function

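The conversions above can be tried without a PDF at all. The following is a minimal sketch of the same string to byte array to Stream round trip as a standalone program. The class name TextStreamRoundTrip and the helpers ToStream and ToText are hypothetical names chosen for illustration, and the sample string only stands in for the text that ReadPdfFile() would return.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class for illustration; not part of iTextSharp.
public static class TextStreamRoundTrip
{
    // Encode the text as UTF-8 and wrap the bytes in an in-memory stream.
    public static Stream ToStream(string text)
    {
        return new MemoryStream(Encoding.UTF8.GetBytes(text));
    }

    // Read the whole stream back into a string, decoding it as UTF-8.
    public static string ToText(Stream stream)
    {
        // Rewind first: a stream that was already read sits at its end.
        if (stream.CanSeek)
        {
            stream.Position = 0;
        }
        using (var buffer = new MemoryStream())
        {
            stream.CopyTo(buffer);
            return Encoding.UTF8.GetString(buffer.ToArray());
        }
    }

    public static void Main()
    {
        // Sample text standing in for the output of ReadPdfFile().
        string original = "Page one text. Caf\u00e9 costs 5 \u20ac.";
        using (Stream stream = ToStream(original))
        {
            string restored = ToText(stream);
            // True when the round trip is lossless.
            Console.WriteLine(restored == original);
        }
    }
}
```

Using the same encoding (UTF-8 here) in both directions is what keeps non-ASCII characters intact. Resetting Position before CopyTo matters whenever the stream has already been read, because CopyTo starts from the current position.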
I hope this page helps you read PDF document text and convert it to a Stream, and vice versa, in an ASP.NET application. Thanks.