Release of new PDF standard: PDF 2.0
The new PDF standard ISO 32000-2:2017 (PDF 2.0) has been published online! This standard specifies a digital form for representing electronic documents to enable users to exchange and view electronic documents independent of the environment in which they were created or the environment in which they are viewed or printed.
It is intended for developers of:
software that creates PDF files (PDF writers),
software that reads existing PDF files and (usually) interprets their contents for display (PDF readers),
software that reads and displays PDF content and interacts with the computer users to possibly modify and save the PDF file (interactive PDF processors), and
PDF products that read and/or write PDF files for a variety of other purposes (PDF processors).
ISO 32000-2:2017 does not specify the following:
specific processes for converting paper or electronic documents to the PDF file format;
specific technical design, user interface implementation, or operational details of rendering;
specific physical methods of storing these documents such as media and storage conditions;
methods for validating the conformance of PDF files or PDF processors;
required computer hardware and/or operating system.
pdfDebug: A look at how it works
pdfDebug is an add-on component that is available for iText 7. Its basic function is to allow a programmer to see inside a PDF while it is being created. This allows for advanced debugging of programs that use iText to create or manipulate PDFs.
Interested? Let’s take a look at how it works with an example.
To start, we created a simple program whose goal is to create a PDF that has three pages, with one phrase on each page. Page 1: Hello World, Page 2: Hello People, Page 3: Hello Everyone. In addition, we want the header Item 1, Item 2, or Item 3, depending on the page on which it appears, and the phrase Good bye on the last page. Below is the code that attempts to achieve this goal.
Once we run this code, however, we find that the headers are not present in the PDF that is created.
To try and rectify this problem we can start debugging by clicking the debug button.
Great! Some information. Unfortunately, while there are a number of things to look at, there is no simple way to see what has gone wrong with the PDF itself. Looking at the PDF object does not yield much information that is useful to us in this scenario. This is where pdfDebug comes in. Adding it to our workspace is a very simple operation: it can be downloaded right from the Eclipse Marketplace here. To install the plugin, drag the install button into the Eclipse workspace (Ta Dah!). A dialog will appear as shown below. Simply select pdfDebug and follow the prompts to install it.
After installing the plugin, there is one more change that must be made to be able to use pdfDebug: the PDF writer must be set to debug mode. This can be done with the code change in the screenshot below.
With that in place, we can try debugging again and see what additional information pdfDebug gives us. If we click on the PDF Document variable while debugging, we can see the structure of the PDF.
Now this gives you some more information. With this new view we can see each piece of content as it is added to the PDF. Hello World is added to the PDF and we can see the PDF syntax that describes it inside of the PDF.
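To make the syntax concrete, here is a minimal sketch of what such a description can look like and how its parts can be read. The content stream text below is an assumed example built from standard PDF text operators (Tf selects a font resource and size, Td sets the text position, Tj shows a string); the exact values pdfDebug displays for this document may differ. The class ContentStreamPeek and its methods are hypothetical helpers written for illustration and are not part of iText or pdfDebug.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of iText or pdfDebug.
class ContentStreamPeek {
    // Matches "/F1 12 Tf": select font resource /F1 at size 12.
    private static final Pattern FONT =
            Pattern.compile("/(\\w+)\\s+([\\d.]+)\\s+Tf");
    // Matches "36 788 Td": move the text position to x=36, y=788.
    private static final Pattern POSITION =
            Pattern.compile("(-?[\\d.]+)\\s+(-?[\\d.]+)\\s+Td");

    // Returns the font resource name (without the slash), or null if none is found.
    static String fontResource(String stream) {
        Matcher m = FONT.matcher(stream);
        return m.find() ? m.group(1) : null;
    }

    // Returns {x, y} from the first Td operator, or null if none is found.
    static double[] textPosition(String stream) {
        Matcher m = POSITION.matcher(stream);
        if (!m.find()) {
            return null;
        }
        return new double[] {
            Double.parseDouble(m.group(1)),
            Double.parseDouble(m.group(2))
        };
    }

    public static void main(String[] args) {
        // Assumed example of a text object; the values are illustrative.
        String stream = "BT\n/F1 12 Tf\n36 788 Td\n(Hello World) Tj\nET";
        double[] pos = textPosition(stream);
        System.out.println("font resource: /" + fontResource(stream));
        System.out.println("x=" + pos[0] + ", y=" + pos[1]);
    }
}
```

The regular expressions only cover this simple case; a real content stream parser has to handle many more operators and operand forms.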
/F1 names the font in which the text is added to the PDF. /F1 is not universal; its meaning depends on how /F1 is defined inside this particular PDF. We can see how it is defined by looking at the resources as shown here:
The second line of the description for Hello World tells us where in the PDF it will appear. The x coordinate comes first and the y coordinate second. If we continue to step through the program that we have created, we can see that Item 1 is added to the PDF as well, with a very similar description. If we look at the coordinates given for Item 1, however, we can see that there is a problem with them: they are 36 and 862.98. If we look at the Media Box for the page, which defines the area of the page that is shown, it has coordinates of 0,0,595,842.
This means that it starts at 0 for x and y and ends at 595 for x and 842 for y. The problem, we can guess, is that the coordinates for Item 1 are outside the media box for the PDF. That means that Item 1 is a part of the PDF, but it is outside what is visible, so it cannot be seen in the final PDF. We can now try to modify our code to see if the supposition above is correct. Below is a modification of the code which puts the coordinates of Item 1 inside of the media box.
Once we run the modified code, we can see that Item 1 and all subsequent items are now present in the PDF.
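The diagnosis in this walkthrough comes down to a simple bounds check, which can be sketched independently of iText. The class MediaBoxCheck and its isInside method are hypothetical names used for illustration; the media box values and the Item 1 coordinates are the ones reported in the debugger view, while the corrected y value of 806 is an assumed example rather than the value used in the modified code.

```java
// Hypothetical helper for illustration only; not part of iText or pdfDebug.
class MediaBoxCheck {
    // box is {lower-left x, lower-left y, upper-right x, upper-right y}.
    static boolean isInside(double[] box, double x, double y) {
        return x >= box[0] && x <= box[2]
            && y >= box[1] && y <= box[3];
    }

    public static void main(String[] args) {
        double[] mediaBox = {0, 0, 595, 842};

        // Position reported for Item 1: y = 862.98 lies above the top edge (842).
        System.out.println(isInside(mediaBox, 36, 862.98)); // prints "false"

        // An assumed corrected position that falls inside the media box.
        System.out.println(isInside(mediaBox, 36, 806));    // prints "true"
    }
}
```

Note that the coordinates mark where the text starts, so a position just inside the top edge can still leave the upper part of the glyphs clipped.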
Lastly, if we debug one more time to look at the other pages of the PDF as it is being created we can see that as things change in the PDF structure, they change color in the same way that they do in the Eclipse debugger.
Once you are finished debugging your PDF, it will render as expected: (Page 1: Hello World and header, Page 2: Hello people and header 2, Page 3: Hello everyone, header 3 and Good bye.)
As you can see, the ability to see the content structure of the PDF makes it much easier to debug not only initial errors, but also other changes in the document that result from your changes. pdfDebug differs from RUPS and other debugging tools because it allows you to debug both existing PDFs and PDFs that are still being created. iText makes it easy to create and manipulate PDFs, but this abstraction comes with the downside that the insides of the PDF are not easily understood. With pdfDebug, developers can see into their PDFs in a simple way, which lets them leverage iText for their PDF needs and understand the details when something goes wrong in development.
Try it yourself! Download pdfDebug on the Eclipse Marketplace here.
You will need iText 7 to test pdfDebug, either an OpenSource iText 7 Community Version or a free trial that you can download here.
We have tutorials for you as well. Check our website for help getting started!
Not an Eclipse user? Tell us what IDE we should launch next! Vote here.
iText founder talks open source, PDF and IP at OpenSourceForU
OpenSourceForU, India's prime press portal on all matters open source, sat down with iText founder and strategist Bruno Lowagie for a long interview. In the interview, he talks about the past and future of iText and PDF and shares his wisdom about monetizing open source projects in the smartest way. You can read the entire interview by clicking through here, but here's an excerpt of Bruno's words:
"India can play a vital role in expanding the open source world by allowing this vast resource of talented people to contribute to open source projects. When you write closed source software, you can’t show the world how talented you are. Open source software is very transparent; everyone can assess the quality of your code. Ultimately, this should improve the quality of code in general".
India continues to be a very important region for iText, so obviously we are pleased with the attention for our position in India's open source community!
What is the size limit of pdf file?
The PDF Standard and Drupal
Drupal website developer and integrator Pronovix recently posted an interesting insight on using PDF technology. The full post, written by Pronovix CEO Kristof Van Tomme, is well worth reading, but here's the gist of it:
In the web community, PDF has become synonym for a range of accessibility bad practices. Some people even think that we would all be better off if PDF would finally die, just like Flash and Internet Explorer. As a result PDF is not very sexy in the Drupal and wider PHP community and this has negatively impacted our tooling.
This is a shame: when properly implemented, the PDF standard doesn’t need to suffer from the accessibility issues that a lot of online PDF documents are plagued with. PDF also holds a unique position in the digital world, it is a widely accepted standard that enables a range of applications for which there are no real alternatives.
Van Tomme goes on to list interoperability, open data and decentralization as huge benefits to the PDF format - when used correctly. He calls on the Drupal community to be more ambitious with regard to correct PDF implementation and cites iText as a star example of better PDF libraries for the Java community.
The post is the first in a projected multi-part series, and one we definitely recommend that our readers follow!
Do you know how many PDF documents exist in the world?
Adobe’s VP Engineering for Document Cloud, Phil Ydens, estimates there may be up to 2.5 trillion PDF documents in the world. This number is rising every day because PDF has been embraced by businesses, governments and individual users alike as a platform-agnostic way of passing and sending information that won’t be skewed or altered. Paradoxically, however, the number of PDF documents could be much higher if there were more awareness of the versatility of the format.
Not only is PDF gearing up for a world with more dynamic interactions between file formats, it can already be used today for digital signatures, authentication, encryption and invoicing. Although a business cliché is that the paperless office is always a few years away, PDF can truly and significantly reduce the paper stack.
So if we take a moment to be astounded at Phil Ydens’ estimate of 2.5 trillion PDF documents in the world, we should also take a moment to expect these numbers to rise once the many uses of PDF become more mainstream.
Using PDF to solve healthcare's unstructured data problem
If there’s one major challenge to single out in healthcare IT today, it would be leveraging the growth and usage of big data. While consumer IT made big advances in the past decade to get a handle on data by marking up content, indexing it, and annotating it for use, enterprise IT, and healthcare IT in particular, still needs to catch up on making data actionable.
A typical healthcare office handles tens of thousands of documents for patient records, legal, finance, and billing processes. In pharma and biotech, a typical FDA drug review process involves multiple stages of trials, testing, applications, marketing and manufacturing for the new drug, all requiring a mind-blowing amount of paperwork. In all these cases, either the collected data is not timely or relevant, or it is not easy to access, archive for the future, or bring into compliance with legal standards.
This article provides insights into how using the Portable Document Format (PDF) and accompanying tools within healthcare organizations can be a powerful way to help solve the unstructured data challenge, speed up processes, and reduce the costs for document handling.
We will explain why PDF, with its ability to contain data structure and interactivity, is the perfect document format for meeting the archiving, accessibility and compliance requirements of the healthcare industry. We will also examine the building blocks of a solution that helps create such compliant PDF documents, and deep dive into the ways to organize and structure PDFs.
20 years of PDF: The Acrobat museum
Is it 20 years already? It's hard to believe that PDF 1.0 was released in 1993, but it's true. Everyone who's important in the world of PDF was invited to Portable Document Format's birthday party this year, either in Königswinter or in Seattle (and some of us went to both).
But did you know there was also an Adobe Acrobat museum?
You can find it in one of the buildings of Adobe's HQ in San Jose:
This is the place where you'll find the Camelot Paper by John Warnock:
Take a close look at the posters on the wall: these were the first newspaper advertisements announcing Adobe Acrobat.
Notice that the Budget Reduction Act of 1993 is mentioned. Just as the Budget Sequestration of 2013 made many companies start using iText as a less expensive option for specific PDF tasks, companies back then started using Acrobat to solve document problems and save money by doing so.
We were in luck. Bob Wulff, SVP at Adobe, who has been working on the Acrobat project for more than 20 years, gave us a personal tour of the museum.
In the Acrobat museum, the history of Acrobat comes alive in the form of a display of Adobe products and merchandising: