Research Article, Open Access

Quantitative Analysis of Research Trends on α-Lipoic Acid by Text Mining

Kobayashi Y*, Ito R and Saito K

Department of Analytical Chemistry, Faculty of Pharmaceutical Sciences, Hoshi University, Shinagawa-Ku, Tokyo, Japan

*Corresponding Author: 
Kobayashi Y
  Department of Analytical Chemistry, Faculty of Pharmaceutical Sciences
  Hoshi University
  2-4-41 Ebara, Shinagawa-Ku, Tokyo 142-8501
  Japan
  E-mail: is19a01@hoshi.ac.jp

Citation: Kobayashi Y, Ito R, Saito K (2019) Quantitative Analysis of Research Trends on α-Lipoic Acid by Text Mining. J Nutr Diet Suppl 3(1): 105

Abstract

In order to survey the research trends involving α-lipoic acid, which is a common dietary supplement, quantitative analysis of research titles related to α-lipoic acid listed in Google Scholar and PubMed was carried out by text mining. We gathered document titles containing α-lipoic acid from 26,370 and 5,496 articles from Google Scholar and PubMed, respectively, and constructed an analytical database for text mining. KH Coder (Windows version) was used to carry out the text mining analysis. We analyzed keyword transitions by time series for frequently occurring words obtained by morphological analysis. In addition, co-occurrence network analysis and cluster analysis were carried out, and analysis was conducted on the relationship between the research area and frequently occurring words. Between 1940 and 1959, there were many words related to basic research. Since the 2000s, words in the therapy area appeared most frequently. As a result of the co-occurrence network analysis, the word groups of "oxidative stress", "antioxidant activity", "diabetic neuropathy", and "diabetes patient" were formed around the base words of "rat" and "effect". On the other hand, a group consisting of "mitochondrial function", "pyruvate dehydrogenase", and "gene expression" was formed mainly around the base word “human”. It was suggested that research on clinical applications was also carried out. In addition, as a result of hierarchical cluster analysis, the data were classified into six clusters, such as a cluster on basic research and clusters on clinical research; this clustering analysis showed the same tendency as the co-occurrence network. We classified each research area into six groups and determined the changes in research trends by year. As a result, we found that the basic research field has been decreasing year by year. In contrast, the clinical applied research field has been increasing. 
These results supported the results of co-occurrence network analysis, correspondence analysis, and hierarchical cluster analysis. Furthermore, keyword analysis related to enantiomers showed that the occurrences of the keywords "Optical" and "Enantiomeric" have been rapidly increasing since the 1980s. Co-occurrence network analysis was conducted on two categories, "Optical" and "Enantiomeric", which had many occurrences. As a result of this analysis, groups related to "enantiomeric analysis", "enantiomeric synthesis", "pharmacology", and "pharmacokinetics" were formed, and the studies on the enantiomers of α-lipoic acid were considered to be centered in these areas. In particular, it was speculated that studies have mainly been conducted on the enantiomeric analysis of α-lipoic acid as a pharmaceutical. Furthermore, the phrases in another group of enantiomeric analysis suggested that studies of the bioavailability and pharmacokinetics of enantiomers have been conducted. Meanwhile, the word "drug" appeared in co-occurrence networks related to enantiomeric analysis, but phrases or words related to “dietary supplement” or “health foods” did not appear. Thus, it was considered that research on health foods related to enantiomers has not been conducted. These results suggest that the research relating to α-lipoic acid converges on the subjects related to basic research and clinical applications, and the clinical application domain has become the major research field at present. In addition, it is speculated that studies on enantiomeric analysis have increased, and studies on bioavailability and pharmacokinetics of enantiomers have also increased in particular.

Keywords: α-Lipoic acid; Dietary supplement; Text mining; Research trends; Google Scholar; PubMed

Introduction

In recent years, due to the heightened awareness of health issues, various strategies for health maintenance have been explored. The World Health Organization (WHO) and various national authorities are also promoting self-medication efforts [1]. Self-medication is a type of self-care in which citizens judge their injury or symptoms themselves and then use medical products to manage and treat their health by themselves [2]. Various health foods and dietary supplements are on sale for self-medication.

α-Lipoic acid (ALA), a common dietary supplement ingredient, was discovered as a growth factor of bacteria in the 1930s [3] and was isolated and purified from bovine liver in 1950 [4]. ALA has been used as a medicine in Germany and Japan since 1966. It has also been approved for use not only in pharmaceuticals but also as a food ingredient in Japan since 2004 and has been drawing attention as a dietary supplement in Japan [5]. ALA, also called thioctic acid, is contained in the liver, heart, and kidneys of cattle and pigs [6]. It is also contained in vegetables such as spinach, tomatoes, and broccoli, but in very small amounts of approximately 1 mg/kg in animal-derived food. ALA has a role as a coenzyme for the pyruvate dehydrogenase complex (PDC) of the citric acid cycle, the main part of the metabolic pathway of aerobic organisms [7]. ALA is reduced by enzymes in vivo to dihydrolipoic acid (DHLA) [8], which is responsible for the direct and indirect reduction of oxidized glutathione and oxidized vitamin C and E radicals [9]. Because vitamin E can stop the chain lipid peroxidation reaction, ALA is said to be effective against arteriosclerosis [10]. ALA is attracting attention as a dietary supplement in Europe and Asia, where it is expected to have slimming and anti-aging effects and provide improvements for diabetes patients [11,12]. On the other hand, ALA has enantiomers, and many dietary supplements contain chemically synthesized racemic ALA [13]. It was reported that the enantiomers of ALA exhibit different enzymatic activities for pyruvate dehydrogenase [14], anti-inflammatory activity in mice [9], and pharmacokinetics [15-18].

As described above, ALA was developed as a pharmaceutical product and is now used as a dietary supplement, and a wide range of fundamental and applied research has been actively conducted. Therefore, it is considered important to accurately and comprehensively understand and analyze ALA research trends when considering the application of ALA to a new research field or therapeutic area.

Biewenga et al. published a comprehensive survey of the biochemistry, toxicology, and pharmacology of ALA in a review article [19] and Bustamante et al. published a review article on the hepatic metabolism of ALA [20]. However, findings on the enantiomers of ALA are limited and no comprehensive research was found. In addition, these review articles were supported by excellent insights and the experiences of researchers, and it seems that the contents of a review will depend on the knowledge of the researchers. In the case of research conducted in a wide field, such as ALA, it is considered that there must be a limit to the research area to be investigated. It is difficult to objectively and quantitatively grasp the contents of such a broad research area by conventional procedures, but it is possible to conduct comprehensive and quantitative analysis by applying text mining methods.

Text mining is a method of quantitative analysis of latent factors in text data, such as the appearance frequency of words, correlation of co-occurrence, appearance tendency, time series, etc., by dividing text data into words and phrases [21]. Text mining techniques have been applied for the analysis of free descriptions of questionnaires [22], word analysis on social network services (SNS) such as Facebook and Twitter [23], discovery of the needs and problems of patients and staff in the field of nursing and nursing care [24], surveys on positive or negative opinions regarding the use of an anti-HPV vaccine by search engine [25], and other tasks. Goto et al. investigated the change in keyword appearance frequency over time in order to clarify the subject of the articles published in "An Official Journal of the Japan Association for Medical Informatics" using a text mining method [26]. In addition, Hachiken et al. conducted a quantitative analysis of the titles of the articles in "Japanese Journal of Pharmaceutical Health Care and Sciences" and "Journal of Japanese Society of Hospital Pharmacists" to survey the changes in pharmaceutical health care research conducted by researchers who are mainly pharmacists at hospitals and pharmacies [27].

Therefore, in this research, in order to clarify research trends and findings regarding ALA and the enantiomers of ALA, we analyzed the article titles related to ALA research listed in Google Scholar and PubMed using text mining methods. Then, we investigated the changes in the occurrence frequencies of words and performed quantitative analysis of ALA research trends.

Methods
Collection and creation of bibliographic title text data

We gathered the titles of articles listed in Google Scholar [28] and PubMed [29] in the 80 years from 1940 to 2019. The search keyword was "lipoic acid", and patents and citations were excluded. In Google Scholar, the number of articles displayed per year is limited to 1,000, but the search results for "lipoic acid" exceeded 1,000 in each year since 2003. Therefore, 1,000 titles were collected for each year since 2003. For the entire period, 40,544 articles were displayed as search results, and 26,370 article titles remained after duplicates were excluded. We also collected 5,496 titles from PubMed. Then, we compiled the collected titles and created a database as a CSV file for analysis.
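The deduplication and database step described above could be sketched as follows. This is only an illustration under stated assumptions: the source does not say how duplicates were detected, so matching on case- and whitespace-normalized titles is an assumption, and the function names and sample records are hypothetical.

```python
import csv
import io


def normalize_title(title: str) -> str:
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(title.lower().split())


def deduplicate(records):
    """Keep the first (year, title) record for each normalized title."""
    seen = set()
    unique = []
    for year, title in records:
        key = normalize_title(title)
        if key and key not in seen:
            seen.add(key)
            unique.append((year, title.strip()))
    return unique


def to_csv(records) -> str:
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["year", "title"])
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical sample records (not taken from the study's database).
raw = [
    (2005, "Lipoic acid and diabetic neuropathy"),
    (2005, "lipoic acid  and diabetic neuropathy "),
    (1956, "Lipoic acid metabolism in Escherichia coli"),
]
unique = deduplicate(raw)
csv_text = to_csv(unique)
print(csv_text)
```

Stricter or looser matching rules (for example, ignoring punctuation) would change how many titles survive, so the rule should be recorded alongside the database.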

Quantitative analysis of text data

We used a text mining method to analyze the database. Using the text mining software KH Coder (Windows version) [30], we counted the occurrences of each noun and adjective appearing in the document titles. In order to properly analyze related keywords, “α-lipoic acid”, “lipoic acid”, and “thioctic acid” were excluded from the extraction keywords at the time of analysis. In addition, consecutive nouns were counted as a single compound word. Furthermore, we extracted the top 20 most frequent words and the top 20 most frequent compound words for every 20-year period. The Stanford POS Tagger [31,32] was used as the morphological analysis library to extract words and compound words from the data.
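As a rough illustration of the counting step, the sketch below tallies word frequencies per 20-year period after removing the search terms. It is not KH Coder or the Stanford POS Tagger: a regular-expression tokenizer and a small hand-written stop list stand in for part-of-speech filtering, no lemmatization is done, and all names and sample titles are hypothetical.

```python
import re
from collections import Counter

# Search terms excluded from the extracted keywords (as stated in the text).
# The longer term is listed first so that "α-" is not left behind.
SEARCH_TERMS = ("α-lipoic acid", "lipoic acid", "thioctic acid")
# Assumption: a tiny stop list replaces real part-of-speech filtering.
STOP_WORDS = {"the", "of", "and", "to", "its", "in", "a", "on", "by", "with", "for"}


def period_of(year, start=1940, width=20):
    """Return a label such as '1940-1959' for the period containing year."""
    lo = start + (year - start) // width * width
    return f"{lo}-{lo + width - 1}"


def tokenize(title):
    """Lowercase, drop the search terms, and keep alphabetic tokens."""
    text = title.lower()
    for term in SEARCH_TERMS:
        text = text.replace(term, " ")
    words = re.findall(r"[a-z][a-z'-]*", text)
    return [w for w in words if w not in STOP_WORDS]


def top_words_by_period(records, n=20):
    """Map each period label to its n most frequent words."""
    counts = {}
    for year, title in records:
        counts.setdefault(period_of(year), Counter()).update(tokenize(title))
    return {period: c.most_common(n) for period, c in counts.items()}


# Hypothetical sample titles.
records = [
    (1952, "Metabolism of lipoic acid in Streptococcus"),
    (1957, "Lipoic acid and the growth of bacteria"),
    (2010, "Effect of α-lipoic acid on oxidative stress in diabetic rats"),
    (2015, "Oxidative stress and thioctic acid treatment"),
]
top = top_words_by_period(records)
for period, words in sorted(top.items()):
    print(period, words)
```

Because this sketch does not reduce words to their base form, "rat" and "rats" would be counted separately; a morphological analyzer such as the one used in the study avoids that.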

Multivariate analysis

Co-occurrence network analysis, correspondence analysis, and hierarchical cluster analysis were performed on frequent words obtained by morphological analysis using the Stanford POS Tagger, and the relationship between research areas and frequent words was analyzed. A co-occurrence network shows the relationships between words that are used together in a text; here, it shows the relationships between words appearing in the same document title. In this research, the words with an occurrence frequency of 400 times or more (Google Scholar) and 100 times or more (PubMed) were adopted by using “word selection” and were analyzed with a co-occurrence relation based on a Jaccard coefficient of 0.05 or more. The Jaccard coefficient is an index of similarity that is widely used to represent co-occurrence relationships between words; for two words, it is the number of titles containing both words divided by the number of titles containing at least one of them.
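The Jaccard-based edge selection can be illustrated with a short sketch. For two words, the coefficient is the size of the intersection of their title sets divided by the size of the union. The function name and the toy titles are hypothetical; only the 0.05 threshold comes from the text.

```python
from itertools import combinations


def jaccard_edges(title_words, threshold=0.05):
    """Return {(word_a, word_b): jaccard} for word pairs at or above threshold.

    title_words is a list of sets, one set of extracted words per title.
    """
    docs_with = {}
    for idx, words in enumerate(title_words):
        for word in words:
            docs_with.setdefault(word, set()).add(idx)

    edges = {}
    for a, b in combinations(sorted(docs_with), 2):
        both = len(docs_with[a] & docs_with[b])
        if both == 0:
            continue  # the words never share a title
        either = len(docs_with[a] | docs_with[b])
        score = both / either
        if score >= threshold:
            edges[(a, b)] = score
    return edges


# Hypothetical toy data: each set holds the words extracted from one title.
titles = [
    {"oxidative", "stress", "rat"},
    {"oxidative", "stress", "diabetic"},
    {"diabetic", "neuropathy", "patient"},
    {"rat", "effect"},
]
edges = jaccard_edges(titles)
for pair, score in sorted(edges.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Each retained pair corresponds to one edge of the network diagram, with the coefficient available as an edge weight.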

Correspondence analysis is a method of illustrating features of data and visually grasping the relationships between words. Words with small biases (such as general phrases) are placed near the origin, and words with large biases (such as characteristic phrases) are placed far from the origin. Further, words that are closely related to each other are placed in the same direction with respect to the origin. Hierarchical cluster analysis is a method that successively merges the most similar items into groups (clusters) and can represent the intermediate merging steps as a hierarchy. The result is drawn as a dendrogram, from which clusters of words can be identified visually.
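The merging procedure behind a dendrogram can be sketched in a few lines. The source does not state which distance or linkage rule was used, so the Jaccard distance and average linkage below are assumptions chosen for illustration, and the word-to-title sets are hypothetical.

```python
def jaccard_distance(a, b):
    """1 minus the Jaccard coefficient of two sets of title indices."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def agglomerate(word_docs, n_clusters):
    """Average-linkage agglomerative clustering (illustrative choice).

    word_docs maps each word to the set of title indices containing it.
    Returns the final clusters and the ordered list of merges, which is
    the information a dendrogram displays.
    """
    clusters = [[word] for word in sorted(word_docs)]
    merges = []

    def linkage(c1, c2):
        total = sum(
            jaccard_distance(word_docs[x], word_docs[y]) for x in c1 for y in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [
            (a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        i, j = min(pairs, key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        merges.append((clusters[i], clusters[j], linkage(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical word-to-title index sets.
word_docs = {
    "oxidative": {0, 1, 2},
    "stress": {0, 1, 2},
    "diabetic": {3, 4},
    "neuropathy": {3, 4, 5},
    "pyruvate": {6, 7},
    "dehydrogenase": {6, 7},
}
clusters, merges = agglomerate(word_docs, n_clusters=3)
print(clusters)
```

Reading the merges list from first to last reproduces the dendrogram from its leaves upward; cutting it at a chosen number of clusters gives a classification like the one reported later in the article.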

Category classification of articles

When a specific word appeared in a document title, the document title was classified into a specific research category, and the changes in the number of documents belonging to each category of the research area over time were examined. The research areas were classified into seven categories: "Pharmacology", "Safety Assessment and Toxicology", "ADME", "Process, Chemical Engineering and Manufacturing", "Physical-chemical property and Formulation", "Clinical Trials" and "Miscellaneous" in accordance with the research classifications used in drug development listed by the American Physiological Society [33]. The coding rules for classifying document titles were built from frequent words, frequent compound words, and highly relevant words that characterize each category. The appearance rate of each code was then calculated, and its change was determined every 20 years. The significance of residuals in the distribution of categories in each period was evaluated by a chi-squared test. The significance level was set at p<0.01.
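A coding rule of this kind can be pictured as a mapping from category codes to trigger terms. The sketch below is a simplified stand-in for a KH Coder coding rule file: the three codes and their terms are a subset of the examples listed in Table 5, while the plain substring matching, the function names, and the sample titles are assumptions.

```python
# Subset of the example terms listed in Table 5; the matching logic is assumed.
CODING_RULES = {
    "Pharmacology": ["oxidative stress", "pyruvate dehydrogenase", "antioxidant activity"],
    "ADME": ["absorption", "distribution", "metabolism", "excretion", "metabolite"],
    "Clinical Trials": ["clinical trial", "disease", "patient", "diabetic neuropathy"],
}


def codes_for(title):
    """Return every code whose terms occur in the title (substring match)."""
    text = title.lower()
    return {
        code
        for code, terms in CODING_RULES.items()
        if any(term in text for term in terms)
    }


def appearance_rates(records, start=1940, width=20):
    """Percentage of titles per period that received each code."""
    totals, hits = {}, {}
    for year, title in records:
        lo = start + (year - start) // width * width
        period = f"{lo}-{lo + width - 1}"
        totals[period] = totals.get(period, 0) + 1
        for code in codes_for(title):
            hits[(period, code)] = hits.get((period, code), 0) + 1
    return {key: 100.0 * n / totals[key[0]] for key, n in hits.items()}


# Hypothetical sample titles.
records = [
    (1955, "Metabolism of lipoic acid"),
    (1958, "Synthesis of thioctic acid"),
    (2012, "α-Lipoic acid in patients with diabetic neuropathy"),
    (2016, "Oxidative stress and lipoic acid absorption"),
]
rates = appearance_rates(records)
for key, rate in sorted(rates.items()):
    print(key, f"{rate:.1f}%")
```

A single title can receive several codes, so the rates within one period need not sum to 100%.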

Keyword analysis of enantiomers

We classified keywords related to enantiomers into categories and examined the changes in the keywords related to these categories with time. The keywords were classified into four categories: "Chiral", "Racemic", "Optical", and "Enantiomeric". The coding rule selected the related word including each representative keyword. We calculated the appearance rate for each category and determined the changes in occurrence rates every 20 years. In addition, we performed co-occurrence network analysis of these categories and investigated the relevance of the words.
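When keywords are coded in this way, the matching rule matters: a naive substring search for "chiral" would also hit "achiral". The sketch below uses a regular expression anchored at the start of a word to avoid that. The term lists follow the examples given in Table 7; the matching rule, the function name, and the sample titles are assumptions.

```python
import re
from collections import Counter

# Example terms per category, as listed in Table 7.
ENANTIOMER_CODES = {
    "Chiral": ["chiral", "chirality", "chiral-phase"],
    "Racemic": ["racemic", "racemization", "deracemization"],
    "Optical": ["optical", "optic", "optically"],
    "Enantiomeric": ["enantiomer", "enantioselective", "enantioseparation", "enantiomeric"],
}

# Assumption: a term must start at a word boundary but may carry a suffix,
# so "enantiomers" matches "enantiomer" while "achiral" does not match "chiral".
PATTERNS = {
    code: re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\w*", re.IGNORECASE
    )
    for code, terms in ENANTIOMER_CODES.items()
}


def count_by_period(records, start=1940, width=20):
    """Count titles per (period, category) that contain a matching keyword."""
    counts = Counter()
    for year, title in records:
        lo = start + (year - start) // width * width
        period = f"{lo}-{lo + width - 1}"
        for code, pattern in PATTERNS.items():
            if pattern.search(title):
                counts[(period, code)] += 1
    return counts


# Hypothetical sample titles.
records = [
    (1972, "Optical rotation of lipoic acid"),
    (2008, "Enantioseparation of α-lipoic acid by chiral chromatography"),
    (2014, "Pharmacokinetics of lipoic acid enantiomers in rats"),
    (2016, "Achiral synthesis of thioctic acid"),
]
counts = count_by_period(records)
for key, n in sorted(counts.items()):
    print(key, n)
```

Dividing each count by the number of titles in the period gives the appearance rate used to compare periods.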

Results and Discussion
Trends in the number of articles published in Google Scholar and PubMed

The change in the number of articles published in Google Scholar and PubMed is shown in Figure 1. The number of documents published between 1940 and 1959 was 674 and 185 for Google Scholar and PubMed, respectively, whereas the number of documents in the period 2000-2019 was 32,300 and 3,905, respectively. The number of documents has increased over the years, and the increase accelerated after 2000: the counts for 2000-2019 were approximately 5.9 times (Google Scholar) and 4.8 times (PubMed) those for the period 1980-1999.

Frequent words and frequent compound words

The titles analyzed here contained 417,606 words of 37,882 kinds (Google Scholar) and 92,922 words of 10,926 kinds (PubMed). In order to properly analyze related keywords other than ALA itself, the terms “α-lipoic acid”, “lipoic acid”, and “thioctic acid” were excluded from the extracted keywords at the time of analysis. In addition, re-extraction was performed excluding common words such as pronouns, conjunctions, prepositions, numbers, and units, for example “the”, “of”, “and”, “to”, “its”, and “in”.

As a result, 375,222 words of 37,843 kinds and 79,471 words of 10,895 kinds were extracted for analysis from Google Scholar and PubMed, respectively. The top 20 most frequently appearing words and frequently appearing compound words are shown in Tables 1 and 2. Among the frequent words, "effect", "rat", "oxidative", "stress", "diabetic", and "antioxidant" were found. Among the frequently occurring compound words, “oxidative stress”, “protective effect”, “Escherichia coli”, “pyruvate dehydrogenase”, “fatty acid”, “diabetic neuropathy”, and “Alzheimer’s disease” were found, among others.

No. | Extracted word (Google Scholar) | Count (Google Scholar) | Extracted word (PubMed) | Count (PubMed)
1 | effect | 2344 | effect | 809
2 | rat | 2090 | rat | 670
3 | oxidative | 1588 | oxidative | 371
4 | stress | 1484 | antioxidant | 325
5 | diabetic | 1091 | stress | 321
6 | antioxidant | 1062 | dehydrogenase | 300
7 | role | 1003 | treatment | 287
8 | disease | 979 | diabetic | 274
9 | protein | 946 | activity | 238
10 | activity | 848 | human | 237
11 | dehydrogenase | 846 | disease | 228
12 | mitochondrial | 840 | mitochondrial | 228
13 | treatment | 764 | protein | 223
14 | human | 752 | role | 216
15 | EFFECTS | 736 | mouse | 213
16 | induce | 722 | complex | 202
17 | synthesis | 722 | induce | 201
18 | mouse | 671 | patient | 188
19 | neuropathy | 623 | antioxidant | 165
20 | patient | 623 | synthesis | 162

Table 1: Frequent words in each search engine

No. | Extracted compound word (Google Scholar) | Count (Google Scholar) | Extracted compound word (PubMed) | Count (PubMed)
1 | oxidative stress | 1110 | oxidative stress | 285
2 | protective effect | 404 | protective effect | 130
3 | escherichia coli | 391 | escherichia coli | 120
4 | pyruvate dehydrogenase | 359 | dihydrolipoic acid | 84
5 | fatty acid | 335 | gold nanoparticles | 58
6 | diabetic neuropathy | 228 | lipid peroxidation | 54
7 | alzheimer's disease | 223 | diabetic neuropathy | 48
8 | diabetes mellitus | 217 | pyruvate dehydrogenase | 48
9 | diabetic rats | 207 | diabetes mellitus | 47
10 | primary biliary cirrhosis | 196 | pyruvate dehydrogenase complex | 46
11 | amino acid | 190 | primary biliary cirrhosis | 45
12 | reactive oxygen species | 186 | alzheimer's disease | 44
13 | nitric oxide | 181 | reactive oxygen species | 44
14 | lipid peroxidation | 179 | oxidative damage | 43
15 | insulin resistance | 160 | skeletal muscle | 40
16 | vitamin e | 158 | gene expression | 37
17 | oxidative damage | 135 | lipoamide dehydrogenase | 36
18 | burning mouth syndrome | 134 | fatty acids | 34
19 | purification | 133 | insulin resistance | 34
20 | skeletal muscle | 122 | burning mouth syndrome | 33

Table 2: Frequent compound words in each search engine

The top 20 most frequent words for every 20-year period in Google Scholar are shown in Table 3. From 1940 to 1959, words such as "metabolism", "synthesis", "growth", and "enzyme" appeared frequently, but these words decreased gradually thereafter, and only "synthesis" (No. 20) remained in the top 20 in the period 2000-2019. On the other hand, "gene" appeared in 1980-1999, and clinically relevant words such as "diabetic", "diabetes", "antioxidant", and "neuropathy" were among the most frequent in the period 2000-2019. General-purpose words such as "effect" and "activity" ranked in the top 20 in all periods.

No. | Extracted word (1940-1959) | Count | Extracted word (1960-1979) | Count | Extracted word (1980-1999) | Count | Extracted word (2000-2019) | Count
1 | metabolism | 39 | dehydrogenase | 164 | dehydrogenase | 412 | effect | 1927
2 | synthesis | 33 | metabolism | 132 | effect | 276 | rat | 1730
3 | growth | 30 | effect | 121 | rat | 271 | oxidative | 1438
4 | effect | 20 | coli | 111 | complex | 255 | stress | 1386
5 | oxidation | 19 | ESCHERICHIA | 104 | pyruvate | 237 | diabetic | 992
6 | enzyme | 18 | pyruvate | 91 | protein | 224 | antioxidant | 949
7 | activity | 16 | complex | 88 | human | 200 | disease | 830
8 | compound | 16 | reaction | 87 | coli | 187 | role | 805
9 | factor | 16 | synthesis | 81 | Escherichia | 186 | mitochondrial | 695
10 | metabolic | 15 | rat | 80 | characterization | 181 | protein | 680
11 | system | 15 | activity | 76 | synthesis | 181 | treatment | 675
12 | streptococcus | 14 | growth | 75 | metabolism | 165 | induce | 638
13 | vitamin | 14 | enzyme | 74 | gene | 152 | activity | 614
14 | mechanism | 13 | mechanism | 64 | activity | 142 | EFFECTS | 610
15 | reaction | 13 | disulfide | 61 | role | 141 | mouse | 596
16 | requirement | 12 | liver | 60 | growth | 138 | neuropathy | 564
17 | COMPOUNDS | 11 | compound | 57 | system | 135 | patient | 532
18 | METABOLISM | 11 | property | 57 | primary | 130 | human | 526
19 | PYRUVATE | 11 | system | 55 | enzyme | 127 | diabetes | 521
20 | BIOCHEMISTRY | 10 | structure | 54 | disease | 121 | synthesis | 427

Table 3: Frequent words in Google Scholar from 1940-2019

Co-occurrence network analysis

Co-occurrence network analysis was performed for words with 450 or more occurrences (Google Scholar) and 100 or more occurrences (PubMed), and the results are shown in Figures 2 and 3 as co-occurrence network diagrams. The co-occurrence network diagram of Google Scholar showed that research area groups were formed, such as "oxidative stress", consisting of the two words "oxidative" and "stress", as well as research area groups such as "rat, effects", "antioxidant activity", "diabetic neuropathy", "mitochondrial functions", "pyruvate dehydrogenase", "gene expression", and "characterization, synthesis".

Correspondence analysis

Correspondence analysis was performed for words with more than 450 occurrences in Google Scholar, and the results are shown in Figure 4. Each numerical label represents a time period, and words related to that period are distributed nearby. Words placed near the origin are common to all periods; the closer a word lies to the origin, the less characteristic it is. Therefore, "effect", "rat", "role", "disease", and "mitochondrial" were concluded to be uncharacteristic words that do not correspond to any particular period. In contrast, words that lie far from the origin in the direction of a period label can be judged to be more relevant to that period and can be said to be characteristic words of that period. Therefore, "system", "mechanism", and "metabolism" were characteristic words in the period 1960-1979, and terms such as "oxidative", "stress", "treatment", "diabetes", and "neuropathy" can be said to be characteristic words in the period 2000-2019.

Hierarchical cluster analysis

Hierarchical cluster analysis was performed for words with 450 or more occurrences in the Google Scholar data set, and the results are shown as a dendrogram in Figure 5, and the cluster classification is shown in Table 4. As a result, the data were classified into six clusters; a cluster on "pyruvate dehydrogenase" related to the metabolic pathway of ALA and a cluster on clinical applications such as "oxidative stress" and "diabetic neuropathy" appeared. In addition, clusters related to basic research including "mouse", "antioxidant", "metabolism", and "synthesis", and clusters related to diseases and treatments including "patients", "diseases", "therapy", and "diabetes" were shown to have similar tendencies to the co-occurrence networks. It was suggested that studies on basic research and clinical applications of compounds had been conducted to date.

cluster 1 cluster 2 cluster 3
word count word count word count
dehydrogenase 846 oxidative 1588 diabetic 1091
complex 541 stress 1484 treatment 764
pyruvate 474 effect 121 neuropathy 623
cluster 4 cluster 5 cluster 6
word count word count word count
effect 2344 antioxidant 1062 role 1003
rat 2090 protein 946 disease 979
protective 450 activity 848 mitochondrial 840
human 752 patient 623
EFFECTS 736 antioxidant 587

Table 4: Cluster classification and words appearing in Figure 5

Category classification of documents

In order to categorize each article, a coding rule file was created allowing for classification into seven categories, as shown in Table 5. The coding rules were built from frequent words that showed features, frequent compound words, and words with high relevance. The appearance rate of each code was calculated, and the changes occurring every 20 years were determined, as shown in Figure 6. In addition, in order to confirm the significance of residuals in the distribution of categories in each time period, a chi-square test was performed with a significance level of p<0.01. The results of cross tabulation and residual analysis are shown in Table 6.

Codes | Examples of terms used in coding rules
Pharmacology | oxidative stress, pyruvate dehydrogenase, antioxidant activity
Toxicology | toxicology, LD50, acute oral, ames, carcinogenicity, mutagenicity
ADME | ADME, absorption, distribution, metabolism, excretion, metabolite
Chemistry | chemistry, characterization, synthesis, manufacturing, impurity
Physicochemical | physical chemical property, solubility, LogP, formulation, surfactant
Clinical Trials | clinical trial, disease, patient, diabetic neuropathy
Miscellaneous | protective, effect, model, analysis, activation

Table 5: Definition of codes of each category

Period | Pharmacology | Toxicology | ADME | Chemistry | Physicochemical | Clinical Trials | Miscellaneous
1940-1959 | ▽ 127(19.69%) | ▽ 4(0.62%) | ▲ 86(13.33%) | ▲ 100(15.50%) | 14(2.17%) | ▽ 49(7.60%) | ▽ 46(7.13%)
1960-1979 | ▽ 760(31.34%) | ▽ 49(2.02%) | ▲ 291(12.00%) | ▲ 386(15.92%) | ▲ 84(3.46%) | ▽ 203(8.37%) | ▽ 253(10.43%)
1980-1999 | ▲ 2068(40.73%) | ▽ 198(3.90%) | ▲ 417(8.21%) | ▲ 875(17.23%) | 133(2.62%) | ▽ 969(19.09%) | ▽ 826(16.27%)
2000-2019 | ▲ 7450(40.88%) | ▲ 1280(7.02%) | ▽ 956(5.25%) | ▽ 1672(9.18%) | ▽ 411(2.26%) | ▲ 6869(37.70%) | ▲ 4720(25.90%)
Total | 10405(39.46%) | 1531(5.81%) | 1750(6.64%) | 3033(11.50%) | 642(2.43%) | 8090(30.68%) | 5845(22.17%)
Chi-square value | 191.383** | 178.449** | 236.478** | 317.377** | 14.198** | 1471.749** | 527.790**

Chi-square test: ** p<0.01
Residual analysis (** p<0.05): ▽: Statistically small, ▲: Statistically large
Table 6: Cross-tabulation table and result of Chi-squared test for each category
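The chi-square values in Table 6 appear to come from a 2 x 4 cross tabulation per category (titles with and without the code, by period). The sketch below works through the "Clinical Trials" column under that reading. The period totals are not given in the table; they are back-calculated here from the listed counts and percentages, which is an assumption, as are the function name and the use of adjusted standardized residuals for the residual analysis.

```python
from math import sqrt


def chi_square_2xk(hits, totals):
    """Pearson chi-square for a 2 x k table (coded vs. not coded, by period).

    Also returns the adjusted standardized residual of the coded row for
    each period; |residual| > 2.58 corresponds to p < 0.01 (two-sided).
    """
    n = sum(totals)
    p = sum(hits) / n  # overall share of titles carrying the code
    chi2 = 0.0
    residuals = []
    for observed, size in zip(hits, totals):
        expected_in = size * p
        expected_out = size * (1 - p)
        chi2 += (observed - expected_in) ** 2 / expected_in
        chi2 += ((size - observed) - expected_out) ** 2 / expected_out
        variance = expected_in * (1 - p) * (1 - size / n)
        residuals.append((observed - expected_in) / sqrt(variance))
    return chi2, residuals


# "Clinical Trials" counts from Table 6, periods 1940-1959 to 2000-2019.
clinical_hits = [49, 203, 969, 6869]
# Assumption: totals reconstructed as count / percentage, rounded to integers.
period_totals = [645, 2425, 5077, 18223]

chi2, residuals = chi_square_2xk(clinical_hits, period_totals)
print(f"chi-square = {chi2:.3f}")
for label, r in zip(["1940-1959", "1960-1979", "1980-1999", "2000-2019"], residuals):
    print(label, "large" if r > 2.58 else "small" if r < -2.58 else "n.s.")
```

With these reconstructed totals the statistic should come out within a fraction of a unit of the 1471.749 listed in the table; an exact match is not expected because the totals are rounded. The signs of the residuals follow the ▽ and ▲ marks in the "Clinical Trials" column.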

The "Pharmacology" category had the largest number of occurrences among the seven categories in all time periods, but its rate remained stable between the period 1980-1999 and the period 2000-2019. “ADME” showed a steady downward trend and was significantly lower, at 5.25%, in the period 2000-2019 (p<0.01). “Chemistry” remained almost constant from 1940 to 1999 but fell to 9.18% in 2000-2019, which was a significantly lower value (p<0.01). “Clinical Trials” tended to increase in all periods, and the increase from the period 1980-1999 to the period 2000-2019 was the largest, reaching a significantly high value of 37.70% (p<0.01). These results suggest that the share of publications in basic research areas such as ADME and Chemistry is decreasing, while clinical application research is increasing. These results support the results of co-occurrence network analysis, correspondence analysis, and hierarchical cluster analysis.

Keyword analysis of enantiomers

We classified keywords related to enantiomers into categories and examined the changes in these keywords. The data were classified into four categories: "Chiral", "Racemic", "Optical", and "Enantiomeric". The coding rule selected related words including each representative keyword (Table 7). The appearance rate for each category was calculated, and the changes in every 20-year period are shown in Figure 7. Only six words related to "Racemic" and "Optical" appeared between 1940 and 1959, and the total was less than 20 words until 1980. The words "Optical" and "Enantiomeric" had increased rapidly since the 1980s, and the numbers of these keywords appearing in 2000-2019 were more than 10 times greater than in 1960-1979. Therefore, it was suggested that attention was focused on enantiomers in recent years.

Codes | Examples of terms used in coding rules
Chiral | chiral, chirality, chiral-phase
Racemic | racemic, racemization, deracemization
Optical | optical, optic, optically
Enantiomeric | enantiomer, enantioselective, enantioseparation, enantiomeric

Table 7: Definition of codes of each category related to enantiomers

Co-occurrence network analysis was performed for two categories, “Chiral” and “Enantiomeric”, which had many occurrences, and the results are shown as co-occurrence network diagrams in Figures 8 and 9, respectively. In the “Chiral” co-occurrence network, a group involved in the analysis of enantiomers consisting of “enantioseparation” and “enantiomeric chromatography” and a group related to asymmetric synthesis including “compound”, “chirality”, “production”, or “synthesis” were formed. A group of in vivo reactions and derivatization reactions including "biotransformation" and "derivatize agent" was also formed.

On the other hand, in the co-occurrence network diagram of "Enantiomeric", a group including enantiomeric analysis, asymmetric synthesis, and asymmetric reaction was formed, the same as in the co-occurrence network of "Chiral". In addition, terms related to pharmacology and pharmacokinetics appeared such as "bioavailability", "pyruvate dehydrogenase complex", "redox reaction", and "enzyme reaction" compared to the co-occurrence network of "Chiral".

In each of the "Chiral" and "Enantiomeric" co-occurrence networks, the group for enantiomeric analysis was the largest. In both co-occurrence networks, the words "chiral-phase", "gas-chromatography", "capillary", and "drug" appeared, focusing on "Chromatography" and "Enantiomeric". Therefore, it was thought that enantiomeric analysis by gas chromatography of ALA as medicine was performed. On the other hand, in the "Enantiomeric" co-occurrence network, "enantioseparation", "enantiomer-selective", and "pharmacokinetic" groups were formed in addition to the aforementioned groups, focusing on bioavailability. This suggests that the research on bioavailability and pharmacokinetics of enantiomers was conducted.

From these results, studies on enantiomers of ALA were mainly conducted involving enantiomeric analysis, asymmetric synthesis, pharmacology, and pharmacokinetics, and in particular, studies on enantiomeric analysis were mainly conducted. It was inferred that there were some studies of enantiomeric analysis of ALA as medicine. Furthermore, the formation of another group of enantiomeric analysis suggested that studies of the bioavailability and pharmacokinetics of enantiomers were also conducted. On the other hand, although the word “drug” appears in the co-occurrence network including analysis of these enantiomers, the words "dietary supplement" or “health food” did not appear, which suggests that research on health food was not conducted.

Conclusion

In this study, we analyzed the time-dependent changes in the keywords found from Google Scholar and PubMed searches for publications that contain ALA in the title. It was found that the number of articles on ALA increased year by year, reaching 32,300 in the period from 2000 to 2019 in Google Scholar. From 1940 to 1959, words related to basic research such as "metabolism", "synthesis", "growth", and "enzyme" appeared frequently, and from the 2000s, words associated with clinical application areas such as "diabetic", "antioxidant", and "neuropathy" appeared, and "diabetes” was the most common. Common keywords such as "effect" and "activity" appeared frequently in all periods in Google Scholar.

Second, multivariate analyses including co-occurrence network analysis and cluster analysis were performed to analyze the relationship between the research area and frequent words. As a result of co-occurrence network analysis, groups of "oxidative stress", "antioxidant activity", "diabetic neuropathy" and "mitochondrial function" were identified centering on the words "rat" and "effect". These results suggested that research related to efficacy and pharmacology in animal studies was conducted. On the other hand, a group consisting of "pyruvate dehydrogenase" and "gene expression” was formed around the word "human", which suggested that research on clinical applications was also conducted. Furthermore, as a result of cluster analysis, the data were classified into six clusters, such as a cluster for basic research and a cluster for clinical research, and showed similar tendencies to the co-occurrence network. Therefore, it was suggested that research had been conducted both on clinical applications and on basic research on compounds.

Next, coding rule files for each research area were created, classifying them into seven categories, and the changes over time were calculated. Pharmacology showed the largest occurrence in all year groups, but the rate of change for the periods 1980-1999 and 2000-2019 remained constant. ADME and Chemistry showed significantly lower occurrence rates in 2000-2019 (p<0.01). Clinical trials tended to increase in all years, but the growth rate from the period 1980-1999 to the period 2000-2019 was the largest, showing a significantly higher value of 37.70% (p<0.01). These results suggest that the number of publications in basic research areas such as ADME and Chemistry is decreasing year by year, while clinical applications research has increased. These results support the results of co-occurrence network analysis, correspondence analysis, and hierarchical cluster analysis.

Lastly, keywords related to enantiomers were classified into categories, and changes in the occurrence of keywords in these categories were examined. The words "Optical" and "Enantiomeric" have increased rapidly since the 1980s. In 2000-2019, the total number of keywords was more than 10 times higher than in 1960-1979, suggesting that enantiomers have received more attention in recent years. Co-occurrence network analysis was carried out for the two categories with a high frequency of occurrence, "Optical" and "Enantiomeric". Groups of "enantiomeric analysis", "asymmetric synthesis", "pharmacology", and "pharmacokinetics" were formed, all related to enantiomers of ALA, suggesting that research was centered on these areas. In particular, research on enantiomeric analysis predominated, and it was speculated that many of these studies concerned enantiomeric analysis of ALA as a drug. Furthermore, another group within enantiomeric analysis suggested that research on the bioavailability and pharmacokinetics of enantiomers was also being conducted. On the other hand, although the word "drug" appeared in the co-occurrence network that included analysis of these enantiomers, the words "dietary supplement" and "health food" did not appear, so it was inferred that research on health food was not conducted.

From these results, it was inferred that research subjects on α-lipoic acid converge on basic research and clinical application, that these subjects are closely related to other keywords, and that research on clinical application areas is currently the core. In addition, research on enantiomers is increasing; in particular, analysis of the bioavailability and pharmacokinetics of enantiomers is assumed to be increasing.

In conclusion, text mining can clarify the research trends of α-lipoic acid, and this approach is expected to help expand knowledge of applied research on α-lipoic acid.

References
  1. WHO (2000) Guideline for the regulatory assessment of Medicinal Products for use in self-medication. World Health Organization, Geneva, Switzerland.
  2. Martins AP, Miranda Ada C, Mendes Z, Soares MA, Ferreira P, et al. (2002) Self-medication in a Portuguese urban population: a prevalence study. Pharmacoepidemiol Drug Saf 11: 409-14.
  3. Snell EE, Strong FM, Peterson WH (1937) Growth factors for bacteria: Fractionation and properties of an accessory factor for lactic acid bacteria. Biochem J 31: 1789-99.
  4. Reed LJ (2001) A trail of research from lipoic acid to α-keto acid dehydrogenase complexes. J Biol Chem 276: 38329-36.
  5. Notification No. 0331009 of the Pharmaceutical and Food Safety Bureau (2004)
  6. Lodge JK, Youn HD, Handelman GJ, Konishi T, Matsugo S, et al. (1997) Natural sources of lipoic acid: determination of lipoyllysine released from protease-digested tissues by high performance liquid chromatography incorporating electrochemical detection. J Appl Nutr 49: 3-11.
  7. Perham RN (2000) Swinging arms and swinging domains in multifunctional enzymes: catalytic machines for multistep reactions. Annu Rev Biochem 69: 961-1004.
  8. Jones W, Li X, Qu ZC, Perriott L, Whitesell RR, et al. (2002) Uptake, recycling, and antioxidant actions of α-lipoic acid in endothelial cells. Free Radic Biol Med 33: 83-93.
  9. Fuchs J, Milbradt R (1994) Antioxidant inhibition of skin inflammation induced by reactive oxidants: evaluation of the redox couple dihydrolipoate/lipoate. Skin Pharmacol 7: 278-84.
  10. Ying Z, Kherada N, Farrar B, Kampfrath T, Chung Y (2010) Lipoic acid effects on established atherosclerosis. Life Sci 86: 95-102.
  11. Ziegler D, Nowak H, Kempler P, Vargha P, Low PA (2004) Treatment of symptomatic diabetic polyneuropathy with the antioxidant α-lipoic acid: a meta-analysis. Diabet Med 21: 114-21.
  12. Shay KP, Moreau RF, Smith EJ, Smith AR, Hagen TM (2009) Alpha-lipoic acid as a dietary supplement: molecular mechanisms and therapeutic potential. Biochim Biophys Acta (BBA)-General Subjects 1790: 1149-60.
  13. Kobayashi Y, Saito K, Iwasaki Y, Ito R, Nakazawa H (2012) Enantiomeric Determination of α-Lipoic Acid in Dietary Supplements by Liquid Chromatography/Mass Spectrometry. Bunseki Kagaku 61: 109-14.
  14. Löffelhardt S, Bonaventura C, Locher M, Borbe HO, Bisswanger H (1995) Interaction of α-lipoic acid enantiomers and homologues with the enzyme components of the mammalian pyruvate dehydrogenase complex. Biochem Pharmacol 50: 637-46.
  15. Hermann R, Niebch G, Borbe HO, Fieger-Büschges H, Ruus P, et al. (1996) Enantioselective pharmacokinetics and bioavailability of different racemic α-lipoic acid formulations in healthy volunteers. Eur J Pharm Sci 4: 167-74.
  16. Gleiter CH, Schug BS, Hermann R, Elze M, Blume HH, et al. (1996) Influence of food intake on the bioavailability of thioctic acid enantiomers. Eur J Clin Pharmacol 50: 513-14.
  17. Uchida R, Okamoto H, Ikuta N, Terao K, Hirota T (2015) Enantioselective pharmacokinetics of α-lipoic acid in rats. Int J Mol Sci 16: 22781-94.
  18. Kobayashi Y, Ito R, Saito K (2019) Enantiomeric determination of α-lipoic acid in urine by LC/MS/MS. J Pharm Biomed Anal 166: 435-9.
  19. Biewenga GP, Haenen GR, Bast A (1997) The pharmacology of the antioxidant lipoic acid. Gen Pharmacol 29: 315-31.
  20. Bustamante J, Lodge JK, Marcocci L, Tritschler HJ, Packer L, et al. (1998) α-Lipoic acid in liver metabolism and disease. Free Radic Biol Med 24: 1023-39.
  21. Tan AH (1999) Text mining: The state of the art and the challenges. In Proceedings of the PAKDD 1999. Workshop on Knowledge Discovery from Advanced Databases 8: 65-70.
  22. Yamanishi K, Li H (2002) Mining open answers in questionnaire data. IEEE Intell Syst 17: 58-63.
  23. He W, Zha S, Li L (2013) Social media competitive analysis and text mining: A case study in the pizza industry. Int J Inform Manage 33: 464-72.
  24. Washida M, Hattori K (2008) Effectiveness of text mining: Text mining applied, two cases of nursing research. Jpn J Nurs Res 41: 249-58.
  25. Okuhara T, Ishikawa H, Okada M, Kato M, Kiuchi T (2018) Contents of Japanese pro-and anti-HPV vaccination websites: a text mining analysis. Patient Educ Couns 101: 406-13.
  26. Goto S, Hachiken H, Takada M (2011) Quantitative Analysis of Japanese Journal Articles on Medical Pharmacy. Jpn J Pharm Health Care Sci 37: 21-30.
  27. Hachiken H, Mastuoka A, Murai A, Kinoshita S, Takada M (2012) Quantitative analyses by text mining of journal articles on medical pharmacy. Jpn J Drug Infom 13: 152-9.
  28. Google Scholar https://scholar.google.com/ (Accessed on December 26, 2018)
  29. PubMed https://www.ncbi.nlm.nih.gov/pubmed/ (Accessed on December 26, 2018)
  30. Higuchi K (2016) A two-step approach to quantitative content analysis: KH Coder tutorial using Anne of Green Gables (Part I). Ritsumeikan Social Science Review 52: 77-91.
  31. Toutanova K, Manning CD (2000) Enriching the knowledge sources used in a maximum entropy part-of-speech tagger. In Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics. 13: 63-70.
  32. Toutanova K, Klein D, Manning CD, Singer Y (2003) Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 conference of the North American chapter of the association for computational linguistics on human language technology. 1: 173-80.
  33. The Drug Discovery Process (PPT) The American Physiological Society (APS).
