Questions tagged [text-database]

18 questions
18 votes
2 answers

Where can I find a huge amount of text files?

Possible Duplicate: Looking for dataset to test FULLTEXT style searches on. I have recently started a data-mining project, for which I need 100 GB of plain text for testing. I am tired of searching the net all day. Can someone please help me out…
Sri
4 votes
4 answers

Android: Is it more efficient to use a text file or an XML file to store static data?

I have some reference data in a text file (~5MB) that I want to use with my Android application. The file has this format:
    1|a|This is line 1a
    1|b|This is line 1b
    2|a|This is line 2a
    2|b|This is line 2b
    2|c|This is line 2c
What I want to know…
Tawani
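As an illustration of how the `id|key|text` format shown above can be loaded in one pass, here is a minimal sketch in Python (not Android/Java, and not an answer to the text-versus-XML question itself). The helper `load_reference` is hypothetical; it assumes every non-empty line has at least three `|`-separated fields and that the first field is an integer, and it groups records by that id.

```python
from collections import defaultdict


def load_reference(lines):
    """Group `id|key|text` records by their integer id (hypothetical helper)."""
    groups = defaultdict(dict)
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # maxsplit=2 keeps any '|' inside the text field intact
        record_id, key, text = line.split("|", 2)
        groups[int(record_id)][key] = text
    return dict(groups)


sample = [
    "1|a|This is line 1a",
    "1|b|This is line 1b",
    "2|a|This is line 2a",
    "2|b|This is line 2b",
    "2|c|This is line 2c",
]
data = load_reference(sample)
print(data[2]["c"])  # This is line 2c
```

Passing an open file object instead of `sample` works the same way, since iterating a text file yields one line at a time.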
2 votes
1 answer

Create a simple Lua .txt Database with write() and read()

I'm trying to create a simple two-function text-file "database" in Lua. All I need is two functions. My DB should look like this:
    varName;varValue
    JohnAge;18
    JohnCity;Munich
    LarissaAge;21
    LarissaCity;Berlin
In fact I'm not stuck…
Sinfox
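The two-function `name;value` file described above can be sketched as follows. This is Python rather than Lua, and the names `db_read_all`, `db_write`, and `db_read` are hypothetical. It assumes names never contain `;` (values may, because only the first `;` on a line splits it) and rewrites the whole file on every write, which keeps the code simple but is only sensible for small files.

```python
import os
import tempfile


def db_read_all(path):
    """Load every `name;value` line into a dict (empty if the file is missing)."""
    entries = {}
    if not os.path.exists(path):
        return entries
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if ";" in line:
                name, value = line.split(";", 1)
                entries[name] = value
    return entries


def db_write(path, name, value):
    """Set `name` to `value`, then rewrite the whole file."""
    entries = db_read_all(path)
    entries[name] = str(value)
    with open(path, "w", encoding="utf-8") as f:
        for key, val in entries.items():
            f.write(f"{key};{val}\n")


def db_read(path, name):
    """Return the stored string for `name`, or None if it is absent."""
    return db_read_all(path).get(name)


path = os.path.join(tempfile.mkdtemp(), "db.txt")
db_write(path, "JohnAge", 18)
db_write(path, "JohnCity", "Munich")
db_write(path, "JohnAge", 19)  # overwrites the existing entry in place
print(db_read(path, "JohnAge"))  # 19
```

Values come back as strings, so numeric fields such as ages need converting after `db_read`.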
2 votes
2 answers

How would the conversion of a custom CMS using a text-file-based database to Drupal be tackled?

Just today I started using Drupal for a site I'm designing/developing. For my own site http://jwm-art.net I wrote a user-unfriendly CMS in PHP. My brief experience with Drupal is making me want to convert from the CMS I wrote. A CMS whose sole…
James Morris
1 vote
2 answers

R - Finding duplicates of words that are in a reversed order

I have a data.table with a column of occupational titles. I want to find occupations that are repeated but written in reverse order (e.g. "writer advertising" and "advertising writer"). Here is a simplified version of my data and the…
Miguel
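One way to detect the reversed-order duplicates described above is to reverse each title's word order and look the result up in a set of all titles. The question uses R's data.table; this is a minimal sketch of the same idea in Python, with a hypothetical `reversed_pairs` helper and made-up sample titles. It matches exact word reversals only (after collapsing repeated whitespace), not arbitrary reorderings of longer titles.

```python
def reversed_pairs(titles):
    """Return sorted pairs of titles whose words appear in exactly reversed order."""
    # Collapse runs of whitespace so "writer  advertising" matches "writer advertising"
    seen = {" ".join(t.split()) for t in titles}
    pairs = set()
    for title in seen:
        rev = " ".join(reversed(title.split()))
        # Single-word titles reverse to themselves, so they are skipped
        if rev != title and rev in seen:
            # Sort each pair so (a, b) and (b, a) are stored only once
            pairs.add(tuple(sorted((title, rev))))
    return sorted(pairs)


titles = [
    "writer advertising",
    "advertising writer",
    "nurse",
    "chief executive",
    "executive chief",
    "data analyst",
]
print(reversed_pairs(titles))
# [('advertising writer', 'writer advertising'), ('chief executive', 'executive chief')]
```

Using a set makes each lookup constant time on average, so the whole pass is linear in the number of titles rather than comparing every pair.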
1 vote
1 answer

How are vector operations performed on 20newsgroups_vectorized data set?

When I fetch the 20newsgroups_vectorized data with
    newsgroups = fetch_20newsgroups_vectorized(subset='all')
    labels = newsgroups.target_names
    target = newsgroups.target
    target = pd.DataFrame([labels[i] for i in target], columns=['label'])
    data =…
Rosa
1 vote
1 answer

Converting a table of fixed width in text format into dataframe/excel/csv

I have some data in txt format with 38 columns, which looks like this: With the exception of the header row, most of the rows have missing values. I want to convert this table into an array/dataframe/Excel file, but it does not come out the way it looks in…
Manojk07
1 vote
4 answers

Product Analytical data in TXT files (using YAML)

I am currently developing e-commerce software using PHP/MySQL for a big company. There are two options for me to get specific data: DB (for getting large data, such as PRODUCTS, CATEGORIES, ORDERS, etc.) and TXT (using YAML, for getting…
kuzey beytar
1 vote
1 answer

HAML If defined? statement evaluating whether entry exists in Middleman data file

I have a Middleman data file data/testimonials.yaml:
    tom:
      short: Tom short
      alt: Tom alt (this should be shown)
      name: Thomas
    jeff:
      short: Jeff short
      alt: Jeff alt (this should be shown)
      name: Jeffrey
    joel:
      short: Joel short (he…
Rafal
1 vote
0 answers

MySQL: image and text in one row. How do I separate them?

I have a blog (WordPress) and a website. I would like to connect my site to the blog, so that when I add a post to my blog, it (only the title and image) appears on my website. There is one problem. When I add a post with an image, it appears in one…
Paweł Stanecki
0 votes
0 answers

How to turn sentences from a dictionary of lists into plain text to apply NLTK

I am quite a noob at Python and everything. I am trying to use NLTK for my dissertation in Applied Linguistics, but something keeps preventing the NLTK tools from working on the dataset. I've tried some code in copy+paste+modify style, but had…
0 votes
1 answer

Clustering data after reducing dimension with autoencoder

My goal is to identify clusters in my dataset, which contains around 10 categorical and/or numerical columns and 3 textual description columns. After some research, I thought about a 3-step process: pre-processing my data (normalize my 10…
0 votes
0 answers

If statement only checks the last iteration in a while loop (PHP)

I'm familiar with programming in general, but what I'm facing is a bit weird. The if statement only works on the last iteration. Output:
    IN IN OUT OUT OUT OUT IN OUT IN BIN BOUT OUT OUT OUT OUT
However, if I add an if statement, it only performs…
Spectre
0 votes
1 answer

Regex Python multiple dependencies for dates

I have unstructured data from which I have to extract BP values and the dates (in different formats), as shown below. Right now I have a regex function that extracts BP values and the dates that are followed by BP values. I have a specific case, as highlighted in…
user11870599
0 votes
1 answer

Problem reading the images of the mjsynth dataset

Recently I have been trying to train a text recognition network. I tried to start the training by feeding the mjsynth dataset to the network. However, there seem to be some images in the dataset that are blank. So, while training, if I directly feed the data…