Questions tagged [text-database]
18 questions
18
votes
2 answers
Where can I find a huge amount of text files?
Possible Duplicate:
Looking for dataset to test FULLTEXT style searches on
I recently started a data mining project, for which I need 100 GB of plain text for testing. I have spent the whole day searching the net. Someone please help me out…

Sri
4
votes
4 answers
Android: Is it more efficient to use a text file or an XML file to store static data
I have some reference data in a text file (~5MB) that I want to use with my Android application.
The file is of the format:
1|a|This is line 1a
1|b|This is line 1b
2|a|This is line 2a
2|b|This is line 2b
2|c|This is line 2c
What I want to know…
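A minimal sketch of reading the pipe-delimited layout shown above into a lookup table. It is written in Python rather than Android Java, and the function name `parse_reference_lines` and the choice of a dict keyed by (number, letter) are assumptions made for illustration, not part of the question.

```python
from typing import Dict, Iterable, Tuple


def parse_reference_lines(lines: Iterable[str]) -> Dict[Tuple[int, str], str]:
    """Parse 'number|letter|text' records into a dict keyed by (number, letter)."""
    table: Dict[Tuple[int, str], str] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # maxsplit=2 keeps any '|' inside the free-text field intact
        number, letter, text = line.split("|", 2)
        table[(int(number), letter)] = text
    return table


# The sample records from the question
sample = [
    "1|a|This is line 1a",
    "1|b|This is line 1b",
    "2|a|This is line 2a",
    "2|b|This is line 2b",
    "2|c|This is line 2c",
]
table = parse_reference_lines(sample)
print(table[(2, "c")])
```

Whether a flat file parsed this way or an XML file is faster on a device depends on the parser and access pattern, so this only illustrates the format, not a performance answer.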

Tawani
2
votes
1 answer
Create a simple Lua .txt Database with write() and read()
I'm trying to create a simple two-function text-file "database" with Lua. All I need is 2 functions.
My db should look like this:
varName;varValue
JohnAge;18
JohnCity;Munich
LarissaAge;21
LarissaCity;Berlin
In fact I'm not stuck…
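A rough sketch of the two-function idea over the `name;value` layout above. It is written in Python, not Lua, and the names `write_value`, `read_value`, and `read_db` are hypothetical. It also assumes the file holds only data lines (no `varName;varValue` header row) and that values contain no newlines.

```python
import os
import tempfile
from typing import Dict, Optional


def read_db(path: str) -> Dict[str, str]:
    """Load 'name;value' lines into a dict; a missing file yields an empty dict."""
    data: Dict[str, str] = {}
    if not os.path.exists(path):
        return data
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue  # ignore blank lines
            # Split on the first ';' only, so values may contain ';'
            name, value = line.split(";", 1)
            data[name] = value
    return data


def write_value(path: str, name: str, value: str) -> None:
    """Insert or update one entry, then rewrite the whole file."""
    data = read_db(path)
    data[name] = value
    with open(path, "w", encoding="utf-8") as fh:
        for key, val in data.items():
            fh.write(f"{key};{val}\n")


def read_value(path: str, name: str) -> Optional[str]:
    """Return the stored value for name, or None if it is absent."""
    return read_db(path).get(name)


# Usage with a throwaway file
db_path = os.path.join(tempfile.mkdtemp(), "db.txt")
write_value(db_path, "JohnAge", "18")
write_value(db_path, "JohnCity", "Munich")
write_value(db_path, "JohnAge", "19")  # updates the existing entry
print(read_value(db_path, "JohnAge"))
```

Rewriting the whole file on every write is simple but slow for large files; it is a reasonable trade-off only for small key sets like the one shown.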

Sinfox
2
votes
2 answers
How would the conversion of a custom CMS using a text-file-based database to Drupal be tackled?
Just today I've started using Drupal for a site I'm designing/developing. For my own site http://jwm-art.net I wrote a user-unfriendly CMS in PHP. My brief experience with Drupal is making me want to convert from the CMS I wrote. A CMS whose sole…

James Morris
1
vote
2 answers
R - Finding duplicates of words that are in a reversed order
I have a data.table with a column of occupational title names. I want to find repeated occupations that are written in reverse order (e.g. "writer advertising" and "advertising writer").
Here is a simplified version of my data and the…
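One possible approach, sketched in Python rather than R: build an order-insensitive key for each title by sorting its words, then group titles that share a key. The function name `group_reordered_titles` and the sample titles other than the pair from the question are assumptions. Note that this catches any reordering of the same words, which coincides with strict reversal only for two-word titles.

```python
from collections import defaultdict
from typing import Dict, List


def group_reordered_titles(titles: List[str]) -> Dict[str, List[str]]:
    """Group titles that contain the same words in a different order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for title in titles:
        # Lowercase and sort the words so word order no longer matters
        key = " ".join(sorted(title.lower().split()))
        groups[key].append(title)
    # Keep only keys that appear with more than one distinct spelling
    return {k: v for k, v in groups.items() if len(set(v)) > 1}


titles = [
    "writer advertising",
    "advertising writer",
    "data analyst",
    "analyst data",
    "teacher",
]
dupes = group_reordered_titles(titles)
for key, variants in dupes.items():
    print(key, "->", variants)
```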

Miguel
1
vote
1 answer
How are vector operations performed on 20newsgroups_vectorized data set?
When I fetch 20newsgroups_vectorized data by
newsgroups = fetch_20newsgroups_vectorized(subset='all')
labels = newsgroups.target_names
target = newsgroups.target
target = pd.DataFrame([labels[i] for i in target], columns=['label'])
data =…

Rosa
1
vote
1 answer
Converting a table of fixed width in text format into dataframe/excel/csv
I have some data in txt format with 38 columns which looks like this:
With the exception of the header row, most of the rows have missing values. I want to convert this table into an array/dataframe/Excel file. But it is not coming out as it looks in…

Manojk07
1
vote
4 answers
Product Analytical data in TXT files (using YAML)
I am currently developing e-commerce software using PHP/MySQL for a big company. There are two options for me to get some specified data:
DB (for getting huge data, such as PRODUCTS, CATEGORIES, ORDERS, etc.)
TXT (using YAML -for getting…

kuzey beytar
1
vote
1 answer
HAML If defined? statement evaluating whether entry exists in Middleman data file
I have a Middleman data file data/testimonials.yaml:
tom:
  short: Tom short
  alt: Tom alt (this should be shown)
  name: Thomas
jeff:
  short: Jeff short
  alt: Jeff alt (this should be shown)
  name: Jeffrey
joel:
  short: Joel short (he…

Rafal
1
vote
0 answers
mySQL, Image and text in one row. How to separate them?
I have a blog (WordPress) and a website. I would like to connect my site to the blog, so that when I add a post to my blog, it (only the title and image) appears on my website.
There is one problem. When I add a post with an image, it appears in one…

Paweł Stanecki
0
votes
0 answers
How to make sentences from a dictionary of lists into plain text to apply NLTK
I am quite a noob at Python and everything.
I am trying to use NLTK for my dissertation on Applied Linguistics. But something keeps preventing the NLTK tools from working on the dataset.
I've tried some code in the copy+paste+modify style. But had…

theflteacher
0
votes
1 answer
Clustering data after reducing dimension with autoencoder
My goal is to identify clusters in my dataset, which contains around 10 categorical and/or numerical columns and 3 textual description columns.
After some research, I thought of a 3-step process:
pre-processing my data (normalize my 10…

M2dlgle
0
votes
0 answers
If statement is only checking that last iteration in while loop PHP
I'm familiar with programming in general but what I'm facing is a bit weird. The if statement is only working on the last iteration:
Output: IN IN OUT OUT OUT OUT IN OUT IN BIN BOUT OUT OUT OUT OUT
However if I add an if statement it only performs…

Spectre
0
votes
1 answer
Regex Python multiple dependencies for dates
I have unstructured data from which I have to extract BP values and the dates (in different formats) as shown below. Right now I have a regex function to extract BP values and the dates followed by BP values.
I have a specific case as highlighted in…
user11870599
0
votes
1 answer
problem in reading the images of mjsynth dataset
Recently I have been trying to train a text recognition network. I tried to start the training by feeding the mjsynth dataset to the network. However, there seem to be some images in the dataset which are blank. So, while training, if I directly feed the data…

jd95