Text processing and preprocessing operate behind the scenes of any robust text analytics platform. SumUp improves on existing methods by merging a rigorous model testing bench with model-based and ad-hoc, experience-based processing. This combination enables sentence-level analysis, an effective understanding of word choice, consistent performance across file types, and clever algorithms that lead to exceptionally fast processing speeds and better data representation.
Documents in a Nucleus dataset often contain additional information such as each document’s title, author, source, language, publication date, and other custom tags. Nucleus stores this information as metadata that you can use to select subsets of your datasets and apply any Core Analytics or Advanced Analytics capabilities to those subsets, allowing very focused and nimble workflows. Metadata selection is already available through the APIs and will soon become available to Web App users as well.
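To make the idea of metadata-based selection concrete, here is a minimal sketch in plain Python. It does not use the Nucleus API; the `select` helper, the field names, and the sample records are all hypothetical and only illustrate filtering a dataset down to a subset before running any analysis on it.

```python
from datetime import date

# Hypothetical documents with metadata fields like those listed above.
documents = [
    {"title": "Q1 outlook", "author": "Lee", "language": "en", "published": date(2020, 1, 15)},
    {"title": "Perspectives T1", "author": "Martin", "language": "fr", "published": date(2020, 1, 20)},
    {"title": "Q2 outlook", "author": "Lee", "language": "en", "published": date(2020, 4, 14)},
]


def select(docs: list[dict], **criteria) -> list[dict]:
    """Keep only the documents whose metadata matches every criterion."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


subset = select(documents, author="Lee", language="en")
titles = [doc["title"] for doc in subset]
```

Any analysis would then run on `subset` rather than on the full list.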
You can search for specific words that might exist within any document in your datasets. The search capability of Nucleus makes your queries more robust by automatically including alternatives that share the same root as your query. For instance, a query matches both the singular and plural forms of a noun, and both the infinitive and conjugated forms of a verb. This capability is available across all languages supported by Nucleus.
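The sketch below illustrates root-based matching in general terms. It is not the Nucleus implementation: the `root` function is a deliberately crude English suffix stripper standing in for a real stemmer or lemmatizer, and the function names and sample documents are assumptions made for the example.

```python
import re

# Toy suffix list; a real system would use a proper stemmer or lemmatizer.
SUFFIXES = ("ing", "ed", "es", "s")


def root(word: str) -> str:
    """Strip one common English suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def search(query: str, documents: list[str]) -> list[int]:
    """Return indexes of documents containing a word with the query's root."""
    target = root(query)
    hits = []
    for index, text in enumerate(documents):
        tokens = re.findall(r"[a-zA-Z]+", text)
        if any(root(token) == target for token in tokens):
            hits.append(index)
    return hits


docs = [
    "The bank raised its forecast.",
    "Banks are raising capital.",
    "Inflation cooled in March.",
]
```

With these sample documents, a query for "bank" also reaches the document that only mentions "Banks", and a query for "raising" also reaches the one that says "raised".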
Most company files live in local folders on each employee’s computer. Your organization needs to stay connected to these files, but keeping the content up to date for the corresponding analytics is cumbersome. Nucleus Folder Synchronizer takes the pain out of the equation by doing all the heavy lifting for its users through a fast and simple interface. As new documents are added to local folders, or existing documents are modified, the synchronizer sends the new content to Nucleus to update your analytics automatically. This lets you know the current state of a project without constantly pinging your team, allowing them to stay focused on the task at hand.
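One common way to detect new or modified files is to compare content hashes between two scans of a folder. The sketch below shows that general technique only; how the Nucleus Folder Synchronizer actually detects changes is not described here, and the `snapshot` and `changed_files` names are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(folder: Path) -> dict[str, str]:
    """Map each file's relative path to a SHA-256 digest of its content."""
    return {
        path.relative_to(folder).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def changed_files(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return files that are new or whose content changed since the last scan."""
    return sorted(
        name for name, digest in current.items()
        if previous.get(name) != digest
    )
```

A synchronizer built this way would call `snapshot` on a schedule, pass the two most recent snapshots to `changed_files`, and upload only the files it returns.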
Your content remains in your control inside Nucleus. Create datasets as needed, delete them as desired, and remove specific documents from your datasets when you no longer consider them relevant. Keep only what is useful to you and your team for increasingly focused workflows.
The Nucleus topic model is a pillar of its analytics capabilities. It relies on 10 years of research and 3 years of design optimization to capture the way people learn from their communications with one another: denoising and reorganizing information into a more synthetic form, without a priori assumptions, so that they can adapt to situations as they arise. This flexible approach is precisely how Nucleus extracts and identifies topics. Each topic is composed of a list of keywords that are determined on a per-dataset and per-query basis, and that account for the context in which words are used.
The Nucleus summarization function extracts relevant sentences and combines them into bullet points that resemble an executive summary. This summarization works both at the single-document level and across multiple documents. Nucleus summarizes the entire content and can also focus on specific subjects.
Sentiment analysis allows users to analyze and understand the opinions expressed by contributors on any given topic. Nucleus sentiment analysis considers both the polarity (positive or negative) and the intensity of those opinions on a scale from 0% (neutral vocabulary) to 100% (strongly connoted vocabulary). Nucleus sentiment analysis is both language-specific and domain-specific, as professionals from different cultures don’t always have the same perception of a word. For instance, the term ‘household debt’ is a fairly neutral financial term in the US but has a negative connotation in China. The same applies to industry verticals. The word ‘soft’ is a positive description of textiles in fashion but a negative description of an NFL player. Nucleus provides two domain-specific dictionaries (finance and general news) and enables its API users to provide their own dictionaries.
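The role of a domain-specific dictionary can be illustrated with a small lexicon-based scorer. This is a simplified sketch, not the Nucleus scoring method: the `score` function, the two lexicons, and their weights (in the range -1 to 1) are invented for the example, and intensity is defined here simply as the average absolute weight per word, expressed as a percentage.

```python
def score(text: str, lexicon: dict[str, float]) -> tuple[str, float]:
    """Return (polarity, intensity in percent) for a text under one lexicon."""
    tokens = text.lower().split()
    if not tokens:
        return "neutral", 0.0
    # Words missing from the lexicon count as neutral vocabulary.
    weights = [lexicon.get(token, 0.0) for token in tokens]
    total = sum(weights)
    if total > 0:
        polarity = "positive"
    elif total < 0:
        polarity = "negative"
    else:
        polarity = "neutral"
    intensity = 100.0 * sum(abs(weight) for weight in weights) / len(tokens)
    return polarity, intensity


# Hypothetical domain dictionaries: the same word carries opposite connotations.
FASHION = {"soft": 0.5}
SPORTS = {"soft": -0.5}

fashion_view = score("soft fabric", FASHION)
sports_view = score("soft defense", SPORTS)
```

The same word produces opposite polarities at the same intensity depending on which dictionary is applied, which is the reason a user-supplied dictionary can change the results for a specialized field.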
Consensus analysis evaluates the percentage of contributors that share a similar opinion on a given topic. This is particularly powerful in identifying and understanding situations where a few sources sway the overall perception through prolific writing or through a heavy preference for words of strong intensity. You can quickly use this information to better understand majority vs controversial opinions.
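A minimal sketch of the idea, assuming each document already carries a signed sentiment score: every contributor is reduced to a single stance first, so a prolific or strongly worded author counts once. The `consensus` function and the sample data are hypothetical and are not taken from the Nucleus API.

```python
from collections import Counter, defaultdict


def consensus(opinions: list[tuple[str, float]]) -> tuple[str, float]:
    """Return the most common stance and the percentage of authors holding it.

    `opinions` holds one (author, signed sentiment score) pair per document.
    """
    if not opinions:
        raise ValueError("at least one opinion is required")
    by_author = defaultdict(list)
    for author, sentiment in opinions:
        by_author[author].append(sentiment)
    stances = []
    for scores in by_author.values():
        mean = sum(scores) / len(scores)
        stances.append("positive" if mean > 0 else "negative" if mean < 0 else "neutral")
    stance, count = Counter(stances).most_common(1)[0]
    return stance, 100.0 * count / len(stances)


# One author writes five strongly negative pieces; three others are mildly positive.
opinions = [("a", -0.9)] * 5 + [("b", 0.2), ("c", 0.2), ("d", 0.2)]
document_average = sum(score for _, score in opinions) / len(opinions)
stance, share = consensus(opinions)
```

In this sample the per-document average is negative, yet three of the four contributors are positive, which is the kind of gap consensus analysis is meant to expose.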
Measures the percentage of information that relates to a topic within a dataset. This metric goes beyond word count as it accounts for the context in which words are used and looks to mitigate duplicated content.
Historical analysis brings you time-contextual understanding of conversations so that you can make better decisions at the right moment. How have opinions on a topic evolved over the past quarter? Was a topic as important last month? When did it appear for the first time? This tool helps investment professionals better understand trends and is a game-changer in the world of content monitoring.
Determines the documents that have the greatest contribution to a given topic so that users can make better use of their reading and review time.
Data teams spend almost 30% of their time labeling data. Document labeling often involves words being present or absent in the content of each document. Nucleus provides a simple API to make this task less burdensome by automating the labeling of documents based on the presence of selected words within the content. Users only need to establish a list of relevant labels and let the tool do the rest.
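Presence-based labeling can be sketched in a few lines. The example below is illustrative only and does not call the Nucleus API; the `label_documents` function, the label names, and the word lists are assumptions.

```python
import re


def label_documents(
    documents: dict[str, str], labels: dict[str, set[str]]
) -> dict[str, list[str]]:
    """Assign each label whose word list overlaps the document's words."""
    results = {}
    for doc_id, text in documents.items():
        tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
        results[doc_id] = [name for name, words in labels.items() if tokens & words]
    return results


# Hypothetical labels, each defined by a list of relevant words.
labels = {
    "finance": {"debt", "rates", "earnings"},
    "fashion": {"fabric", "textile"},
}
documents = {
    "doc1": "Household debt rose as rates climbed.",
    "doc2": "The new fabric feels soft.",
    "doc3": "Weather was mild.",
}
assigned = label_documents(documents, labels)
```

A document that matches no word list keeps an empty label list, which makes unlabeled documents easy to route to manual review.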
Contrast analysis allows you to analyze and understand what information best characterizes one group of documents against another group. This could be applied to today’s news against articles from the past couple of days to identify new content in ongoing conversations. It could also be leveraged on a group of sell-side analysts covering the same stocks or industry sectors to tease out differentiated insights from their respective reports. It could likewise increase the efficiency of research intelligence by allowing R&D teams to rapidly distill publications. Determine the content that stands out within a group of documents. Nucleus highlights valuable differences otherwise lost in the repetition of ideas.
Just because an article was published today doesn’t make its content new to readers. Recent publications often rehash information that users closely following the field already know. You might have noticed this lack of new content in your own industry. Nucleus determines what degree of truly novel information is contained in any given document compared to the documents previously seen. Documents become noteworthy above a given threshold and earn your attention.
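One simple way to approximate a novelty score is to measure how many of a document's word pairs have not appeared in earlier documents. The sketch below uses that stand-in technique; it is not the method Nucleus uses, and the function names, the word-pair approach, and the 0.4 threshold are all assumptions chosen for illustration.

```python
def shingles(text: str, size: int = 2) -> set[tuple[str, ...]]:
    """Return the set of consecutive word groups of the given size."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def novelty(document: str, seen: list[str]) -> float:
    """Fraction of the document's word pairs absent from earlier documents."""
    current = shingles(document)
    if not current:
        return 0.0
    known = set().union(*(shingles(earlier) for earlier in seen))
    return len(current - known) / len(current)


seen = ["the fed held rates steady"]
rehash = novelty("the fed held rates steady", seen)
update = novelty("the fed held rates steady but signaled cuts", seen)

THRESHOLD = 0.4  # hypothetical cut-off above which a document is noteworthy
noteworthy = update >= THRESHOLD
```

The verbatim rehash contributes nothing new, while the second document adds three new word pairs out of seven and clears the sample threshold.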
Who should you speak with to get up to speed on a subject that came up during your new project? Which authors have made the most substantial contributions to the topics you care about? Key contributors analysis brings you answers in a flexible and legible manner, surfacing notable authors alongside their most representative write-ups on subjects of your choosing.
Transfer learning allows you to analyze and understand how a topic extracted from one dataset (or determined externally) is represented and perceived in another dataset. This is especially powerful in PR, marketing strategy, and political strategy to understand product and persona traction in different media. This capability is available through the Nucleus APIs.
Authors network analysis determines which authors are most similar to one another based on the topics they contribute to and how they each write about those topics. This is especially useful in content monitoring to identify accounts recently created by offenders who were previously banned or to group active accounts that might actually be closely related even without apparent connections. This capability is available through the Nucleus APIs.
Content never stops and neither should your intelligence gathering. That’s where streaming analytics comes in to keep you updated. It has been implemented in the background of the Nucleus Web App dashboards and can easily be implemented as a wrapper to any SumUp API within your custom workflows.
Over 200 English-language RSS feeds from news media, curated by the SumUp team in collaboration with early adopters. These feeds are organized into the following categories: General News, Culture, Healthcare, Finance, Economics, Crypto & Blockchain, and AI. Analyze the latest reports on the world’s most consequential subjects.
Nucleus users can build custom datasets by specifying a category and a time period.
This feed starts in the fourth quarter of 2018, except for the Healthcare category, which starts in the fourth quarter of 2019, and it updates on a daily basis. Adding extra sources to this feed is fairly simple; please reach out with your needs. Available as read-only and accessible through all Core Analytics and Advanced Analytics capabilities.
Content from 28 central banks, curated by the SumUp team in collaboration with early adopters. For each bank, both the native-language content and the English content are made available. The content is organized into four document-type categories: Speeches, Press Releases, Formal Research, and Informal Publications.
As a Nucleus user, you can build custom datasets off this feed by specifying a set of central banks, a set of document categories, a language, and a time period.
This feed goes back to January 2000 or the earliest available date for each bank and currently updates on a daily basis. Available as read-only and accessible through all Core Analytics and Advanced Analytics capabilities.
Content from all companies listed on US exchanges, based on the filings available in the EDGAR database. This includes US-headquartered companies and foreign companies with a US listing. The filings collected are 10K, 10K/A, 10Q, 10Q/A, 8K, 8K/A, S1, S1/A, 20F, 20F/A, 6K, and 6K/A, and the content of each filing is split into standardized sections.
Nucleus users can build custom datasets off this feed by specifying a set of stock tickers, a set of filings, a set of sections within those filings, and a time period.
This feed goes back to January 2000 or the earliest available date for each company and currently updates on a daily basis. Available as read-only and accessible through all Core Analytics and Advanced Analytics capabilities.